Closed Bug 1698855 Opened 3 years ago Closed 2 years ago

Poor performance when Telegram and Google Chat web clients are in use.

Categories

(Core :: Performance, defect)

Firefox 88
x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
Performance Impact low
Tracking Status
firefox88 --- affected
firefox89 --- affected
firefox90 --- affected
firefox91 --- affected
firefox97 --- affected
firefox98 --- affected
firefox99 --- affected

People

(Reporter: alex_mayorga, Unassigned)

References

Details

(Keywords: nightly-community, perf:responsiveness, Whiteboard: [wfh])

Attachments

(1 file)

Basic information

Steps to Reproduce:

Load and login to both
https://web.telegram.org/
and
https://chat.google.com/

Expected Results:

Good performance

Actual Results:

Poor performance


More information

Screenshot: N/A

Profile URL: https://share.firefox.dev/3cBpZsR

Basic systems configuration:

OS version: Windows 10 Pro Insider Preview 21332.1000

GPU model: Intel HD Graphics 620

Number of cores: 2

Amount of memory (RAM): 24 GB

Thanks so much for your help.

In high CPU load on the main thread is all about Window.getWorkspaceID (bug 1640852). But not sure how much the web content process should be affected by that. As such it might be a dupe of bug 1640852.

Florian, what do you think?

Flags: needinfo?(florian)

There are some slow restyles (and to a lesser extent, some slow reflows) in there, too. Here's a zoomed in selection of the profile (for a chat.google.com process):
https://share.firefox.dev/3twmYB2

I'll spin off a helper bug for those.

Whiteboard: [qf:p1:responsiveness]
Depends on: 1699500
Depends on: 1699501

Alex, want to test the latest Nightly? I think bug 1640852 is fixed there.

Flags: needinfo?(alex_mayorga)

¡Hi Olli!

I tested it and got a new profile FWIW:

https://share.firefox.dev/3u821NC

Please let me know if you need anything further from the profile.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga)

¡Hola Olli!

There was a redesign of GChat but performance is still pretty horrible IMHO.

I captured another profile FWIW:

https://share.firefox.dev/3nTVnrU

¡Gracias!
Alex

Flags: needinfo?(bugs)

(In reply to alex_mayorga from comment #5)

I captured another profile FWIW:

Would you mind recording another profile? That one from comment 5 seems to have failed to get symbols for some reason, which makes it hard to interpret. (Its backtrace show a lot of things like 0x7ff9ffa85799 xul.dll instead of the function name.) (I think this can happen randomly sometimes, if e.g. there's a network error communicating with the web-hosted symbols server just after you capture the profile.)

Flags: needinfo?(alex_mayorga)

(In reply to Daniel Holbert [:dholbert] from comment #6)

(In reply to alex_mayorga from comment #5)

I captured another profile FWIW:

Would you mind recording another profile? That one from comment 5 seems to have failed to get symbols for some reason, which makes it hard to interpret. (Its backtrace show a lot of things like 0x7ff9ffa85799 xul.dll instead of the function name.) (I think this can happen randomly sometimes, if e.g. there's a network error communicating with the web-hosted symbols server just after you capture the profile.)

Symbolication errors are now easily recoverable without needing to re-capture a profile. In the profiler UI, you can click on the "Profile Info" button to open the profile meta data panel. There you can click a "Re-symbolicate profile" button.

I re-symbolicated and re-uploaded for you the profile from comment 5: https://share.firefox.dev/3eoGoDl

Flags: needinfo?(florian)
Flags: needinfo?(alex_mayorga)

That is a confusing profile. It shows lots of network activity from chat.google.com, but there doesn't seem to be child process for that.
And network response time is very low. Did you have some network issues when you captured the profile?

Flags: needinfo?(bugs) → needinfo?(alex_mayorga)

¡Hola Olli!

IIRC, there were no network issues at the time.

FWIW, here's a fresh profile from today:

https://share.firefox.dev/3hgxf1f

Clicking chat rooms icons on Google Chat doesn't even open them now.

It work on Chromium just fine.

Please let me know if you need anything further from the profile.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga) → needinfo?(bugs)

¡Hola!

Seems like the previous one didn't even capture anything for Google Chat.

Here's one with only Google Chat:

https://share.firefox.dev/33w816S

The client is unusable on Nightly now as clicking on the chat icons and titles doesn't even renders them...

Hope this is helpful on identifying the problem.

¡Gracias!
Alex

Whiteboard: [qf:p1:responsiveness] → [qf:p1:responsiveness][wfh]

I'm trying to see some slowness locally, but so far no luck. Both Firefox and Chrome are a bit slow to load emojis initially and that is when there is some higher cpu usage but after that things are fine.

The profile in comment 10 shows that Lastpass extension is taking some time. Somehow is it rather well hidden, but it does do some non-trivial stuff. (Lastpass has traditionally caused performance issues)

Flags: needinfo?(bugs)

¡Hola y'all!

Had another pretty terrible experiencing typing in chat today.

Here's a profile FWIW:
https://share.firefox.dev/3wkOZx2

I now wonder if this relates to https://bugzilla.mozilla.org/show_bug.cgi?id=1707762 as well.

Hope this is useful on the investigation.

¡Gracias!
Alex

https://share.firefox.dev/3v8f7tJ is a part of the profile in comment 12.
There are very few samples on the main thread of the child process.

¡Hola Olli!

Had another occurrence of very slow typing in Google Chat today.

Captured another profile in case it is useful:

https://share.firefox.dev/3xmgoyO

Please let me know if there's anything else you might need form the profile or device.

¡Gracias!
Alex

Flags: needinfo?(bugs)

Is that on a system which has low cpu count but possibly quite a bit memory?

¡Hola Olli!

Here are the specs:

Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz 2.90 GHz
2 cores, 4 logical processors
24.0 GB (23.8 GB usable)

Please let me know if there's anything else you might need form the profile or device.

¡Gracias!
Alex

By any chance do you have lots of other programs running at the same time as Firefox? Somehow some Firefox threads don't get cpu time too often.

Flags: needinfo?(bugs)

¡Hola Olli!

I do have other programs open but CPU hardly ever goes above 75%.

I do keep Task Manager open and visible at all times.

Please let me know if there's anything else you might need from the profile or device.

¡Gracias!
Alex

Flags: needinfo?(bugs)

Bas, I think we should explicitly do some testing on systems which have rather low cpu count (but possibly quite a bit memory, so that lack of memory doesn't cause perf issues). There might be some very specific scheduling issues which show up only with such setup.

Flags: needinfo?(bugs) → needinfo?(bas)

(In reply to Olli Pettay [:smaug] from comment #19)

Bas, I think we should explicitly do some testing on systems which have rather low cpu count (but possibly quite a bit memory, so that lack of memory doesn't cause perf issues). There might be some very specific scheduling issues which show up only with such setup.

If you mean by the OS scheduler, I agree.. we've seen profiles like this before. Other than process priority changes I'm not sure what we can do to affect this though, it's interesting that the parent process seems to be fine but the child processes are not. Could there be something about our sandboxed processes that causes the OS to deprioritize them?

Otherwise, is it possible something is wrong with profiler visualization for non-parent processes, Florian? (i.e. I see plenty of samples in the parent process here, but not much in the child: https://share.firefox.dev/2VgJXo1)

None of that explains what's happening here though: https://share.firefox.dev/36niJxX

Flags: needinfo?(bas) → needinfo?(florian)

(In reply to Bas Schouten (:bas.schouten) from comment #20)

(In reply to Olli Pettay [:smaug] from comment #19)

None of that explains what's happening here though: https://share.firefox.dev/36niJxX

On the other hand... maybe it does if the profiler is being scheduled but not the actual main thread. It's interesting that this type of jank never seems to occur in the parent process either. And it seems to occur both when the child process is actively doing work and when it's waiting in the event loop, reinforcing the idea that it's simply not being scheduled.

(In reply to Bas Schouten (:bas.schouten) from comment #20)

Otherwise, is it possible something is wrong with profiler visualization for non-parent processes, Florian?

Seems unlikely. My best guess is still that we might have a process priority issue, so I would be very interested in seeing a similar profile in a recent Nightly including the patches from bug 1667818 that will show process priorities in profiles.

Flags: needinfo?(florian)

Alex, I wonder if you can see any difference if you increase the priority of the parent process and/or gpu process using Windows
Task manager.
You can get the relevant process ids for example by capturing a memory profile using about:memory, and then in the
task manager set the priority through some context menu to something like "higher than normal".

Flags: needinfo?(alex_mayorga)

¡Hola Olli!

Hope these lines find you well.

As mentioned in #perf:mozilla.org the hardware on which this bug was 1st reported is no longer available to me.

I've now captured a new profile on the new hardware and hope this is useful on the investigation of this bug

https://share.firefox.dev/3iWTji9

Edition Windows 10 Pro
Version 21H1
Installed on ‎2021-‎06-‎17
OS build 19043.1110
Experience Windows Feature Experience Pack 120.2212.3530.0

Processor 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 1.80 GHz
Installed RAM 32.0 GB (31.7 GB usable)
System type 64-bit operating system, x64-based processor
Pen and touch No pen or touch input is available for this display
Number of cores: 4

Thanks so much for your help.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga)

¡Hola Olli!

As requested, a profile with "DOM Worker" and "IPC Messages":

https://share.firefox.dev/3skrVxi

Hope this is useful.

Thanks so much for your help.

¡Gracias!
Alex

Flags: needinfo?(bugs)

Oh, you had about:performance open at the same time as profiling google chat. about:performance is adding quite a bit overhead there.
Would be better to profile without any extra stuff open (no devtools, no about:performance etc.)
Other than that there isn't anything too interesting in the profile and plenty of idle time.

Flags: needinfo?(bugs)

¡Hola Olli!

As requested, a profile without about:performance loaded:

https://share.firefox.dev/3sv78r9

Hope this is useful.

Thanks so much for your help.

¡Gracias!
Alex

Flags: needinfo?(bugs)

Other than having unusually long GC slices, everything looks normal.
I wonder why there is those GCs. Did you have something heavy running before you did profiling?
(But even those GC slices use idle time and are nicely split to small pieces)

Flags: needinfo?(bugs)

¡Hola Olli!

Thanks for looking into this.

There was a meeting going on https://meet.google.com/ in another window IIRC.

Perhaps related to https://bugzilla.mozilla.org/show_bug.cgi?id=1720511 ?

¡Gracias!
Alex

Flags: needinfo?(bugs)

Hey I got a linux profile of the telegram login:
https://profiler.firefox.com/public/gs5ft74r4akzbqpgv
7r44j3eecr3re3jg0krfpr/calltree/?globalTrackOrder=780w6&hiddenGlobalTracks=1w5&hiddenLocalTracksByPid=5111-0&localTrackOrderByPid=5111-60w55686-05522-0&thread=d&timelineType=cpu-category&v=6

This is with software render, and am attempting to use nightly with non-software render is crashing nightly.... separate bz about that.

Talked with Bas: in the most recent profile from bug 1698855 comment 27 we noticed:

:jonco, do you know what might be up with the GCs in the parent process?

Flags: needinfo?(jcoppeard)

Alex, are you able to reproduce this issue on your new hardware? I wasn't sure from bug 1698855 comment 24 whether or not you could.

Flags: needinfo?(alex_mayorga)

I wasn't able to reproduce this issue (on my high-end dev hardware) using telegram, Google Chat, and Google Meet in the background.

If Alex is unable to reproduce on the new hardware, this sounds like this may be related to core count and process priority as discussed in comment 19 and later. If it's related to core count, we should determine how prominent low core-count devices are in our user population: if it's low, we may want to remove P1.

Note: we've already identified several issues to improve already (see the dependent bugs) and have a few more under investigation (comment 31 above).

(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #31)
The GC in the linked profile is 400ms mostly split into 10-15ms slices. This is to collect a 73MB heap. I don't think it's that bad or unusual.

One thing that's strange is that the first two GCs in the profile as a whole take ~400ms while the second two take ~200ms to collect the same size heap. This may be due to the system being busy.

Flags: needinfo?(jcoppeard)

With the original hardware no longer being available to test, and the new hardware showing more "normal" performance characteristics in the profiles, I'm going to downgrade the priority of this bug. We can upgrade it again if it becomes actionable.

Whiteboard: [qf:p1:responsiveness][wfh] → [qf:p3:responsiveness][wfh]

¡Hola y'all!

¡Feliz 2022!

I believe the problem is still present even on the newest hardware available to me.
Please see https://bugzilla.mozilla.org/show_bug.cgi?id=1699500#c10

Hope this is useful.

Thanks so much for your help.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga)

Alex's profile that he referenced above is here: https://share.firefox.dev/35mk23a

He adds over on bug 1699500:
"I believe this issue still exists on today's Nightly, i.e. the redraw of icons for the "Spaces" is visible to my untrained eye and the fans on the laptop go crazy for quite a bit."

I've closed as WFM bug 1699500 (filed for slow restyles) and bug 1699501 (filed for slow reflows), since it seems restyle/reflow are no longer a problem here (we're not spending significant time in those areas when loading this site, like we were back when I filed them).

¡Hola Daniel!

So I gave chat.google.com another try today on

but it seemed to never finish loading (there's just a dot bouncing left and right where the tab icon should have been).

I captured this profile:

https://share.firefox.dev/3gQd498

Hope this is useful.

Please let me know if anything further is needed from the profile or device.

¡Gracias!
Alex

Flags: needinfo?(dholbert)
Attached image image.png

(In reply to alex_mayorga from comment #38)

but it seemed to never finish loading (there's just a dot bouncing left and right where the tab icon should have been).

I captured this profile:
https://share.firefox.dev/3gQd498

Thanks!

So in that profile, the biggest thing I see are some network requests to URLs on https://signaler-pa.clients6.google.com/ that are taking a loooong time for responses. You can seem them in the marker-chart at the bottom here -- the first one apparently was a request from before you started profiling, and it took 61 seconds before it got a response (based on "HTTP Response: 61,553ms" when you hover that bar); and the last one was still un-answered ("Status: waiting for response") at the end of your profile (after it had been waiting for ~16 seconds).

I'm assuming those requests are what the pageload is blocked on, which is why it never finishes loading.

This seems pretty-bad, but also seems explainable by network issues upstream of you. Could you check if it's still not-finishing-loading (i.e. maybe the network issues went away, due to a router reboot in some datacenter)? And if it's still happening, could you see if it's not-finishing-loading in other browsers as well?

Flags: needinfo?(dholbert) → needinfo?(alex_mayorga)

¡Hola Daniel!

Thanks for looking into this!

At the time I profiled this chat.google.com was loaded just fine and functional in an Edge window on the very same device.

I just tried loading it again on Nightly and I noticed NS_BINDING_ABORTED on GETs to these resources:
https://chat.google.com/u/0/frame?shell=9&origin=https%3A%2F%2Fmail.google.com#cb=gtn-roster-iframe-id&id=spareFrame8&n=1
https://chat.google.com/u/0/frame?shell=9&origin=https%3A%2F%2Fmail.google.com#cb=gtn-roster-iframe-id&id=spareFrame8&n=2

This shows in the console:

00:44:32.681 Content Security Policy: Couldn’t process unknown directive ‘require-trusted-types-for’
00:44:32.711 Content Security Policy: The page’s settings blocked the loading of a resource at inline (“script-src”). 2 moz-extension:71:796982
00:44:32.882 Content Security Policy: The page’s settings blocked the loading of a resource at inline (“script-src”). moz-extension:71:796982
00:44:32.882 Content Security Policy: The page’s settings blocked the loading of a resource at inline (“script-src”). moz-extension:71:796982
00:44:33.841 Content Security Policy: Ignoring “'unsafe-inline'” within script-src: ‘strict-dynamic’ specified
00:44:33.841 Content Security Policy: Ignoring “https:” within script-src: ‘strict-dynamic’ specified

Here's a fresh profile:

https://share.firefox.dev/3HVz7XM

Loads fine on both Edge and Chrome.

Hope this is useful.

Please let me know if anything further is needed from the profile or device.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga) → needinfo?(dholbert)

Looks like comment 38 - comment 41 is actually a recent regression and is tracked in bug 1755416 (which has a patch making its way through the review process now)

Flags: needinfo?(dholbert)
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness][wfh] → [wfh]

Hi Daniel, now that bug 1755416 is fixed - is there anything left to do here?

Flags: needinfo?(bugs) → needinfo?(dholbert)

Yeah, that bug's fix should've addressed the issues discussed in comment 38 through comment 41, I think.

I'd defer to reporter alex_mayorga on whether the performance is on-par with other browsers at this point. Though, given that we're up to 40+ comments and a handful of profiles and identified helper-bugs that have been spun off and fixed (on our end or on Google's end via site-redesigns etc.), I suspect it'd be most-manageable start fresh with a new bug for any remaining issues here, and consider this bug fixed-to-a-certain-extent-via-helper-bugs.

alex_mayorga: would you mind retesting your Google Chat experience in Nightly when you get a chance, and see how we're looking comparable to other browsers? And if we're still noticeably not-as-good, could you file a new Core::Performance bug with a profile and link it from here?

Thank you for all your help & profile-capturing here!

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(dholbert) → needinfo?(alex_mayorga)
Resolution: --- → FIXED

¡Hola Daniel!

I've probably done it wrong and this is not "apples to apples" as I do use uBlock Origin on Nightly but not on Edge but here's what I got by testing from about:blank to developer tools on both getting quiet after putting chat.google.com in the URL bars:

Nightly:
344 solicitudes
17.62 MB / 3.94 MB transferido
Finalizado: 30.94 s
DOMContentLoaded: 252 ms
load: 701 ms

Edge:
436 solicitudes
2.5 MB transferidos
19.7 MB recursos
Finalizar: 29.88 s
DOMContentLoaded: 171 ms
Cargar: 7.29 s

I don't really know how to interpret these so please let me know if I should file a new bug.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga) → needinfo?(dholbert)

(In reply to alex_mayorga from comment #45)

I don't really know how to interpret these so please let me know if I should file a new bug.

I don't really know how to interpret those either. :)

That looks like output from the bottom bar of the devtools "network" panel about how many requests there were & how expensive (in time & size) they were in aggregate, plus a few other basic page load metrics.

From the metrics you posted, DOMContentLoaded is probably the most relevant one to compare between browsers there; if that's reliably higher in Firefox vs. other browsers (as it was in your comment above, 252 ms vs 171 ms), then that'd potentially be worth filing a new bug on, but mostly if it "feels slower" in Firefox (if you can tell that it's taking longer).

If there's not a meaningful difference that you can detect, I suspect it's not worth worrying about at this point. (And as you noted, it's not exactly apples-to-apples comparison.)

Flags: needinfo?(dholbert)

But DOMContentLoaded itself isn't a useful performance metric. It just happens to tell when we stop parsing HTML - it doesn't tell anything about resource loading or such. And depending on some scheduling decisions, one may do other stuff before or after DOMContentLoaded.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: