Closed Bug 1293127 Opened 3 years ago Closed 11 months ago

Twitter bogs down entire Firefox instance and performs poorly

Categories

(Core :: JavaScript: GC, defect, P3, major)

50 Branch
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
platform-rel --- -
firefox49 --- wontfix
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: kael, Unassigned, NeedInfo)

References

(Depends on 1 open bug, )

Details

(Keywords: triage-deferred, Whiteboard: [platform-rel-Twitter])

Attachments

(6 files)

While e10s improves this scenario significantly, having Firefox open in a tab bogs down the entire browser to a degree that is noticeable in any page - hover styles are often delayed for a noticeable period of time, scrolling lags, etc. Profiling Twitter shows that the browser is doing CC/GC almost nonstop and this results in painting falling well behind compositor scrolling. Twitter seems to be the main problem here but it's unfortunate that it seems to degrade the entire browser.

Mobile Twitter (mobile.twitter.com) is a good comparison because generally speaking it has none of these problems. I don't think this is necessarily a Firefox regression (Twitter seems to make their frontend JS slower all the time) but it's possible something about FF GC or otherwise got worse at dealing with Twitter in the last couple weeks. I think it has gotten worse.

Attaching profile and memory report. Note that some of the gaps in the profile are from switching away to another tab, then back to Twitter. Most of the gaps are just the site choking.
Anonymized memory report of my session. You can see I have a lot of stuff open, but under normal circumstances nothing bogs down unless I load desktop twitter.

I can provide a non-anonymized report if it helps.
Re: my recently filed bug 1294700, opening mobile twitter in the background is sufficient to make a simple HTML test page begin triggering minor GCs every second or two at idle when it previously never triggered any. The same is true for regular twitter, though I can't easily tell whether it is better or worse in comparison. The result of this in performance tools appears to be that individual pages appear to be GCing constantly, when the GCs/CCs are actually being induced by background tabs or extensions or something else.
:jonco, any chance you or someone you work with could investigate what's going on here?
Component: General → JavaScript: GC
Flags: needinfo?(jcoppeard)
Product: Firefox → Core
I'm looking into this, and although I'm not sure I'm seeing the exact problem described I'm observing:
 - frequent minor GCs due to full generic store buffer (from CCWs to nursery objects)
 - frequent GCs due to DOM workers
(In reply to Jon Coppeard (:jonco) from comment #4)
> I'm looking into this, and although I'm not sure I'm seeing the exact
> problem described I'm observing:
>  - frequent minor GCs due to full generic store buffer (from CCWs to nursery
> objects)
>  - frequent GCs due to DOM workers
Those both seem like they could cover the problem. Is there anything i can get you to make it easier to narrow down?
This will probably be made unnecessary by future changes, but in the meantime here's a patch to add a list of CCW keys containing nursery pointers to the JSCompartment and to trace those during minor GC and update the map.

This removes this use of the generic buffer.  It seems the buffer is quite small and currently can only handle 73 WrapperMapRef entries.

I did some manual testing of the following steps:
 - start browser
 - load twitter.com
 - scoll to the end
 - wait 10 seconds

Without patch: 297 minor GCs, 254 due to FULL_STORE_BUFFER
With patch: 67 minor GCs, 20 due to FULL_STORE_BUFFER
Flags: needinfo?(jcoppeard)
Attachment #8782890 - Flags: review?(terrence)
Comment on attachment 8782890 [details] [diff] [review]
bug1293127-ccw-nursery-keys

Review of attachment 8782890 [details] [diff] [review]:
-----------------------------------------------------------------

That's going to be a pain to rebase over.
Attachment #8782890 - Flags: review?(terrence) → review+
Keywords: leave-open
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/520e4b9d3ed0
Mark CCW keys that have nursery pointers explicitly rather than using the generic buffer r=terrence
(In reply to K. Gadd (:kael) from comment #5)
Hi, the changes should have made it into nightly by now.  Can you let me know if they made any difference?
Flags: needinfo?(kg)
It feels a bit better, but the browser still turns into a laggy, hitchy mess once I leave Twitter open in a pinned tab long enough. This also induces the issue where textareas repaint at around 1fps or lower.

Here's a profile of Twitter once things get bad. It's fine at first.
Flags: needinfo?(kg)
(In reply to K. Gadd (:kael) from comment #12)
Thanks for the feedback.  I'm trying to reproduce this (I replicated your addons setup as closely as I could) but haven't seen this slowdown yet.

From a GC perspective I did see some questionable behaviour though.  We're getting quite a few long minor GCs.  Sometimes marking the runtime takes over a second, and sometimes the value store buffer has over a thousand entries causing it to take a long time too.

How long does it take for the slowdown to happen?  How many other tabs do you have open?  I guess this is your main browsing session?.
Bug 1277376 may be related.
(In reply to Jon Coppeard (:jonco) from comment #13)
> (In reply to K. Gadd (:kael) from comment #12)
> Thanks for the feedback.  I'm trying to reproduce this (I replicated your
> addons setup as closely as I could) but haven't seen this slowdown yet.
> 
> From a GC perspective I did see some questionable behaviour though.  We're
> getting quite a few long minor GCs.  Sometimes marking the runtime takes
> over a second, and sometimes the value store buffer has over a thousand
> entries causing it to take a long time too.
> 
> How long does it take for the slowdown to happen?  How many other tabs do
> you have open?  I guess this is your main browsing session?.

I've reduced my session to 5 pinned tabs and a few regular ones:
gmail, google calendar, jisho, google translate, about:sync-tabs... and then whatever random tabs I have open. I pruned my whole sessionstore a couple weeks ago while investigating this so I've only had a couple tiny html-only pages open in the background while testing this.

Leaving twitter open for about 30 minutes in the background is usually sufficient for everything to become horrible. Closing the tab doesn't make it recover (at least not immediately).
(In reply to K. Gadd (:kael) from comment #15)
Great, I will try this.

Could you also capture the output of about:memory when this happens and post it here?
I tried some more but haven't been able to reproduce this slowdown.

I did observe some GCs with very many slices caused by use getting stuck in sweeping for ages and (presumably) not being able to make progress due to spending the entire slice doing the pre-sweep marking.  We could make it always do some work in sweeping even if it exceeds its slice budget to ensure that there is always progress.
Approval Request Comment
[Feature/regressing bug #]: Bug 1267812 was the most recent change to affect the generic store buffer.
[User impact if declined]: I'd like to uplift this because it should be a noticeable performance improvement for our users (see results in comment 6).
[Describe test coverage new/current, TreeHerder]: On m-c since August 20th.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Attachment #8784367 - Flags: review+
Attachment #8784367 - Flags: approval-mozilla-aurora?
Attached patch bug1293127-betaSplinter Review
Approval Request Comment
[Feature/regressing bug #]: Bug 1267812 was the most recent change to affect the generic store buffer.
[User impact if declined]: I'd like to uplift this because it should be a noticeable performance improvement for our users (see results in comment 6).
[Describe test coverage new/current, TreeHerder]: On m-c since August 20th.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Assignee: nobody → jcoppeard
Attachment #8784369 - Flags: review+
Attachment #8784369 - Flags: approval-mozilla-beta?
Depends on: 1297913
Jon, I can see your reasoning here but if we can't replicate the problem and haven't verified this makes it better, I am not sure we should uplift to late beta. Can you assure me in some way this is a good idea? Maybe a 2nd review with something of an explanation of possible risk?
Flags: needinfo?(jcoppeard)
Fixed in Nightly51 based on comment 10.
Hello K, could you please validate this issue is fixed as expected on the latest Nightly build? Thanks!
Flags: needinfo?(kg)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #20)
This does fix a valid problem, although it's probably not the cause of this bug.  I should really have opened a separate bug for it I guess.  But I understand if it's maybe not appropriate to take this for beta at this stage.
Flags: needinfo?(jcoppeard)
(In reply to Ritu Kothari (:ritu) from comment #22)
> Hello K, could you please validate this issue is fixed as expected on the
> latest Nightly build? Thanks!

I tested it in comment #12 and while the commit in this bug is an improvement it doesn't fix the problem I reported; the symptoms still persist (just to a reduced degree).
Flags: needinfo?(kg)
Can you capture the output of about:memory when the slowdown happens and post it here?
Flags: needinfo?(kg)
Depends on: 1298346
Comment on attachment 8784369 [details] [diff] [review]
bug1293127-beta

Too late in beta for kind of unclear gains. We may still want to take this for aurora 50.
Attachment #8784369 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 8784367 [details] [diff] [review]
bug1293127-aurora

50 is still in Aurora cycle, this patch seems to have improved things a tiny bit and it seems worthwhile to uplift to Aurora.
Attachment #8784367 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 1277819
platform-rel: --- → ?
Whiteboard: [platform-rel-Twitter]
platform-rel: ? → -
Assignee: jcoppeard → nobody
Keywords: triage-deferred
Priority: -- → P3
The leave-open keyword is there and there is no activity for 6 months.
:jonco, maybe it's time to close this bug?
Flags: needinfo?(jcoppeard)
I had a look in the profiler and I'm not seeing continuous GCs when browsing Twitter.  I'm going to close this as incomplete.  kael, please reopen if this is still a problem.
Status: NEW → RESOLVED
Closed: 11 months ago
Flags: needinfo?(jcoppeard)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.