1389381 - 16 Second Pause in ExtensionUtils.jsm's truncate()

Reporter

Description

•

8 years ago

Several users on Reddit are reporting that Firefox 55 is completely unusable for them due to extremely long pauses. I asked them to attempt a recording using the Gecko Profiler. One user was unable to get the profiler installed and running, but another one was able to upload several profiles. The one this bug is about is here: https://perfht.ml/2vJH3JQ The user is experiencing a 16 second hang on the main thread under WindowDestroyedEvent::Run(). In ExtensionUtils.jsm, the function truncate() is called on something, which then proceeds to spend 16 whole seconds seeking through an OrderedHashTable, calling onRemove() handlers, and finally remove()ing each item. The user says, "All addons except session manager and uBlock disabled." This seems pretty severe, and multiple users are reporting that 55 is effectively unusable for them. The specific user's report on Reddit is here: https://www.reddit.com/r/firefox/comments/6siseo/loading_websites_in_firefox_very_slow/dlgcl0q/ The full thread is here: https://www.reddit.com/r/firefox/comments/6siseo/loading_websites_in_firefox_very_slow/ I'm really not sure how to diagnose this. Ehsan or bz, would you be able to point me in the right direction, or see what the likely culprit would be? I remember that Ehsan changed some stuff with hash tables recently, but I don't know if that could be related.

Flags: needinfo?(ehsan)

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 1

•

8 years ago

This seems like something Kris might know something about, at least based on blame. But fundamentally, we're starting off at http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionContent.jsm#555 where we're told an inner window died. This calls http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/MessageChannel.jsm#782 which adds things to this.abortedResponses. That's a LimitedSet (of max size 30), so we end up at http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionUtils.jsm#280 which calls http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionUtils.jsm#271 But... in this case that should do exactly one this.delete() call per add() call. That shouldn't be unreasonable. So maybe we have a ton of this.pendingResponses in abortResponses? Or we have a ton of DocumentManagers around? But at first glance DocumentManager is a singleton...

Flags: needinfo?(bzbarsky) → needinfo?(kmaglione+bmo)

André Bargull [:anba]

Comment 2

•

8 years ago

If LimitedSet#add() (resp. #truncate()) is called a few thousand times in consecutive order, and no GC happens in that time, the SetIteratorObjects aren't getting collected, which means their internal OrderedHashTable#Range isn't destroyed. In that case we spent a huge amount of time in http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/js/src/ds/OrderedHashTable.h#205-208 updating effectively stale ranges. Simple shell test case for that scenario: --- class LimitedSet extends Set { constructor(limit, iterable = undefined) { super(iterable); this.limit = limit; } truncate(limit) { for (let item of this) { if (this.size <= limit) { break; } this.delete(item); } } add(item) { if (!this.has(item) && this.size >= this.limit) { this.truncate(this.limit - 1); } super.add(item); } } var ls = new LimitedSet(30); var t = dateNow(); for (var i = 0; i < 50000; ++i) { ls.add(i); // GC inactive set iterators to destroy Range instances. if (i % 1000 === 0) gc(); // <-- Without this explicit GC, the test case needs much more time. } print(dateNow() - t); ---

André Bargull [:anba]

Comment 3

•

8 years ago

And if this is causing this slow-down, the simplest fix would be to change truncate() to: --- truncate(limit) { // Iterate over all entries to ensure the SetIteratorObject's Range gets destroyed. (bug 1389381) for (let item of this) { if (this.size > limit) { this.delete(item); } } } ---

Boris Zbarsky [:bzbarsky]

Comment 4

•

8 years ago

Hmm. Can we nursery-allocate set iterator objects? That seems like it would help too...

André Bargull [:anba]

Comment 5

•

8 years ago

(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #4) > Hmm. Can we nursery-allocate set iterator objects? That seems like it > would help too... bug 1368420. But for nursery allocation we'd need to have a different implementation for OrderedHashTable#Range, so we can background finalize the iterator objects. (At least that's my understanding.)

Ryan VanderMeulen [:RyanVM]

Comment 6

•

8 years ago

[Tracking Requested - why for this release]: Severe perf issue. Is comment 3 something we could ship in a dot release (and maybe 56 too) while we figure out a better long-term fix?

status-firefox55: --- → affected

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox-esr52: --- → unaffected

tracking-firefox55: --- → ?

tracking-firefox56: --- → ?

tracking-firefox57: --- → ?

Keywords: regression

Whiteboard: [qf]

(no longer active)

Comment 7

•

8 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #6) > Is comment 3 something we could ship in a dot release (and maybe 56 too) > while we figure out a better long-term fix? It seems like we need a way to reproduce the problem so that we know this fixes it for that to be a viable option...

Flags: needinfo?(ehsan)

(no longer active)

Comment 8

•

8 years ago

Based on code inspection, I think this should be a regression from bug 1260548 which first added |this.abortedResponses = new ExtensionUtils.LimitedSet(30);| <https://hg.mozilla.org/mozilla-central/rev/8e67a3e9a5bf#l4.17>

Blocks: 1260548

(no longer active)

Comment 9

•

8 years ago

FWIW I tried installing both the Session Manager and uBlock Origin extensions in a new profile on 55.0.1 and browsed around a bit to see if I can reproduce this issue and didn't succeed.

Kris Maglione [:kmag]

Assignee

Comment 10

•

8 years ago

There's a chance this is caused by bug 1386937, which caused us to add duplicate inner-window-destroyed observers. But that seems pretty unlikely, since only the first observer should actually find any responses to abort, which is why I haven't asked for uplift. In the worst case, we can probably just remove the code that adds entries to abortedResponses, since it's really only there to prevent unhelpful console spam. But I think I'd rather try the solution in comment 3 first, combined with only truncating when we're something like 25% over the limit. I'd still like to know exactly why this is happening, though, and find a way to reproduce. This seems like a pretty pathological worst case, and I can't even guess what an extension would have to do in order to wind up with that many responses to abort. Although, looking at this more closely... When we call abortResponses, the aborted responses are removed from the set from a promise handler, which happens asynchronously. Which means that we should wind up iterating over the already aborted responses again from each observer callback. But even then, we only truncate when the item isn't already in the set, which means we'd have to have more than 30 responses to abort for each iteration.

Flags: needinfo?(kmaglione+bmo)

Kris Maglione [:kmag]

Assignee

Comment 11

•

8 years ago

Also, neither Session Manager nor any of the public versions of μBlock are WebExtensions, so it would be nice to know exactly which extensions they're using.

Kris Maglione [:kmag]

Assignee

Comment 12

•

8 years ago

Ah, looking at the profile some more, it looks like the only WebExtension they're using is Zotero.

Kris Maglione [:kmag]

Assignee

Comment 13

•

8 years ago

So, just running the Zotero add-on for a short while, I get a huge number of console errors. If I had to guess, I'd say their messaging wrapper code (which is pretty hairy) doesn't handle those correctly, and they wind up with a bunch of bunch of messages that are waiting for response callbacks that will never come, and all get canceled when when the window running the content scripts finally closes. So, given that, I'm just going to fix the issues that causes us to handle this so poorly, and not worry too much about trying to reproduce it.

Kris Maglione [:kmag]

Assignee

Updated

•

8 years ago

Depends on: 1386937

Comment hidden (mozreview-request)

Andrew Swan [:aswan]

Comment 17

•

8 years ago

mozreview-review

Comment on attachment 8896468 [details] Bug 1389381: Part 1 - Synchronously remove rending responses after aborting them. https://reviewboard.mozilla.org/r/167714/#review173012

Attachment #8896468 - Flags: review?(aswan) → review+

Andrew Swan [:aswan]

Comment 18

•

8 years ago

mozreview-review

Comment on attachment 8896469 [details] Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. https://reviewboard.mozilla.org/r/167716/#review173014 Good intentions, but where does `this.slop` actually get set?

Attachment #8896469 - Flags: review?(aswan) → review-

Andrew Swan [:aswan]

Comment 19

•

8 years ago

mozreview-review

Comment on attachment 8896470 [details] Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet. https://reviewboard.mozilla.org/r/167718/#review173016

Attachment #8896470 - Flags: review?(aswan) → review+

Comment hidden (mozreview-request)

Andrew Swan [:aswan]

Comment 22

•

8 years ago

mozreview-review

Comment on attachment 8896469 [details] Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. https://reviewboard.mozilla.org/r/167716/#review173088

Attachment #8896469 - Flags: review?(aswan) → review+

Kris Maglione [:kmag]

Assignee

Comment 23

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4b63f0f44148519df73c725ab99746d4d686e12f Bug 1389381: Part 1 - Synchronously remove rending responses after aborting them. r=aswan https://hg.mozilla.org/integration/mozilla-inbound/rev/364ef71685abd9959f7816810e422a15ec92087f Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. r=aswan https://hg.mozilla.org/integration/mozilla-inbound/rev/96fb5933d07336b44f19b77a511981cba96e29f3 Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet. r=aswan

Kris Maglione [:kmag]

Assignee

Updated

•

8 years ago

Assignee: nobody → kmaglione+bmo

Kris Maglione [:kmag]

Assignee

Updated

•

8 years ago

Comment 24

•

8 years ago

Comment on attachment 8896468 [details] Bug 1389381: Part 1 - Synchronously remove rending responses after aborting them. Approval Request Comment [Feature/Bug causing the regression]: Bug 1260548 [User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: N/A [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a fairly trivial change to make sure that aborted response objects are removed from a map synchronously, as expected, rather than only after returning to the main event loop or microtask queue. There should be no change in behavior except in the existing corner cases which currently produce incorrect behavior. [String changes made/needed]: None.

Attachment #8896468 - Flags: approval-mozilla-release?

Attachment #8896468 - Flags: approval-mozilla-beta?

Kris Maglione [:kmag]

Assignee

Comment 25

•

8 years ago

Comment on attachment 8896469 [details] Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. Approval Request Comment [Feature/Bug causing the regression]: Bug 1260548 [User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: N/A [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a fairly simple change to reduce overhead by pruning entries from a set less often, rather than every time a new entry is added after it reaches its maximum capacity. [String changes made/needed]: None.

Attachment #8896469 - Flags: approval-mozilla-release?

Attachment #8896469 - Flags: approval-mozilla-beta?

Kris Maglione [:kmag]

Assignee

Comment 26

•

8 years ago

Comment on attachment 8896470 [details] Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet. Approval Request Comment [Feature/Bug causing the regression]: Bug 1260548 [User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: N/A [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a fairly simple change to reduce overhead by looping over an entire iterator, to make sure it's finalized immediately, rather than exiting the loop early. The logic is unchanged, and the extra loop overhead from a maximum of about 30 entries in a JIT-optimized function should not amount to much. [String changes made/needed]: None.

Attachment #8896470 - Flags: approval-mozilla-release?

Attachment #8896470 - Flags: approval-mozilla-beta?

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 27

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4b63f0f44148 https://hg.mozilla.org/mozilla-central/rev/364ef71685ab https://hg.mozilla.org/mozilla-central/rev/96fb5933d073

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox57: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Sylvestre Ledru [:Sylvestre]

Comment 28

•

8 years ago

Would it be possible to have a test to make sure we don't regress here? Thanks

Sylvestre Ledru [:Sylvestre]

Comment 29

•

8 years ago

Comment on attachment 8896468 [details] Bug 1389381: Part 1 - Synchronously remove rending responses after aborting them. Fix some severe perf issues, taking it as ride along of 55.0.2

Attachment #8896468 - Flags: approval-mozilla-release?

Attachment #8896468 - Flags: approval-mozilla-release+

Attachment #8896468 - Flags: approval-mozilla-beta?

Attachment #8896468 - Flags: approval-mozilla-beta+

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

Attachment #8896469 - Flags: approval-mozilla-release?

Attachment #8896469 - Flags: approval-mozilla-release+

Attachment #8896469 - Flags: approval-mozilla-beta?

Attachment #8896469 - Flags: approval-mozilla-beta+

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

Attachment #8896470 - Flags: approval-mozilla-release?

Attachment #8896470 - Flags: approval-mozilla-release+

Attachment #8896470 - Flags: approval-mozilla-beta?

Attachment #8896470 - Flags: approval-mozilla-beta+

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

tracking-firefox55: ? → +

tracking-firefox56: ? → +

tracking-firefox57: ? → +

Ryan VanderMeulen [:RyanVM]

Comment 30

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/3323f34f84bb https://hg.mozilla.org/releases/mozilla-beta/rev/ffabdaaebcea https://hg.mozilla.org/releases/mozilla-beta/rev/b301a9babee6

status-firefox56: affected → fixed

Flags: in-testsuite?

Ryan VanderMeulen [:RyanVM]

Comment 31

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-release/rev/cdf2b7a4a822 https://hg.mozilla.org/releases/mozilla-release/rev/1b2c15264aee https://hg.mozilla.org/releases/mozilla-release/rev/45ab6e362747

status-firefox55: affected → fixed

Ryan VanderMeulen [:RyanVM]

Comment 32

•

8 years ago

Sean, would it be possible to ask some of the affected Reddit users to try out the CI builds that contain this fix? Should be reasonably safe to do, but backing up their profile first would be a reasonable precaution :) OSX: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-macosx64/1502721201/firefox-55.0.2.en-US.mac.dmg Win32: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-win32/1502720964/firefox-55.0.2.en-US.win32.zip Win64: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-win64/1502720964/firefox-55.0.2.en-US.win64.zip

Flags: needinfo?(sstangl)

Emil Ghitta, QA [:emilghitta]

Comment 33

•

8 years ago

We tried to reproduce this issue on 55.0.2-build1 as well, but we didn't managed to due so.

Sylvestre Ledru [:Sylvestre]

Comment 34

•

8 years ago

Added to the release notes with "Fix performance regressions with WebExtension"

Sean Stangl [:sstangl]

Reporter

Comment 35

•

8 years ago

Ryan: I sent messages 2 days ago, but have received no responses.

Flags: needinfo?(sstangl)

Kris Maglione [:kmag]

Assignee

Comment 36

•

8 years ago

(In reply to Sean Stangl [:sstangl] from comment #35) > Ryan: I sent messages 2 days ago, but have received no responses. It's probably not worth worrying about, at this point. I'm pretty confident that this solves the regression, and if there are still issues as bad as this, I'm sure we'll hear about them soon enough.

Christopher Tino

Comment 37

•

8 years ago

(In reply to Kris Maglione [:kmag] from comment #36) > (In reply to Sean Stangl [:sstangl] from comment #35) > > Ryan: I sent messages 2 days ago, but have received no responses. > > It's probably not worth worrying about, at this point. I'm pretty confident > that this solves the regression, and if there are still issues as bad as > this, I'm sure we'll hear about them soon enough. Hi Kris, we're still seeing a subset of our users on v55.0.2 experiencing 10-15sec startup times. I'm able to partially reproduce on a Win 10 machine running FF 55.0.2 x64. Here's a profile running Ghostery at startup and the browser locks up for about 10sec: https://perfht.ml/2imUBFC In this profile I disabled and re-enabled Ghostery around the 80sec mark, and Firefox locked up for about 20 seconds. https://perfht.ml/2imq9LQ This had been a massive issue for us recently and I'm grateful that you all were able to patch it in the last release. Any further help you could provide would be much appreciated. Thanks, Chris

Kris Maglione [:kmag]

Assignee

Comment 38

•

8 years ago

(In reply to Christopher Tino from comment #37) > Hi Kris, we're still seeing a subset of our users on v55.0.2 experiencing > 10-15sec startup times. I'm able to partially reproduce on a Win 10 machine > running FF 55.0.2 x64. Here's a profile running Ghostery at startup and the > browser locks up for about 10sec: There's too much going on in those profiles to make much sense of, but the problems are clearly unrelated to this bug. In both cases, a lot of the overhead is from Sync and SDK extensions (particularly LastPass). Aside from that, most of the WebExtension overhead seems to be JSON/structured clone overhead in the storage API (some of which was probably fixed by bug 1356546 and bug 1370752), and other random messaging overhead.

(no longer active)

Comment 39

•

8 years ago

It may be worth filing a separate bug for comment 37...

Christopher Tino

Comment 40

•

8 years ago

Thanks guys. We're trying to reproduce the issue based on the info we have from users. I'll see if I can get a few to run the profiler, then I'll open a new bug if we find anything useful.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → ?

Whiteboard: [qf]

Bug 1389381: Part 1 - Synchronously remove rending responses after aborting them. 8 years ago Kris Maglione [:kmag] 59 bytes, text/x-review-board-request	aswan : review+ Sylvestre : approval-mozilla-beta+ Sylvestre : approval-mozilla-release+	Details
Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. 8 years ago Kris Maglione [:kmag] 59 bytes, text/x-review-board-request	aswan : review+ Sylvestre : approval-mozilla-beta+ Sylvestre : approval-mozilla-release+	Details
Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet. 8 years ago Kris Maglione [:kmag] 59 bytes, text/x-review-board-request	aswan : review+ Sylvestre : approval-mozilla-beta+ Sylvestre : approval-mozilla-release+	Details