Crash in nsPIDOMWindowInner::UpdateWebSocketCount

VERIFIED FIXED in Firefox 61

Status

()

defect
P2
critical
VERIFIED FIXED
Last year
4 months ago

People

(Reporter: mccr8, Assigned: bkelly)

Tracking

(Depends on 1 bug, Blocks 3 bugs, {crash, regression})

Trunk
mozilla61
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61blocking verified)

Details

(crash signature)

Attachments

(11 attachments, 30 obsolete attachments)

7.17 KB, patch
smaug
: review+
Details | Diff | Splinter Review
2.00 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.70 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.15 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.39 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.04 KB, patch
smaug
: review+
Details | Diff | Splinter Review
6.46 KB, patch
smaug
: review+
Details | Diff | Splinter Review
2.06 KB, patch
smaug
: review+
Details | Diff | Splinter Review
2.54 KB, patch
smaug
: review+
Details | Diff | Splinter Review
942 bytes, patch
baku
: review+
Details | Diff | Splinter Review
3.46 KB, patch
baku
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is
report bp-8cfa13ab-6352-4ee8-8680-a80ea0180405.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll nsPIDOMWindowInner::UpdateWebSocketCount dom/base/nsGlobalWindowInner.cpp:2641
1 xul.dll mozilla::dom::WebSocketImpl::Disconnect dom/websocket/WebSocket.cpp:621
2 xul.dll mozilla::dom::WebSocketImpl::DispatchConnectionCloseEvents dom/websocket/WebSocket.cpp:1907
3 xul.dll mozilla::dom::CallDispatchConnectionCloseEvents::Run dom/websocket/WebSocket.cpp:261
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1096
5 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:97
6 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run ipc/glue/MessagePump.cpp:301
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:299
9 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:157

=============================================================

This crash showed up in the 20180404100127 Nightly. Possibly a regression from bug 1450266. #7 Windows top crash for the April 4 Nightlies.
Ben, could this be a regression from your patch?
Flags: needinfo?(bkelly)
This seems related to the assertions I get with the websocket leak test in bug 1451411.  I think this count updating code has a bug in it.  Since this is low frequency I'd like to see if Andrea can fix bug 1451411 before looking at this.
Depends on: 1451411
Flags: needinfo?(bkelly)
OS: Windows 10 → All
Hardware: Unspecified → All
Version: unspecified → Trunk
I guess this used to be infrequent, but is now more frequent.  I'll see if I can fix the activity counter stuff.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
I wonder if this is caused by:

1. Site opens websocket
2. Site does document.open()
3. websocket closes

We now rebind DETH objects on document.open(), so we probably end up decrementing the count on the new window while we only ever incremented it on the old window.
Blocks: 1446714
No longer blocks: 1446714
No longer depends on: 1451411
Blocks: 1451411
Priority: -- → P2
Comment on attachment 8966354 [details] [diff] [review]
P1 Allow DOMEventTargetHelper classes to indicate when budget throttling is allowed or not. r=farre

Andreas, this patch adds a system by which DOMEventTargetHelper (DETH) objects attached to a global can advertise that they should allow or disallow budget throttling.  The intention is to replace the counters we try to track on the window with code that queries the set of DETH objects.  This will avoid the problem of state inconsistency in weird corner cases.

Note, this will make figuring out "can I budget throttle" more expensive.  It has to query the tree of nested iframes and each DETH object in each window.  Personally I don't think scheduling timers should be that hot a path for most workloads, so this is probably ok.  If it becomes a problem we can add in some caching, etc.
Attachment #8966354 - Flags: review?(afarre)
Comment on attachment 8966355 [details] [diff] [review]
P2 Make WebSocket use DOMEventTargetHelper::BudgetThrottlingAllowed(). r=farre

This patch makes WebSocket use the new DETH::BudgetThrottlingAllowed() mechanism.
Attachment #8966355 - Flags: review?(afarre)
Comment on attachment 8966356 [details] [diff] [review]
P3 Make IDB classes use DOMEventTargetHelper::BackgroundThrottlingAllowed(). r=asuth

Andrew, this patch converts IDB code over to use the new BudgetThrottlingAllowed() mechanism introduced in P1 of this bug.  The goal is to allow the global to query its event targets' current state to determine if throttling is allowed instead of trying to track counters in the window.  This will automatically handle various corner cases like document.open() gracefully.

Note, its not strictly necessary to convert IDB over in this bug, but I wanted to do it for a couple reasons:

1. IDB has a test for throttling behavior and WebSocket does not.  So doing this lets us exercise the P1 patch in automation.
2. It proves that the P1 patch is suitable to handle multiple features.

Note, I did not convert over HasActivePeerConnections() because that method is called in another circumstance beyond TimeoutManager.
Attachment #8966356 - Flags: review?(bugmail)
Comment on attachment 8966356 [details] [diff] [review]
P3 Make IDB classes use DOMEventTargetHelper::BackgroundThrottlingAllowed(). r=asuth

Review of attachment 8966356 [details] [diff] [review]:
-----------------------------------------------------------------

This is a fantastic cleanup, thank you!  I've filed bug 1452917 to do a little more cleanup/potential optimization in IDB.  Please note that there is one necessary action item, it's not all restatings.

Restating the pre-this-patch-world:
- We were concerned about throttled background tabs blocking foreground tabs because write transactions block other write transactions/overlapping-scope read transactions, plus open databases older versions block version upgrades.
- nsGlobalWindowInner tracked the number of active databases and active transactions.
  - Active databases also includes the open requests (IDBOpenDBRequest) because the request will hit IPC and cause upgrade problems regardless of the global knowing about the resulting IDBDatabase.
- nsGlobalWindowInner would only use the number of active databases to forbid budget throttling because it's a superset of coverage; you can't have an active transaction without an active database.  The transaction count was intended to support Quantum DOM Scheduling.  I don't think it was ever used, but it's certainly no longer used, and bug 1452917 covers the removal of the IDB bits if the Quantum Scheduling removal doesn't remove that entirely.

Restating post-patch:
- We now also will consider the transaction state for budget throttling where we wouldn't previously, but that's fine, that was just an optimization since there must be an active database for there to be an active transaction.

::: dom/indexedDB/IDBDatabase.cpp
@@ -1149,5 @@
>  
>  void
> -IDBDatabase::NoteInactiveTransactionDelayed()
> -{
> -  ExpireFileActors(/* aExpireAll */ false);

Unfortunately, I think this is a load-bearing call to ExpireFileActors(false).  While ExpireFileActors(false) simply scans mFileActors for moot weak-references and deletes the corresponding parent-actor[1], the only other time we call this in steady-state is under memory-pressure.  So unless you want to hook ExpireFileActors up to a post-GC hook or something like that, we need a call to this from the current IDBTransaction::MaybeNoteInactiveTransaction somehow.

The good news is the runnable delay isn't actually necessary, it looks like Bevis was concerned about pre-emption points, but that was a Quantum Scheduler thing.

The bad news is that ExpireFileActors is private, so maybe the simplest thing is to keep NoteInactiveTransaction around but have it only call the expire method.  You don't need to guard the call with any conditionals, though.

1: This is part of IDB's Blob/File de-duplication magic by remembering Blobs we've seen before.

::: dom/indexedDB/IDBDatabase.h
@@ +266,2 @@
>      mBackgroundActor = nullptr;
> +    CloseInternal();

Restating: This change was made after asking me on IRC about whether this was necessary and I concluded that we'd potentially crash at https://searchfox.org/mozilla-central/rev/b55e1a1cbcaee34878e133fbac20c4c2af6e11b5/dom/indexedDB/IDBDatabase.cpp#715 in IDBDatabase::Transaction if we didn't enter the closed state.  Also, that the crash would be very rare since we don't normally expect the parent actor to go away without the child requesting it.  It shouldn't happen in shutdown, but might be possible during privacy-related origin-clearings.
Attachment #8966356 - Flags: review?(bugmail) → review+
(In reply to Andrew Sutherland [:asuth] from comment #17)
> ::: dom/indexedDB/IDBDatabase.cpp
> @@ -1149,5 @@
> >  
> >  void
> > -IDBDatabase::NoteInactiveTransactionDelayed()
> > -{
> > -  ExpireFileActors(/* aExpireAll */ false);

Ack.  This was unintentional.  I'll add it back.
Comment on attachment 8966354 [details] [diff] [review]
P1 Allow DOMEventTargetHelper classes to indicate when budget throttling is allowed or not. r=farre

Actually, I'm not sure these changes are adequate.
Attachment #8966354 - Flags: review?(afarre)
Attachment #8966355 - Flags: review?(afarre)
Attachment #8966356 - Flags: review+
The root problem here is that there are various stateful things on the inner window tied to the DETH objects attached to it.  When document.open() re-binds the DETH objects to the new inner window we are losing this current state.  When the new window closes we then hit our assertions about cleaning up state nicely.

I think we need to make a DETH::OnBind() virtual method hook DETH classes can implement.  This will allow them to move any inner window state from the old window to the new window.
Attachment #8966354 - Attachment is obsolete: true
Attachment #8966355 - Attachment is obsolete: true
Attachment #8966356 - Attachment is obsolete: true
I still need to fix webrtc and IDB.  Unfortunately IDB will require a special snowflake fix.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d77ebcd41d5952bee4722e2a75e6699f9989cda
We probably need to update the window-to-worker map on document.open() as well.
Blocks: event-target-leaks
No longer blocks: 1451411
Duplicate of this bug: 1451411
Blocks: 1453471
Currently the #4 top content process crash.
No longer blocks: 1453471
Attachment #8967145 - Attachment is obsolete: true
I still need to fixup workers on document.open(), but lets see where this leaves us:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3c23c3226dea592b2efc045fc8d49a9ac2934665
It seems I broke test_bug372964-2.html again in this patch queue somewhere.
(In reply to Ben Kelly [:bkelly] from comment #38)
> It seems I broke test_bug372964-2.html again in this patch queue somewhere.

Ah, its in P0:

  currentInner->MigrateStateForDocumentOpen(newInnerWindow);

Should be:

  newInnerWindow->MigrateStateForDocumentOpen(currentInner);
*shakes fist at eslint*
Attachment #8967176 - Attachment is obsolete: true
Attachment #8967367 - Attachment is obsolete: true
I think I'm going to punt on the worker stuff for now.  I've verified workers don't crash on document.open().  Instead we end up calling CancelWorkersForWindow() from FreeInnerObjects() which stops the workers.  This seems ok for now.
See Also: → 1453721
Comment on attachment 8967455 [details] [diff] [review]
P0 Delay rebinding and freeing the old inner window until after set the new inner window. r=smaug

Olli, the root problem I'm trying to fix in this bug is that many APIs stash state on the inner window and loses access to it when we rebind to the new window for a document.open().

This patch adds a new method on the inner window that will be called for document.open() to migrate state from the old window to the new window.  This includes the re-binding of the DETH objects.

The patch also moves the state migration and FreeInnerObjects() call down in SetNewDocument().  Some of the state migration code requires the new window to be somewhat fully setup so it can access things like mTopWindow and mDoc, etc.

Note, my hope is that one day we can implement bug 1449992 and this entire mechanism can go away.  Working on this has convinced me that reusing the window on document.open() is much more sane than trying to keep all the different kind of binding objects working after we swap in a new inner window.
Attachment #8967455 - Flags: review?(bugs)
Comment on attachment 8967456 [details] [diff] [review]
P1 Add DOMEventTargetHelper::OnDynamicBind() method. r=smaug

This patch adds an OnDynamicBind() virtual method hook to the DETH base class.  Its called when we rebind the object to a new global.  This will be used in later patches to update state on the global.

Note, I wish I could have just always called this method when we bind the global, but I can't call a virtual method from a base class constructor.
Attachment #8967456 - Flags: review?(bugs)
Comment on attachment 8967457 [details] [diff] [review]
P2 Update websocket state on the inner window when rebinding. r=smaug

This patch uses the OnDynamicBind() hook to transfer websocket activity state from the old window to the new window.
Attachment #8967457 - Flags: review?(bugs)
Comment on attachment 8967458 [details] [diff] [review]
P3 Make AudioContext add itself to the new window when its rebound for a document.open(). r=smaug

This patch moves the AudioContext weak reference from the old window to the new window.
Attachment #8967458 - Flags: review?(bugs)
Comment on attachment 8967459 [details] [diff] [review]
P4 Don't manually create a new Performance object on document.open() since we are rebinding the old Performance to the new global. r=smaug

This patch removes the old document.open() code that created a new Performance object using the old object's state.  Now that we are rebinding, lets just keep the original object.
Attachment #8967459 - Flags: review?(bugs)
Comment on attachment 8967460 [details] [diff] [review]
P5 Migrate the IDBFactory to the new window on document.open(). r=smaug

This patch moves the IDB state from the old window to the new window.  Note, IDB objects do extend DETH, but I can't easily use OnDynamicRebind() here because of how IDB funnels everything through this non-DETH IDBFactory object.
Attachment #8967460 - Flags: review?(bugs)
Comment on attachment 8967461 [details] [diff] [review]
P6 Test that WebSocket does not leak through its event listeners. r=smaug

Add an event target leak test for WebSocket.  The document.open() code in this test triggered the assertion in comment 0.
Attachment #8967461 - Flags: review?(bugs)
Comment on attachment 8967462 [details] [diff] [review]
P7 Add an IDB event target leak test. r=smaug

This adds an IDB event target leak test.  This is to make sure we exercise the document.open() code with an active IDB transaction.
Attachment #8967462 - Flags: review?(bugs)
Comment on attachment 8967463 [details] [diff] [review]
P8 Add an AudioContext event target leak test. r=smaug

Add an AudioContext event target leak test.  This is for coverage of the document.open() code with an active AudioContext.
Attachment #8967463 - Flags: review?(bugs)
Comment on attachment 8967456 [details] [diff] [review]
P1 Add DOMEventTargetHelper::OnDynamicBind() method. r=smaug

Even after a night this feels awkward. I would just let DETH implementations to override BindToOwner(nsIGlobalObject* aOwner).
To my eyes OnDynamicBind look unneeded addition, when it would be enough to
have virtual BindToOwner, and that way the argument is also what one would expect, the new global, not old one.
Attachment #8967456 - Flags: review?(bugs) → review-
Comment on attachment 8967463 [details] [diff] [review]
P8 Add an AudioContext event target leak test. r=smaug

+// Its important here that we create a listener callback from
+// the DOM objects back to the frame's global in order to
+// exercise the leak condition.
It's 

rs+


It is unclear to me why we need to manually add 
+  ctx.onstatechange = e => {
+    contentWindow.stateChangeCount += 1;
+  };
Why the leak script couldn't just add event listener which has a reference to the window.
Attachment #8967463 - Flags: review?(bugs) → review+
Comment on attachment 8967462 [details] [diff] [review]
P7 Add an IDB event target leak test. r=smaug

It's

rs+
Attachment #8967462 - Flags: review?(bugs) → review+
Comment on attachment 8967457 [details] [diff] [review]
P2 Update websocket state on the inner window when rebinding. r=smaug

This would use BindToOwner, I hope, but otherwise ok.
Attachment #8967457 - Flags: review?(bugs) → review+
Attachment #8967458 - Flags: review?(bugs) → review+
Attachment #8967459 - Flags: review?(bugs) → review+
Attachment #8967461 - Flags: review?(bugs) → review+
Comment on attachment 8967460 [details] [diff] [review]
P5 Migrate the IDBFactory to the new window on document.open(). r=smaug

Ahaa,IDBFactory is not a DETH.
I wonder if we should move BindToOwner stuff out from DETH to some helper class.

but r+ anyhow

(This all is scary stuff)
Attachment #8967460 - Flags: review?(bugs) → review+
Attachment #8967455 - Flags: review?(bugs) → review+
Helps if you keep the ScopeExit alive till the end of the scope...
Attachment #8967800 - Attachment is obsolete: true
Comment on attachment 8967811 [details] [diff] [review]
P1 Allow subclasses to override DOMEventTargetHelper::BindToOwner(nsIGlobalObject*). r=smaug

This looks ok. r+ even if there wasn't '?'
Attachment #8967811 - Flags: review+
Comment on attachment 8967845 [details] [diff] [review]
P2 Update websocket state on the inner window when rebinding. r=smaug

ah, this is quite nice.
Attachment #8967845 - Flags: review+
Attachment #8967846 - Flags: review+
Thanks.  I was waiting for some try retriggers, but they turned out green.  Also a TV failure I probably need to fix.
I retrigged the TV a few times.  If its not perma-orange, then I will land this to get the critical crasher fixed.
Comment on attachment 8968077 [details] [diff] [review]
P9 Try to avoid event listener leak test intermittents by doing more GC. r=baku

Andrea, this patch does even more GC during the event target leak tests.  My hope is this will address some intermittent failures in the document.open() case for existing tests.  It also reduces the frequency of failures in the TV case for the IDB test in this bug.

If we see continued intermittent failure then I'll open another bug to research further.
Attachment #8968077 - Flags: review?(amarchesini)
I think maybe I should also immediately disconnect IDBFactory's mWindow ref back to the window when we call FreeInnerObjects().
Comment on attachment 8968085 [details] [diff] [review]
P10 Make the inner window explicitly disconnect the IDBFactory. r=asuth

This further reduces the reproduction of the orange in the --verify TV runs.  There is clearly some additional way we can delay GC or leak in this IDB test, but I don't want to delay landing the critical crash fixes for that.  I'd like to land these two bandaids for now and we can track down any intermittents later.
Attachment #8968085 - Flags: review?(amarchesini)
Attachment #8968077 - Flags: review?(amarchesini) → review+
Attachment #8968085 - Flags: review?(amarchesini) → review+
I have to fix the bug number in the last patch.  I'll land later this morning.
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3b3fac9a495
P0 Delay rebinding and freeing the old inner window until after set the new inner window. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/3baa317d82f5
P1 Allow subclasses to override DOMEventTargetHelper::BindToOwner(nsIGlobalObject*). r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ff0b30a815e
P2 Update websocket state on the inner window when rebinding. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/4952f8b12700
P3 Make AudioContext add itself to the new window when its rebound for a document.open(). r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/7f45cea49e22
P4 Don't manually create a new Performance object on document.open() since we are rebinding the old Performance to the new global. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/102a57aaff3a
P5 Migrate the IDBFactory to the new window on document.open(). r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/0eed49c93565
P6 Test that WebSocket does not leak through its event listeners. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/2f9240cfa840
P7 Add an IDB event target leak test. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/8dd0da361b21
P8 Add an AudioContext event target leak test. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/76afcfbd047b
P9 Try to avoid event listener leak test intermittents by doing more GC. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca5da952ff4c
P10 Make the inner window explicitly disconnect the IDBFactory. r=baku
Blocks: 1453544
Blocks: 1452988
Depends on: 1454533
Its still early, but so far no crash reproductions in build id 20180416220315.
No crashes since this landed!
Status: RESOLVED → VERIFIED
Depends on: 1457867
Depends on: 1509442
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.