MediaManager walks descendant docshells of a docshell that it got from a non-current inner, tearing down mediacapture for unrelated documents
Categories
(Core :: WebRTC: Audio/Video, defect, P3)
Tracking
()
People
(Reporter: pehrsons, Assigned: jib)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
STR:
- Build rev d202f2122d71 from try
./mach mochitest dom/media/test/test_imagecapture.html --verify
Expected result:
Test passes the verify suite
Actual result:
When the test reaches the gcTest
and calls SpecialPowers.gc()
the inner window where the mochitest is running gets released and the MediaStreamTrack
ends because of MediaManager::OnNavigation()
. That happens sometimes. Sometimes the test passes, but never the whole suite. Sometimes it is just left hanging. Issues seem to only happen on the first test run after launching the browser.
Surely when this happens the browser must have a reference to the inner window? I can see its content and all with my own eyes.
Or did some other window go away that the getUserMedia()
call this test did was handed by mistake? That could explain what happens when the track ends, but not when we're left hanging.
To help debug this I managed to catch the track-ending case with rr, and I have a pernosco recording
. See the Notebook for where the window goes away and the track ends. The track ending is what fails the test as seen in the Alerts panel.
This might be the cause of bug 1114828, but I can't say for sure if it's the only reason we could have timed out in the old version of this test. The linked version above is my modernized version of it where we still have issues.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 1•5 years ago
|
||
I'll add that I have seen similar things with other tests that do getUserMedia early on in the test. Either that gUM succeeds and it gets some tracks that unexpectedly end, or that the async gUM request is in flight and later rejected with an AbortError. Both would be possible if MediaManager::OnNavigation()
is called for the window the request was made with.
What makes this easier to reproduce for test_imagecapture.html is the fact that it's triggering GC explicitly, I suppose.
Reporter | ||
Comment 2•5 years ago
|
||
Also see test_imagecapture.html --verify with some more coverage on try. The failure mode that times out seems to later leak the world.
Reporter | ||
Comment 3•5 years ago
|
||
This seems to me like a real and unexpected issue. Could we get this triaged?
Comment 4•5 years ago
|
||
Andrew, can you look into this?
Comment 5•5 years ago
|
||
Sorry for the delay. I don't know what could be going wrong here. I think it is very unlikely that the window that is currently being displayed is being collected. Note that the OnNavigation doesn't actually require collection, just navigating away from the page. But I'm not sure how a GC would trigger that, and the test doesn't seem to do any navigation as far as I can see. I'll try retriggering the TV builds in your try run and see if I can come up with anything from the logs.
Reporter | ||
Comment 6•5 years ago
|
||
You should be able to get all the logs (and so much more) you need from pernosco.
Reporter | ||
Comment 7•5 years ago
|
||
Ah, the database for the pernosco recording has expired. I have requested to get it rebuilt.
The Pernosco session is back up at https://pernos.co/debug/_h2eccRj57KsBL0IFrJW2Q/index.html.
Reporter | ||
Comment 9•5 years ago
|
||
I had another look, and it's not the window of the gUM request that gets collected, it's the ancestor of the window of the gUM request. I'm updating the title to reflect this.
- We get a gUM request for window 36 (outer window 30).
- Window 11 gets cleaned up through FreeInnerObjects, and calls MediaManager::OnNavigation with window id 11.
- MediaManager traverses window 11's docshell for any child windows and finds outer window 30, then cleans the gUM request up for its inner window (36).
See the pernosco notebook for details.
Comment 10•5 years ago
|
||
Sorry for the delay here. It sounds like this is deliberate behavior on the part of the media code? I'm afraid I don't really have any idea what the right behavior for it. Boris, do you have any idea how this media manager code should deal with child windows?
Comment 11•5 years ago
|
||
OK, so the sequence of events here is:
- An
about:blank
document is placed into bfcache. I expect this is the initial about:blank that pre-exists the initial test harness toplevel load. - Eventually it is evicted from bfcache via GC calling
~nsSHEntryShared
(I'm a little surprised this causesnsSHEntryShared::RemoveFromBFCacheSync
to do something useful, but I haven't looked into how we maintain our bfcache state recently). - This calls
nsGlobalWindowInner::FreeInnerObjects
on theabout:blank
inner, which callsmNavigator->OnNavigation()
. This is almost certainly wrong semantically: the navigation happened a long time ago, when we were placed in bfcache. Of course we do callOnNavigation
then too, viansGlobalWindowOuter::SetNewDocument
calling it on the inner we are navigating away from. Navigator::OnNavigation
callsMediaManager::OnNavigation
, fine.MediaManager::OnNavigation
callsIterateWindowListeners
which starts poking ataWindow->GetDocShell()
, whereaWindow
is the dying inner. Importantly, the inner is NOT THE CURRENT INNER FOR THAT DOCSHELL. So now we are walking the child shells, which represent subframes in whatever document is in fact currently loaded in the docshell, not the document being evicted from bfcache, and tearing down their state. This seems pretty bogus.
A minimal fix is to bail out from MediaManager::OnNavigation
at the very least when it's called for a non-current inner. But IterateWindowListeners
is called in various places, and I have approximately 0 confidence that other callsites don't call it for non-current inners. For example, MediaManager::DeviceListChanged
calls it on all inners in the inner window table, which would certainly include non-current ones. That means, incidentally, that it multiple-walks all subframes (once via every ancestor, and once per subframe), which is likely also not great...
Looks like MediaManager::MediaCaptureWindowState
will always be calling IterateWindowListeners
with a current inner. On the other hand, StopScreensharing
looks like it just starts with some random windowid, and no guarantees that it's current.
Someone who actually knows the MediaManager code needs to decide what it's trying to do here and how to accomplish it, but the current setup is just broken.
Comment 12•5 years ago
|
||
I suspect just not walking descendants in IterateWindowListeners
if the window is not current would probably be enough to fix most of the issue here, though MediaManager::DeviceListChanged
would still be doing things O(N^2) in depth of the window tree...
Reporter | ||
Comment 13•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #11)
NOT THE CURRENT INNER FOR THAT DOCSHELL.
Wow. Thank you gentlemen, that explains things. The iteration code here is from bug 1062876, which is an old sec bug. Prior to that we were not handling iframes properly, so I suspect we still have to do some form of window iteration here. jib, you reviewed that one, wanna take this?
The DeviceListChanged behavior is from bug 1364038, and, well, is not great.
P3 given how old this behavior is.
Updated•5 years ago
|
Assignee | ||
Comment 14•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Comment 15•5 years ago
|
||
Pushed by jbruaroey@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6c86995d33de Avoid stepping on docShells of unrelated docs in MediaManager device stopping code. r=bzbarsky,pehrsons
Comment 16•5 years ago
|
||
bugherder |
Comment 17•5 years ago
|
||
Given the age of this bug, I assume we can live with it on older releases. Feel free to nominate for uplift if you feel strongly otherwise.
Updated•2 years ago
|
Description
•