Open
Bug 1401389
Opened 7 years ago
Updated 2 years ago
WebRTC Externals 1.0.0 can cause parent process to hang
Categories
(WebExtensions :: General, defect, P3)
Tracking
(Not tracked)
NEW
People
(Reporter: drno, Unassigned)
Details
Attachments
(3 files)
I have Firefox Nightly 57.0a1 (2017-09-19) (64-bit) on OSX with the WebRTC Externals web extension installed: https://github.com/fippo/webrtc-externals
With the extension installed, the following STR lock up the parent process in about 75% of attempts, and I can only recover by force-quitting Firefox:
- open a room on https://appear.in
- join the same room with Firefox with the extension installed
- wait for the call to connect
- open a new tab via command+t
- close the new tab via command+w
- open a new tab via command+t
- close the new tab via command+w
- click on the enlarging arrow in the upper right corner of the remote video
- open a new tab via command+t
- close the new tab via command+w
- leave the call by clicking on the red X at the bottom of the video rendering area
At one of these steps the browser stops responding. After some time the cursor turns into the beach ball of death. The call continues for a while, then disconnects. The only way to recover is to force-quit Firefox.
Note: it might take two or three attempts with the above steps to reproduce.
Comment 1 • 7 years ago (Reporter)
Comment 2 • 7 years ago (Reporter)
Comment 3 • 7 years ago (Reporter)
In this trace it looks to me like the content process is waiting forever for the answer to a blocking IPC request.
Comment 4 • 7 years ago
Can't reproduce on Windows after a few tries.
Comment 5 • 7 years ago
Can reproduce on OS X; it gets pretty bad.
Comment 6 • 7 years ago
Can reproduce on Linux too, in a somewhat simpler case: just make a call between two tabs, then switch between the tabs a couple of times. The extension calls getStats() every second and then does a window.postMessage plus a channel.postMessage. Possibly that goes wrong when the tab is in the background?
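A rough TypeScript sketch of that polling pattern may help when experimenting with the background-tab hypothesis. It is not the extension's code: `StatsSource`, `StatsMessage`, `Sink`, `StatsPoller`, and the `isHidden` callback are hypothetical names, the sinks only stand in for the `window.postMessage` and `channel.postMessage` calls, and the in-flight guard and hidden-tab check are assumed experiments rather than anything the extension is known to do.

```typescript
// Hypothetical stand-in for an RTCPeerConnection; only getStats() is modeled.
// A real RTCStatsReport is map-like, so a Map is used here.
interface StatsSource {
  getStats(): Promise<Map<string, unknown>>;
}

// The message this sketch forwards: the report entries as a plain array.
interface StatsMessage {
  type: "stats";
  stats: Array<[string, unknown]>;
}

// Hypothetical stand-ins for window.postMessage and channel.postMessage.
type Sink = (message: StatsMessage) => void;

class StatsPoller {
  private readonly source: StatsSource;
  private readonly sinks: Sink[];
  private readonly isHidden: () => boolean;
  private inFlight = false;
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  skippedTicks = 0;

  constructor(source: StatsSource, sinks: Sink[], isHidden: () => boolean = () => false) {
    this.source = source;
    this.sinks = sinks;
    this.isHidden = isHidden;
  }

  // One polling step; returns true if a getStats() call was started.
  // The in-flight guard and the hidden check are hypothetical experiments.
  tick(): boolean {
    if (this.inFlight || this.isHidden()) {
      this.skippedTicks++;
      return false;
    }
    this.inFlight = true;
    const done = (): void => {
      this.inFlight = false;
    };
    this.source
      .getStats()
      .then((report) => {
        // Copy the map-like report into an array, then post it to every sink.
        const message: StatsMessage = { type: "stats", stats: Array.from(report.entries()) };
        for (const post of this.sinks) post(message);
      })
      // Clear the guard whether getStats() or a sink succeeded or failed.
      .then(done, done);
    return true;
  }

  // Poll once per interval (the extension is described as polling every second).
  start(intervalMs = 1000): void {
    this.stop();
    this.timer = setInterval(() => this.tick(), intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

In a content script one might pass `() => document.hidden` as `isHidden` and call `start(1000)`; if the hang went away with the hidden check enabled, that would point at background-tab polling, though nothing here confirms it.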
Comment 7 • 7 years ago
Andy, should this bug be under WebExtensions, or do you think there's something that can be done to the add-on to fix this?
Flags: needinfo?(amckay)
Priority: -- → P3
Comment 8 • 7 years ago
Sure, though at this point we don't really know where the problem is.
Component: Extension Compatibility → WebExtensions: Untriaged
Flags: needinfo?(amckay)
Product: Firefox → Toolkit
Comment 9 • 7 years ago
I can repro on Ubuntu 17.04 with recent Nightly by just entering an appear.in room with one remote participant while having the extension enabled. A couple seconds into the call, all UI interaction freezes but audio/video runs fine.
Attaching gdb shows the parent process main thread in weird places:
> (gdb) bt
> #0 0x000029d6648ef5b0 in ()
> #1 0xfff8800000000000 in ()
> #2 0x7ff0000000000000 in ()
> #3 0x4018000000000000 in ()
> #4 0xfff8800000000018 in ()
> #5 0x7ff0000000000000 in ()
> #6 0x0000000000000000 in ()
Resuming the thread and breaking again appears to change only the last few digits of frame #0. One core is pegged at 100%, presumably running the main thread.
The activity trace from drno looks saner, though. It includes this bit, which looks bad (a synchronous dispatch on the main thread):
> 2224 nsFrameMessageManager::SendMessage(nsTSubstring<char16_t> const&, JS::Handle<JS::Value>, JS::Handle<JS::Value>, nsIPrincipal*, JSContext*, unsigned char, JS::MutableHandle<JS::Value>, bool) (in XUL) + 684 [0x1157dc99c]
> 2224 mozilla::dom::TabChild::DoSendBlockingMessage(JSContext*, nsTSubstring<char16_t> const&, mozilla::dom::ipc::StructuredCloneData&, JS::Handle<JSObject*>, nsIPrincipal*, nsTArray<mozilla::dom::ipc::StructuredCloneData>*, bool) (in XUL) + 323 [0x1167cfcf3]
> 2224 mozilla::dom::PBrowserChild::SendSyncMessage(nsTString<char16_t> const&, mozilla::dom::ClonedMessageData const&, nsTArray<mozilla::jsipc::CpowEntry> const&, IPC::Principal const&, nsTArray<mozilla::dom::ipc::StructuredCloneData>*) (in XUL) + 595 [0x115144503]
> 2224 mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) (in XUL) + 2131 [0x114f25343]
> 2224 mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) (in XUL) + 154 [0x114f25a8a]
> 2224 mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) (in libmozglue.dylib) + 28 [0x10e3834dc]
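The bottom of that stack reads as the content process's main thread parked in an untimed condition-variable wait until the other side answers the synchronous message. The TypeScript sketch below is a hypothetical model of that handshake, not Gecko code: it uses a `SharedArrayBuffer` slot with `Atomics.wait`/`Atomics.notify`, and `makeReplySlot`, `deliverReply`, and `waitForReply` are made-up names. With an infinite timeout it behaves like the trace and never returns if no reply arrives; with a finite timeout it reports `"timed-out"` instead. It assumes a runtime such as Node.js, where `Atomics.wait` is allowed on the main thread (browsers reject it there).

```typescript
// Hypothetical model of a synchronous request/reply handshake, not Gecko code.
// Slot 0 holds the reply state: 0 = still waiting, 1 = reply delivered.
const NO_REPLY = 0;
const REPLIED = 1;

function makeReplySlot(): Int32Array {
  return new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
}

// Called by the replying side: publish the reply flag and wake any waiter.
function deliverReply(slot: Int32Array): void {
  Atomics.store(slot, 0, REPLIED);
  Atomics.notify(slot, 0);
}

// Called by the sending side. With timeoutMs = Infinity this mirrors an
// untimed condition-variable wait: if no reply is ever delivered, it never returns.
function waitForReply(slot: Int32Array, timeoutMs: number): "replied" | "timed-out" {
  while (Atomics.load(slot, 0) === NO_REPLY) {
    if (Atomics.wait(slot, 0, NO_REPLY, timeoutMs) === "timed-out") {
      return "timed-out";
    }
  }
  return "replied";
}
```

For example, `waitForReply(makeReplySlot(), 20)` returns `"timed-out"` after roughly 20 ms, since nothing calls `deliverReply`.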
Any chance we can bump the priority on this? Or could you help me figure out what to look for to debug it further?
Flags: needinfo?(amckay)
Comment 10 • 7 years ago
Sorry, at this point we do not have the people to dig more into this one.
Flags: needinfo?(amckay)
Comment 11 • 7 years ago
Slightly above the fragment pasted in comment 9 is a stack frame for:
```
nsGlobalWindow::DispatchDOMWindowCreated() (in XUL) + 101 [0x1157ed005]
```
The WebExtensions framework does handle that event, but I believe only in extension content processes (and it sounds like this is a web content process that is pegged). In any case, it doesn't look to me like the handler for that event does anything that can trigger synchronous IPC.
It's possible that grabbing a profile during one of these events (see https://perf-html.io/) would give us more information, but I think the best bet is to get this in front of somebody more skilled at analyzing and triaging these sorts of problems, perhaps by changing the component to something like Firefox:Untriaged.
Comment 12 • 7 years ago
Since I just had to debug a coworker's machine: should I pull the extension from the store until this is fixed?
Comment 13 • 7 years ago (Reporter)
(In reply to Philipp Hancke [:fippo] from comment #12)
> since I just had to debug a coworkers machine: should we (me) pull the
> extension from the store until this is fixed?
That is probably a good idea, as it currently makes Firefox appear unstable, and users of the extension have little chance of figuring out that the extension is the cause.
Comment 14 • 7 years ago
Deactivated. The source is available from https://github.com/fippo/webrtc-externals if anyone wants to give it a try.
Comment 15 • 7 years ago
I've spent some more time debugging this. It doesn't happen when I deactivate the stats graphs. Those cause issues in Edge as well, so I suspect the issue is *somewhere* in the graph library (which, to be fair, was written as part of Chrome's internals).
Will remove the graphs in Firefox and republish. Shall we resolve as "works for me"?
Comment 16 • 7 years ago
Personally, I'd still like to understand the underlying problem here: whether the graph lib is doing something crazy or we are reacting to it in crazy ways. Perhaps this knowledge would also let us shrink the testcase and STR a bit.
Updated • 6 years ago
Product: Toolkit → WebExtensions
Comment 17 • 6 years ago
Bulk move of bugs per https://bugzilla.mozilla.org/show_bug.cgi?id=1483958
status-firefox57:
affected → ---
Component: Untriaged → General
Updated • 2 years ago
Severity: normal → S3