Closed Bug 1237950 Opened 8 years ago Closed 4 years ago

CPOW deadlock in parent->child innerHTML setter

Categories

(Core :: DOM: Content Processes, defect, P3)

RESOLVED INCOMPLETE

People

(Reporter: freddy, Unassigned)

Attachments

(3 files)

More or less daily, when I visit websites, Aurora suddenly hangs indefinitely. The only way to recover is to "kill -9" it. This is on 64-bit Linux.


I'd be happy to find out why, but I *guess* this message in stdout is part of the problem:

> thread '<unnamed>' panicked at 'Box<Any>', /builds/slave/m-aurora-l64-ntly-000000000000/build/src/media/libstagefright/binding/capi.rs:103
Flags: needinfo?(giles)
The panic message shouldn't cause hangs. It's caught by the caller.

Can you attach with gdb and get a backtrace? Something like:

gdb --pid <pid of firefox>
bt
quit

That would give an idea of where it's stuck.
Flags: needinfo?(giles)
Assignee: nobody → giles
Priority: -- → P2
Attached file backtrace
Attaching a backtrace, but some symbols seem to be missing. Let me know what I can do to provide more information.
Attached file bt-with-preamble.txt
I couldn't reproduce it yesterday, but it has happened twice today. I've created another attachment that includes the gdb output from before I asked for the backtrace.
Component: Audio/Video → Audio/Video: Playback
> #2  0x00007f2cfb5ea23c in PR_WaitCondVar () from /home/freddy/opt/aurora/libnspr4.so
> #3  0x00007f2cecf671c3 in mozilla::CondVar::Wait(unsigned int) [clone .isra.10] () from /home/freddy/opt/aurora/libxul.so
> #4  0x00007f2ced4c399b in mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) () from /home/freddy/opt/aurora/libxul.so
> #5  0x00007f2ced4cb31d in mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) () from /home/freddy/opt/aurora/libxul.so
> #6  0x00007f2ced544605 in mozilla::jsipc::PJavaScriptParent::SendSet(unsigned long const&, mozilla::jsipc::JSIDVariant const&, mozilla::jsipc::JSVariant const&, mozilla::jsipc::JSVariant const&, mozilla::jsipc::ReturnStatus*) () from /home/freddy/opt/aurora/libxul.so

So it's trying to set a property on a parent->child CPOW, which is a sync IPC, and it's blocked waiting for the response.  If it happens again, the next thing to get is the stack from the main thread of the content process, to find out why it's apparently not answering.  Also, using gdb to `call DumpJSStack()` in the parent process should help find the JS code involved on that side.
Flags: needinfo?(fbraun)
I hope nobody minds, but I'm going to move this to IPC and unassign it because I'm pretty sure it's not media-related.  Where it belongs is probably in whichever component (or add-on) is using that CPOW, but we'd need the JS stack to find that.
Assignee: giles → nobody
Component: Audio/Video: Playback → IPC
Summary: thread '<unnamed>' panicked at 'Box<Any>', /builds/slave/m-aurora-l64-ntly-000000000000/build/src/media/libstagefright/binding/capi.rs:103 → CPOW deadlock, uncertain cause
Thanks for taking a look. I'll try getting the main thread stack and the JS stack as suggested.
But as I said, reproducing is not reliable.
Flags: needinfo?(fbraun)
OK, I failed to get the information. Typed 'call DumpJSStack' and nothing happened. Not sure how to proceed. Can you help me out? :)
Flags: needinfo?(jld)
Attached file js stack
Ha!
stdout/stderr were pointing to /dev/pts, so I taught myself some gdb hackery to change that (note to self: 'p close(1)' and 'p creat("/tmp/1.log", 0600)')

The JS stack is attached.
Flags: needinfo?(jld)
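The `close(1)` followed by `creat(...)` trick in the comment above presumably relies on the POSIX rule that a newly opened file receives the lowest-numbered descriptor not currently open, so the new log file takes over descriptor 1. Here is a minimal Python sketch of that rule, using an ordinary descriptor rather than stdout so it is safe to run. `reopen_lowest_fd` and its arguments are hypothetical names chosen for illustration.

```python
import os


def reopen_lowest_fd(first_path, second_path):
    """Open a file, close it, open another; return both descriptor numbers."""
    # O_WRONLY | O_CREAT | O_TRUNC is the flag combination creat() uses.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    first_fd = os.open(first_path, flags, 0o600)
    # Closing frees the number, much like 'p close(1)' frees descriptor 1.
    os.close(first_fd)
    # The next open should pick the lowest free number, i.e. the one just closed
    # (assuming nothing else opens or closes descriptors in between).
    second_fd = os.open(second_path, flags, 0o600)
    os.close(second_fd)
    return first_fd, second_fd
```

Called with two paths in a scratch directory, the two returned numbers should be equal, which is the same effect that makes the freshly created `/tmp/1.log` become the process's stdout.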
Seems like a NoScript hang. I'm thus CCing Giorgio.
(In reply to Frederik Braun [:freddyb] from comment #7)
> OK, I failed to get the information. Typed 'call DumpJSStack' and nothing
> happened. Not sure how to proceed. Can you help me out? :)

Sorry about that; I see you figured it out, but I should've mentioned that it goes to the process's stdout.

From the stack:
> 0 ns.showNextNoscriptElement(script = [object CPOW [object HTMLScriptElement]]) ["jar:file:///home/freddy/.mozilla/firefox/leni7gbj.default-1414410181625/extensions/%7B73a6fe31-595d-460b-a920-fcc0f8843232%7D.xpi!/components/noscriptService.js":4228]

Which looks like it'd be:
>           el.innerHTML = child.nodeValue;

Which is in the part of NoScript that takes <noscript> elements and injects their text into the page.  I'm assuming something on the child side, as part of the real innerHTML setter / HTML parser / etc., is making a request back to the parent that deadlocks.

I notice that one of the examples in https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Cross_Process_Object_Wrappers has chrome script setting innerHTML on a CPOW, so if the other half of the deadlock isn't also caused by NoScript itself, then either this ought to work or we need to fix that wiki page.
Component: IPC → DOM: Content Processes
Summary: CPOW deadlock, uncertain cause → CPOW deadlock in parent->child innerHTML setter
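The hypothesis in the comment above (the parent blocks in a sync send while the child, handling that message, makes its own request back to the parent) can be illustrated with a toy model. The following is a minimal Python sketch using threads and queues, not Firefox's IPC code. `sync_send`, `simulate`, the queue names, and the message strings are all hypothetical, and timeouts stand in for what would otherwise be an indefinite hang.

```python
import queue
import threading


def sync_send(outbox, replies, message, timeout):
    """Send a message, then block until a reply arrives or the timeout expires."""
    outbox.put(message)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return None  # without the timeout, the caller would wait forever


def simulate(child_calls_back, timeout=0.5):
    """Toy model: the parent makes one sync request to the child."""
    child_requests = queue.Queue()   # parent -> child requests
    parent_replies = queue.Queue()   # child -> parent replies
    parent_requests = queue.Queue()  # child -> parent requests (never serviced)
    child_replies = queue.Queue()    # parent -> child replies (never filled)

    def child():
        request = child_requests.get()
        if child_calls_back:
            # Assumed nested sync request back to the parent while handling
            # the original one; the parent is blocked and cannot answer it.
            answer = sync_send(parent_requests, child_replies, "need-parent", timeout)
            if answer is None:
                return  # gave up, so the original request is never answered
        parent_replies.put("reply-to-" + request)

    worker = threading.Thread(target=child)
    worker.start()
    # The parent waits only for its reply and ignores parent_requests meanwhile.
    reply = sync_send(child_requests, parent_replies, "set-innerHTML", timeout)
    worker.join()
    return reply
```

Under these assumptions, `simulate(child_calls_back=False)` returns the child's reply, while `simulate(child_calls_back=True)` returns `None` once both sides have timed out waiting for each other.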
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
We don't have XPCOM add-ons anymore, and recently there was a post to mozilla.dev.platform that CPOWs are effectively no longer used (only as opaque tokens to reference objects in the other process, but no property accesses or method calls are done on them, if I understand correctly).  So this is probably WONTFIX.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE