Closed Bug 542023 Opened 14 years ago Closed 13 years ago

mochitest-ipcplugins crash/abort: Assertion (mDeferred.empty() || 1 == mDeferred.size()) failed in test_painting.html (Error calling method on NPObject) [@ mozilla::ipc::RPCChannel::DebugAbort]

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: benjamin, Unassigned)

Random failure on mochitest-ipcplugins:
197 ERROR TEST-UNEXPECTED-FAIL | /tests/modules/plugin/test/test_painting.html | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - Error calling method on NPObject! at http://localhost:8888/tests/modules/plugin/test/test_painting.html:105

###!!! [RPCChannel][Child][/builds/moz2_slave/mozilla-central-linux/build/ipc/glue/RPCChannel.cpp:276] Assertion (mDeferred.empty() || 1 == mDeferred.size()) failed.  expected mDeferred to have 0 or 1 items, but it has %lu (triggered by rpc)
  local RPC stack size: 2863316886
  remote RPC stack guess: 8
  deferred stack size: 2863316886
  out-of-turn RPC replies stack size: 2863316886
  Pending queue size: 2863317142, front to back:

<cjones> whew, definitely use-after-free then

Use-after-free of what, the RPCChannel?

 0  libxul.so!mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) [ipc_message.h:0235fc257969 : 97 + 0x0]
 1  libxul.so!mozilla::ipc::RPCChannel::OnMaybeDequeueOne() [RPCChannel.cpp:0235fc257969 : 275 + 0x2f]
 2  libxul.so!RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() [tuple.h:0235fc257969 : 383 + 0xd]
 3  libxul.so!MessageLoop::RunTask(Task*) [message_loop.cc:0235fc257969 : 326 + 0x7]

Yes. The debug spew is garbage (a queue with 2.8 billion entries isn't possible), and in your blog post you stated that the crash was a segfault in DebugAbort().

It appears that an IPC message came in just before the other side crashed (suggested by the OnMaybeDequeueOne() runnable on the stack), and the crash was detected and error handling invoked (leading to the deletion of the channel) before that runnable was dequeued.

This is actually a regression from bug 538586, because this code was originally designed so that error handling was always the last thing a *Channel did before being deleted. I'm not sure what the best fix is; it needs some thought. We can always fall back on refcounting *Channels, though.
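
To make the suspected race concrete, here is a minimal standalone sketch. It is not the real RPCChannel code: ToyLoop, ToyChannel and RunShutdownRaceDemo are hypothetical names invented for illustration, and it uses std::shared_ptr/std::weak_ptr rather than Mozilla's own refcounting. It also shows a weak-reference variant of the refcounting idea (the queued task skips its work if the channel is gone), which is an assumption about one possible approach, not the fix that was chosen for this bug.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a message loop: posted tasks run later, in FIFO order.
class ToyLoop {
public:
    void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void RunAll() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical stand-in for a channel. The counter lives outside the channel
// so the demo can read it after the channel has been destroyed.
class ToyChannel : public std::enable_shared_from_this<ToyChannel> {
public:
    explicit ToyChannel(int* dequeued) : dequeued_(dequeued) {}

    // An incoming message queues a "maybe dequeue one" task on the loop.
    // The task holds only a weak reference, so it cannot touch a dead channel.
    void OnMessageReceived(ToyLoop& loop, int msg) {
        pending_.push_back(msg);
        std::weak_ptr<ToyChannel> weak = shared_from_this();
        loop.Post([weak] {
            if (std::shared_ptr<ToyChannel> self = weak.lock()) {
                self->OnMaybeDequeueOne();
            }
            // Otherwise the channel was already torn down: do nothing.
        });
    }

    void OnMaybeDequeueOne() {
        if (!pending_.empty()) {
            pending_.pop_front();
            ++*dequeued_;
        }
    }

private:
    std::deque<int> pending_;
    int* dequeued_;
};

// Returns how many messages were dequeued. With destroy_before_run == true,
// the channel is deleted (as error handling would do) before the queued
// task gets to run, which is the ordering suspected in this bug.
int RunShutdownRaceDemo(bool destroy_before_run) {
    int dequeued = 0;
    ToyLoop loop;
    std::shared_ptr<ToyChannel> channel = std::make_shared<ToyChannel>(&dequeued);

    channel->OnMessageReceived(loop, 42);  // task is queued, not yet run
    if (destroy_before_run) {
        channel.reset();                   // simulated error handling deletes the channel
    }
    loop.RunAll();                         // the stale task runs now
    return dequeued;
}
```

Calling RunShutdownRaceDemo(true) returns 0 because the stale task finds the channel gone and skips it; RunShutdownRaceDemo(false) returns 1. If the task captured a raw pointer instead, the first case would read freed memory, which would match the garbage queue sizes in the debug spew above.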

I believe that the assertion and subsequent DebugAbort both happened in the child process. The parent process didn't crash and (AFAICT) worked perfectly.

Hmm. I still think it's the same symptom: (early) shutdown racing with an incoming message, but I need to look into the cause more.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264449841.1264451487.11736.gz
Linux mozilla-central opt test mochitest-other on 2010/01/25 12:04:01
s: moz2-linux-slave20
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard