Closed
Bug 1049801
Opened 10 years ago
Closed 10 years ago
Intermittent test_bug949946.html,test_bug1002702.html | application crashed [@ mozilla::ipc::MessageChannel::DispatchOnChannelConnected(int)]
Categories
(Core :: DOM: Workers, defect)
Tracking
()
RESOLVED
FIXED
mozilla34
Tracking | Status | |
---|---|---|
firefox32 | --- | unaffected |
firefox33 | --- | fixed |
firefox34 | --- | fixed |
firefox-esr24 | --- | unaffected |
firefox-esr31 | --- | unaffected |
b2g-v1.4 | --- | unaffected |
b2g-v2.0 | --- | unaffected |
b2g-v2.1 | --- | fixed |
People
(Reporter: RyanVM, Assigned: bkelly)
References
Details
(Keywords: crash, intermittent-failure)
Attachments
(2 files, 9 obsolete files)
1.21 KB,
patch
|
khuey
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
6.20 KB,
patch
|
bkelly
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=45346430&tree=Mozilla-Central b2g_emulator_vm mozilla-central opt test mochitest-8 on 2014-08-06 10:17:55 PDT for push f41a267983c1 slave: tst-linux64-spot-1111 10:25:11 INFO - 203 INFO TEST-START | /tests/dom/workers/test/test_bug1002702.html 10:25:12 INFO - 204 INFO TEST-OK | /tests/dom/workers/test/test_bug1002702.html | took 1695ms 10:25:14 INFO - 205 INFO [Parent 692] WARNING: pipe error (68): Connection reset by peer: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 452 10:25:14 INFO - 206 INFO ############ ErrorPage.js 10:25:14 INFO - 207 INFO System JS : ERROR chrome://specialpowers/content/SpecialPowersObserver.js:96 - NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIMessageSender.sendAsyncMessage] 10:25:14 INFO - 208 INFO ############################### browserElementPanning.js loaded 10:25:14 INFO - 209 INFO ######################## BrowserElementChildPreload.js loaded 10:30:48 INFO - mozcrash INFO | Downloading symbols from: http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-emulator/20140806090025/b2g-34.0a1.en-US.android-arm.crashreporter-symbols.zip 10:31:09 INFO - DeviceRunner TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_bug1002702.html | application timed out after 330.0 seconds with no output 10:31:09 WARNING - PROCESS-CRASH | /tests/dom/workers/test/test_bug1002702.html | application crashed [@ mozilla::ipc::MessageChannel::DispatchOnChannelConnected(int)] 10:31:09 INFO - Crash dump filename: /tmp/tmp70P6mW/5e89afa6-cacf-6237-245e6783-65ddee11.dmp 10:31:09 INFO - Operating system: Android 10:31:09 INFO - 0.0.0 Linux 2.6.29-g41a03df #22 Thu Jun 26 10:59:09 CST 2014 armv7l Android/full/generic:4.0.4.0.4.0.4/OPENMASTER/eng.cltbld.20140806.122837:eng/test-keys 10:31:09 INFO - CPU: arm 10:31:09 INFO - 0 CPUs 10:31:09 INFO - Crash reason: SIGSEGV 10:31:09 INFO - Crash address: 0x5a5a5a5e 10:31:09 INFO - Thread 24 (crashed) 10:31:09 INFO - 0 libxul.so!mozilla::ipc::MessageChannel::DispatchOnChannelConnected(int) [WeakPtr.h : 93 + 0x2] 10:31:09 INFO - r4 = 0x4372f1a0 r5 = 0x43187be0 r6 = 0x460bfb60 r7 = 0x4372f1ac 10:31:09 INFO - r8 = 0x460bfb58 r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb28 lr = 0x40bfe5d1 pc = 0x40bfe3e4 10:31:09 INFO - Found by: given as instruction pointer in context 10:31:09 INFO - 1 libxul.so!RunnableMethod<IPC::Channel, bool (IPC::Channel::*)(IPC::Message*), Tuple1<IPC::Message*> >::Run() [tuple.h:f41a267983c1 : 393 + 0x7] 10:31:09 INFO - r4 = 0x4372f1a0 r5 = 0x43187be0 r6 = 0x460bfb60 r7 = 0x4372f1ac 10:31:09 INFO - r8 = 0x460bfb58 r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb30 pc = 0x40bfe5d1 10:31:09 INFO - Found by: call frame info 10:31:09 INFO - 2 libxul.so!MessageLoop::RunTask(Task*) [message_loop.cc:f41a267983c1 : 357 + 0x5] 10:31:09 INFO - r4 = 0x4372f1a0 r5 = 0x43187be0 r6 = 0x460bfb60 r7 = 0x4372f1ac 10:31:09 INFO - r8 = 0x460bfb58 r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb38 pc = 0x40bf3941 10:31:09 INFO - Found by: call frame info 10:31:09 INFO - 3 libxul.so!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) [message_loop.cc:f41a267983c1 : 365 + 0x5] 10:31:09 INFO - r4 = 0x00000001 r5 = 0x460bfb50 r6 = 0x460bfb60 r7 = 0x4372f1ac 10:31:09 INFO - r8 = 0x460bfb58 r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb48 pc = 0x40bf759f 10:31:09 INFO - Found by: call frame info 10:31:09 INFO - 4 libxul.so!MessageLoop::DoWork() [message_loop.cc:f41a267983c1 : 443 + 0x7] 10:31:09 INFO - r4 = 0x4372f1a0 r5 = 0x460bfb50 r6 = 0x460bfb60 r7 = 0x4372f1ac 10:31:09 INFO - r8 = 0x460bfb58 r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb50 pc = 0x40bf815d 10:31:09 INFO - Found by: call frame info 10:31:09 INFO - 5 libxul.so!mozilla::ipc::DoWorkRunnable::Run() [MessagePump.cpp:f41a267983c1 : 233 + 0x7] 10:31:09 INFO - r4 = 0x4372f1a0 r5 = 0x00000001 r6 = 0x43151740 r7 = 0xffffffff 10:31:09 INFO - r8 = 0x460bfbdf r9 = 0x00000000 r10 = 0x00000000 fp = 0x00000001 10:31:09 INFO - sp = 0x460bfb80 pc = 0x40c00623 10:31:09 INFO - Found by: call frame info
Assignee | ||
Comment 1•10 years ago
|
||
This appears to be MessageChannel::DispatchOnChannelConnected() firing asynchronously after the MessageChannel has been destructed.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•10 years ago
|
||
Here's a stab at allowing the DispatchOnChannelConnected runnable task to be canceled when the MessageChannel is destroyed. https://tbpl.mozilla.org/?tree=Try&rev=d8672b248575
Assignee | ||
Comment 3•10 years ago
|
||
*sad trombone* Let's try again, this time with better nsRefPtr usage. https://tbpl.mozilla.org/?tree=Try&rev=7eef6d41e6eb
Attachment #8469701 -
Attachment is obsolete: true
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 5•10 years ago
|
||
This is a slightly better version of the patch that should avoid DispatchOnChannelConnected() racing with Clear() at the cost of a small amount of memory. https://tbpl.mozilla.org/?tree=Try&rev=b45449e71216 I think this patch should be pretty solid at this point, so flagging for review.
Attachment #8469782 -
Attachment is obsolete: true
Attachment #8470083 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 6•10 years ago
|
||
It seems this patch could also fix bug 1040288 which might be the same problem, but with a different crash point.
Comment on attachment 8470083 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v2) Review of attachment 8470083 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.cpp @@ +1424,5 @@ > MessageChannel::OnChannelConnected(int32_t peer_id) > { > + // OnChannelConnected() should only ever be called once. > + MOZ_ASSERT(!mOnChannelConnectedTask); > + mOnChannelConnectedTask = new RefCountedTask(NewRunnableMethod( I don't think this is safe, you're setting something on this thread (should be the IO thread right? We should have an assertion here...) and then reading/modifying it on the other thread in Clear().
Attachment #8470083 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 8•10 years ago
|
||
Here's a new version that uses locking to safely store and then cancel the mOnChannelConnectedTask. Since we only get connected once per channel, I think the overhead should be fine. Do follow-up on our IRC conversation, I believe we are guaranteed that MessageChannel exists when MessageChannel::OnChannelConnected() is called. This method is called from ProcessLink::OnChannelConnected(), but MessageChannel deletes the link prior to destructing itself. So the channel must exist at this point. I also added a check in ProcessChannel::OnChannelConnected() to avoid forcing the channel back to the connected state if its already errored out of the opening state. This just seemed wrong to me. https://tbpl.mozilla.org/?tree=Try&rev=120f04be4d5d
Attachment #8470083 -
Attachment is obsolete: true
Attachment #8470278 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 9•10 years ago
|
||
Comment on attachment 8470278 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v3) Well, this patch is no good as the monitor is not available in Clear() if the actor was never opened.
Attachment #8470278 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 10•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=813111fbb068
Attachment #8470278 -
Attachment is obsolete: true
Assignee | ||
Comment 11•10 years ago
|
||
Comment on attachment 8470347 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v4) Try looks quite good for this patch.
Attachment #8470347 -
Flags: review?(bent.mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 14•10 years ago
|
||
Hmm, the stack in comment 13 looks like a different problem. :-\
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 16•10 years ago
|
||
Comment 13 and comment 15 spun off to bug 1052090
Reporter | ||
Comment 17•10 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=45880696&tree=Mozilla-Inbound Really hoping we can get a review here soon. This seems to be hitting with pretty high frequency these days.
Flags: needinfo?(bent.mozilla)
Summary: Intermittent test_bug1002702.html | application crashed [@ mozilla::ipc::MessageChannel::DispatchOnChannelConnected(int)] → Intermittent test_bug949946.html,test_bug1002702.html | application crashed [@ mozilla::ipc::MessageChannel::DispatchOnChannelConnected(int)]
Comment hidden (Legacy TBPL/Treeherder Robot) |
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #17) > Really hoping we can get a review here soon. This seems to be hitting with > pretty high frequency these days. The needinfo flag does not increase my review bandwidth.
Flags: needinfo?(bent.mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 21•10 years ago
|
||
Rebased to latest m-c. Also, here's a new try so we can see if it helps with the latest orange we are seeing: https://tbpl.mozilla.org/?tree=Try&rev=29a52763fd81 (Not sure why those extra commits showed up in the log on the try...)
Attachment #8470347 -
Attachment is obsolete: true
Attachment #8470347 -
Flags: review?(bent.mozilla)
Attachment #8473043 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 22•10 years ago
|
||
Botched the rebase in the last version. https://tbpl.mozilla.org/?tree=Try&rev=4af2a73c3369
Attachment #8473043 -
Attachment is obsolete: true
Attachment #8473043 -
Flags: review?(bent.mozilla)
Attachment #8473117 -
Flags: review?(bent.mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 24•10 years ago
|
||
Hmm, it seems this patch introduces a MessageChannel leak somehow. I'll need more time to investigate.
Assignee | ||
Updated•10 years ago
|
Attachment #8473117 -
Flags: review?(bent.mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 27•10 years ago
|
||
My current theory is that GeckoChildProcessHost does not behave well if its OnChannelConnected() method is never called. It sets its mProcessState member there which then influences various other paths. Going to add some debug to a try build to investigate that.
Assignee | ||
Comment 28•10 years ago
|
||
Actually, comparing my orange try run to a previous green result on tbpl, I can see that there are no additional leaks. The pre-existing code leaks 4964 bytes, while my new patch results in a leak of 5084 bytes. The increase is solely due to the increased size of MessageChannel and does not reflect any new object leaks. Kyle, is there some way I can increase the allowed number of leaked bytes here?
Flags: needinfo?(khuey)
Assignee | ||
Comment 29•10 years ago
|
||
Comment on attachment 8473117 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v6) Reflagging for review as I don't see a real leak here.
Attachment #8473117 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 30•10 years ago
|
||
Nathan pointed me towards the pref on IRC. This one line patch increases the leak threshold to account for the already-leaked objects increasing slightly in size.
Attachment #8473842 -
Flags: review?(khuey)
Flags: needinfo?(khuey)
Assignee | ||
Comment 31•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=68bee545b92a
Attachment #8473842 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 32•10 years ago
|
||
The try build looks very good. 149 triggers without a failure on b2g emulator debug M11.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment on attachment 8473117 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v6) Review of attachment 8473117 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.cpp @@ +252,5 @@ > + mOnChannelConnectedTask = new RefCountedTask(NewRunnableMethod( > + this, > + &MessageChannel::DispatchOnChannelConnected)); > + > + Nit: Only need one newline here. @@ +1499,5 @@ > void > MessageChannel::OnChannelConnected(int32_t peer_id) > { > + { > + MonitorAutoLock lock(*mMonitor); I don't believe it's necessary to use a lock here or in DispatchOnChannelConnected since a) mPeerPid is only set once before the mOnChannelConnectedTask is posted and b) mPeerPid is only checked once when mOnChannelConnectedTask actually runs. ::: ipc/glue/MessageLink.cpp @@ +348,5 @@ > MonitorAutoLock lock(*mChan->mMonitor); > + // If the channel has errored out for some reason, do not force > + // it back into the connected state. > + if (NS_WARN_IF(mChan->mChannelState != ChannelOpening)) { > + return; I'm not sure returning here is correct... Is the previous listener guaranteed to already have been notified? (mExistingListener)
Assignee | ||
Comment 42•10 years ago
|
||
Yes, I think we need to call mExistingListener->OnChannelConnected() there. Here's an updated patch that does this. https://tbpl.mozilla.org/?tree=Try&rev=7e4595209f73 This may require another tweak to the allowed leak on b2g as it has changed slightly since P1 was last tested. This is also not tested locally as I need to do a clobber build.
Attachment #8473117 -
Attachment is obsolete: true
Attachment #8473117 -
Flags: review?(bent.mozilla)
Attachment #8476068 -
Flags: review?(bent.mozilla)
Comment on attachment 8476068 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v7) Review of attachment 8476068 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.cpp @@ +1512,5 @@ > + if (!mListener) { > + return; > + } > + > + MOZ_ASSERT(mPeerPidSet); Nit: Might as well always assert this, put it above the early-return. ::: ipc/glue/MessageLink.cpp @@ +349,3 @@ > { > MonitorAutoLock lock(*mChan->mMonitor); > + // Only update channel state if its still thinks its opening. Don not Nit: typo "Don" @@ +349,5 @@ > { > MonitorAutoLock lock(*mChan->mMonitor); > + // Only update channel state if its still thinks its opening. Don not > + // force it into connected if it has errored out. > + if (mChan->mChannelState == ChannelOpening) { Can you assert that mChannelState is ChannelError here if it's not ChannelOpening?
Attachment #8476068 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 44•10 years ago
|
||
> @@ +349,5 @@ > > { > > MonitorAutoLock lock(*mChan->mMonitor); > > + // Only update channel state if its still thinks its opening. Don not > > + // force it into connected if it has errored out. > > + if (mChan->mChannelState == ChannelOpening) { > > Can you assert that mChannelState is ChannelError here if it's not > ChannelOpening? After looking at this more closely to see if ChannelError is safe to assert, I'm fairly convinced this can't happen. We either SynchronouslyClose() before setting ChannelError or the error is coming up from the link. So this patch just asserts we are in ChannelOpening state and removes the conditional logic I previously added. I do want to see another try after this change, though, as its hard to tell all the corner cases. https://tbpl.mozilla.org/?tree=Try&rev=5a32a9f04f21
Attachment #8476068 -
Attachment is obsolete: true
Attachment #8476371 -
Flags: review+
Assignee | ||
Comment 45•10 years ago
|
||
That try showed we clearly cannot assert ChannelOpening there. I'm trying just dropping the changes to ProcessLink::OnChannelConnected() completely since they should not be strictly necessary for solving this bug: https://tbpl.mozilla.org/?tree=Try&rev=0097196f942f
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 47•10 years ago
|
||
Ok, that last try hit a duplicate call to OnChannelConnected() from the link. Either we need to just add back the conditional or possibly calling the mExistingListener is wrong. This adds back the conditional, but also always calls mExistingListener: https://tbpl.mozilla.org/?tree=Try&rev=7385a4beaa15
Assignee | ||
Updated•10 years ago
|
Attachment #8476068 -
Attachment is obsolete: false
Assignee | ||
Updated•10 years ago
|
Attachment #8476371 -
Attachment is obsolete: true
Assignee | ||
Comment 48•10 years ago
|
||
Just to clarify, the try in comment 47 is patch v7 originally r+'d in comment 43.
Assignee | ||
Comment 49•10 years ago
|
||
Update the patch to fix the review nits, but not attempt to assert ChannelOpening or ChannelError in ProcessLink::OnChannelConnected. Instead, I expanded the comment to indicate we're not forcing the connected state if the channel has errored, has started closing, or otherwise changed state. The last try is still running and so far looks good. Getting a jump on this one: https://tbpl.mozilla.org/?tree=Try&rev=c1039023f525
Attachment #8476068 -
Attachment is obsolete: true
Attachment #8476722 -
Flags: review+
Assignee | ||
Comment 50•10 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/327482e1a672 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/909a529527a7
https://hg.mozilla.org/mozilla-central/rev/327482e1a672 https://hg.mozilla.org/mozilla-central/rev/909a529527a7
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Reporter | ||
Comment 52•10 years ago
|
||
The failure was B2G-only, but the code is cross-platform? Should we get this on Aurora33 as well?
status-b2g-v1.4:
--- → unaffected
status-b2g-v2.0:
--- → unaffected
status-b2g-v2.1:
--- → fixed
status-firefox32:
--- → unaffected
status-firefox33:
--- → affected
status-firefox34:
--- → fixed
status-firefox-esr24:
--- → unaffected
status-firefox-esr31:
--- → unaffected
Assignee | ||
Comment 53•10 years ago
|
||
Comment on attachment 8476722 [details] [diff] [review] Cancel the DispatchOnChannelConnected runnable when destructing the MessageChannel. (v9) Approval Request Comment [Feature/regressing bug #]: Bug 1013571 [User impact if declined]: Race condition in worker startup/teardown leading to use-after-free. Only effects e10s mode, but I think it would be prudent to uplift to avoid potential crashes. [Describe test coverage new/current, TBPL]: This was caught by the pre-existing test dom/workers/test/test_bug1002702.html [Risks and why]: I believe this is low risk. [String/UUID change made/needed]: None
Attachment #8476722 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 54•10 years ago
|
||
Comment on attachment 8473842 [details] [diff] [review] P1 Increase allowed leak threshold in tests to account for increased MessageChannel size. (v0) Needed for P2. See approval request for P2.
Attachment #8473842 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8476722 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•10 years ago
|
Attachment #8473842 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Reporter | ||
Comment 55•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/597626afe5e1 No B2G leak testing on 33, so no need for that patch :)
You need to log in
before you can comment on or make changes to this bug.
Description
•