Open Bug 961343 Opened 7 years ago Updated 5 years ago

[e10s] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result) ]

Categories: Core :: DOM: Content Processes (defect, P4)
Platform: All / macOS

Tracking Status
e10s + ---
firefox29 --- affected
b2g-v1.3T --- affected
b2g-v1.4 --- affected

Reporter: cpeterson
Assignee: Unassigned

Whiteboard: [sprd296788][sprd337499]

Attachments: 2 files

I hit this abort about 50% of the time I start Firefox with my old profile:

https://crash-stats.mozilla.com/report/index/57dd0638-2809-411d-9314-11b832140118
https://crash-stats.mozilla.com/report/index/d0564e61-8554-4920-a3e7-25b2f2140118

Frame 	Module 	Signature 	Source
0 	libmozalloc.dylib 	mozalloc_abort(char const*) 	memory/mozalloc/mozalloc_abort.cpp
1 	XUL 	Abort 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
2 	XUL 	NS_DebugBreak 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
3 	XUL 	mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result) 	dom/ipc/ContentChild.cpp
4 	XUL 	mozilla::ipc::MessageChannel::DispatchUrgentMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp
5 	XUL 	mozilla::ipc::MessageChannel::OnMaybeDequeueOne() 	ipc/glue/MessageChannel.cpp
6 	XUL 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
7 	XUL 	MessageLoop::DoWork() 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
8 	XUL 	mozilla::ipc::DoWorkRunnable::Run() 	ipc/glue/MessagePump.cpp
9 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/xpcom/threads/../../../../xpcom/threads/nsThread.cpp
10 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	/builds/slave/fx-team-osx64-0000000000000000/build/xpcom/glue/nsThreadUtils.cpp
11 	XUL 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
12 	XUL 	MessageLoop::Run() 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
13 	XUL 	XRE_RunAppShell 	toolkit/xre/nsEmbedFunctions.cpp
14 	XUL 	MessageLoop::Run() 	/builds/slave/fx-team-osx64-0000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
15 	XUL 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp
16 	plugin-container 	main 	ipc/app/MozillaRuntimeMain.cpp
17 	plugin-container 	start
Blocks: 899758
Duplicate of this bug: 991920
tracking-e10s: --- → +
Duplicate of this bug: 991926
Component: IPC → DOM: Content Processes
We hit this 3 times during monkey testing on 1.3t.
Whiteboard: [sprd296788]
Duplicate of this bug: 992712
According to the minidump_summary of attachment 8402438, the abort is caused by a MsgRouteError in ContentChild. One possibility is that a message was queued and then received after its actor had been unregistered. I talked with Cervantes to verify this assumption.
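To make the suspected ordering concrete, here is a minimal, self-contained model of per-actor routing. It is a sketch under stated assumptions, not Gecko code: `Result`, `Actor`, `Router`, `ReplayLateDelivery` and the routing id 7 are all hypothetical names chosen for illustration. It shows only the logical point of the hypothesis: a message that sits in the queue while its actor is unregistered has nowhere to go when it is finally dequeued.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative model only: Result, Actor and Router are hypothetical names,
// not Gecko's IPC classes.
enum class Result { MsgProcessed, MsgRouteError };

struct Actor {
  std::string name;
  int handled = 0;  // number of messages this actor has processed
};

class Router {
 public:
  void Register(int routingId, Actor* actor) {
    assert(actor != nullptr);
    actors_[routingId] = actor;
  }

  void Unregister(int routingId) { actors_.erase(routingId); }

  // Hands a message addressed to routingId to its actor. If no actor is
  // registered under that id any more, the message cannot be routed.
  Result Dispatch(int routingId) {
    auto it = actors_.find(routingId);
    if (it == actors_.end()) {
      return Result::MsgRouteError;
    }
    ++it->second->handled;
    return Result::MsgProcessed;
  }

 private:
  std::map<int, Actor*> actors_;
};

// Replays the suspected ordering: a message for actor 7 is queued, the actor
// is unregistered, and only afterwards is the queue drained.
Result ReplayLateDelivery() {
  Actor actor{"illustrative-actor"};
  Router router;
  router.Register(7, &actor);
  std::vector<int> queue{7};  // message enqueued while the actor is alive
  router.Unregister(7);       // actor torn down before the dequeue
  Result last = Result::MsgProcessed;
  for (int id : queue) {
    last = router.Dispatch(id);
  }
  return last;
}
```

In the model, `ReplayLateDelivery()` yields `MsgRouteError`. In the crash above, the corresponding error reaches `ContentChild::ProcessingError`, which (per frames 0 to 3 of the stack) ends in `NS_DebugBreak` and `mozalloc_abort`.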

Enabling the IPC log can help confirm this. You can turn it on by making LoggingEnabled() in gecko/ipc/glue/ProtocolUtils.h return true.

Note that the log is written to stderr; I am not sure whether slog captures it.
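The gating described above can be pictured with the following sketch. It is hypothetical: the real body of `LoggingEnabled()` in ProtocolUtils.h is not reproduced here, and `LogIpcMessage` is an illustrative helper, not a Gecko function. It captures the two facts stated in this comment: every log call is guarded by the gate, and the output goes to stderr, which is why it matters whether slog keeps stderr.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the LoggingEnabled() gate in ProtocolUtils.h.
// The suggestion above is to force it to return true; the real function
// body is not reproduced here.
inline bool LoggingEnabled() { return true; }

// Illustrative helper (not Gecko code): writes one IPC log line to `out`,
// which defaults to stderr, but only when the gate is on.
// Returns true if a line was written.
inline bool LogIpcMessage(const char* protocol, const char* verb,
                          const char* message, std::FILE* out = stderr) {
  if (!LoggingEnabled()) {
    return false;
  }
  return std::fprintf(out, "[%s] %s %s\n", protocol, verb, message) > 0;
}
```

With the gate forced on, a call such as `LogIpcMessage("PContentChild", "Received", "Msg___delete__")` prints one line to stderr. Whether that line survives depends entirely on what captures stderr on the device.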
We have hit this crash backtrace 8 times.
Flags: needinfo?(wchang)
Flags: needinfo?(kkuo)
James, could you please enable the IPC log so that we can check what is going on? See comment 5 for how to enable it.
ni? James for comment 7
Flags: needinfo?(wchang) → needinfo?(james.zhang)
I have landed it on our monkey-test Hudson.
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
Flags: needinfo?(kkuo)
Zhenqing, please share the monkey test results.
Flags: needinfo?(zhenqing.liu)
The crashes occurred before the IPC log was enabled. Maybe we should wait a little longer.
Flags: needinfo?(zhenqing.liu)
Hi Zhenqing, now that the IPC log is enabled per comment 5, could you please provide the log?
Does the crash still happen?
If so, how often does it happen?

Thanks!
Flags: needinfo?(zhenqing.liu)
(In reply to Rachelle Yang [:ryang][ryang@mozilla.com] from comment #12)
> Hi Zhenqing,per comment5 to enable IPC log, could you please provide the
> log? 
> Does the crash still happen?
> and how about the frequency it happened ?
> 
> Thanks!

The crash has not happened since the IPC log was enabled.
I will keep tracking it.
Flags: needinfo?(zhenqing.liu)
FWIW QC is seeing a very similar issue in bug 1046084.
Whiteboard: [sprd296788] → [sprd296788][sprd337499]
Now that homescreen/music on 1.3t are no longer killed under low memory pressure, we have not hit this issue again.
Attached file logs
This bug occurred again. Logs are uploaded.
Depends on: 1046084
I checked attachment 8466050; none of the files include the IPC log. Please make sure it is enabled as described in comment 5.
(In reply to Ting-Yu Chou [:ting] from comment #17)
> Checked attachment 8466050 [details], none of the files inlcude IPC log.
> Please make sure it is enabled as comment 5 described.

LoggingEnabled() has been changed to return true unconditionally. What should I do to make sure slog keeps the output?
Also, what keyword should I search for to find the IPC log?
(In reply to Ting-Yu Chou [:ting] from comment #5)
> Note the log will be output to stderr, I am not sure whether slog keeps that.

Zhenqing, does slog keep stderr from b2g?
(In reply to Ting-Yu Chou [:ting] from comment #19)
> Zhenqing, does slog keep stderr from b2g?

I'm not sure either.
I have confirmed that LoggingEnabled() returns true only on the monkey builds, not on the Hudson builds.
So we should wait for the crash to appear on the monkey builds.
Hi Ting-Yu,
I grepped the source code and found only a few lines controlled by LoggingEnabled().
Could you tell me whether only these logs are wanted, or all the IPC logs?
(In reply to Zhenqing Liu from comment #21)
> Hi Ting-Yu, 
> I grep the source codes and find only a few lines are controlled by
> LoggingEnabled().
> Could you please tell me only these logs are wanted or all the IPC logs?

If you grep objdir-gecko, you should find more than 1000 lines of code that call it.
If it is enabled correctly, after running |adb shell "stop b2g; b2g.sh"| you should see something like the following on the console:

[time:1407296689516806][1615->1790][PContentParent] Sending Msg_PFileDescriptorSetConstructor([TODO])
[time:1407296689516900][1615->1790][PBlobStreamParent] Sending Msg___delete__([TODO])
[time:1407296689523152][1790<-1615][PContentChild] Received Msg_PFileDescriptorSetConstructor([TODO])
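When the logs do arrive, it helps to parse them so you can look for a message received for an actor after its `Msg___delete__`, which is the ordering suspected in comment 5. The sketch below is hypothetical tooling, not part of Gecko. Its field interpretation is an assumption read off the three sample lines: the first pid is the process that wrote the line, `->`/`<-` mark sending/receiving, and the `time:` value looks like microseconds since the epoch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// One parsed IPC log line. Field meanings are inferred from the sample
// output above and are assumptions, not a documented format.
struct IpcLogLine {
  std::uint64_t timeUs;  // value after "time:" (appears to be microseconds)
  int localPid;          // pid of the process that wrote the line
  int remotePid;         // pid of the peer process
  std::string protocol;  // e.g. "PContentChild"
  bool sending;          // true for "Sending", false for "Received"
  std::string message;   // e.g. "Msg___delete__"
};

// Returns the parsed line, or std::nullopt for lines that are not IPC log
// lines (for example other console or logcat output).
std::optional<IpcLogLine> ParseIpcLogLine(const std::string& line) {
  static const std::regex re(
      R"(^\[time:(\d+)\]\[(\d+)(->|<-)(\d+)\]\[(\w+)\] (Sending|Received) (\w+))");
  std::smatch m;
  if (!std::regex_search(line, m, re)) {
    return std::nullopt;
  }
  IpcLogLine out;
  out.timeUs = std::stoull(m[1].str());
  out.localPid = std::stoi(m[2].str());
  out.remotePid = std::stoi(m[4].str());
  out.protocol = m[5].str();
  out.sending = (m[6].str() == "Sending");
  out.message = m[7].str();  // stops before the "([TODO])" argument list
  return out;
}
```

Sorting parsed lines by `timeUs` and filtering on `protocol` gives a per-actor timeline in which a late `Received` entry would stand out.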
(In reply to Ting-Yu Chou [:ting] from comment #23)
> If it is enabled correctly, running |adb shell "stop b2g; b2g.sh"| you
> should see something like:
> 
> [time:1407296689516806][1615->1790][PContentParent] Sending
> Msg_PFileDescriptorSetConstructor([TODO])
> [time:1407296689516900][1615->1790][PBlobStreamParent] Sending
> Msg___delete__([TODO])
> [time:1407296689523152][1790<-1615][PContentChild] Received
> Msg_PFileDescriptorSetConstructor([TODO])
> 
> from console.

I can see these lines on the console with the monkey builds, but none of them appear in |adb logcat|. So we need to do something more to make sure slog keeps them.
You can take this patch:

  https://hg.mozilla.org/mozilla-central/rev/8cd50b4ce64b

After rebuilding, you should be able to see the IPC log in logcat.
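The underlying issue is that on Android-based systems such as B2G, a native process's stderr is generally not captured by logcat, so lines must go through the Android logging API to be visible there. The sketch below illustrates that general approach. It is an assumption about the technique, not the contents of changeset 8cd50b4ce64b. `EmitIpcLogLine` and the `"IPC"` tag are hypothetical; `__android_log_print` and `ANDROID_LOG_INFO` are the NDK's logging API from `<android/log.h>` (link with `-llog`).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#if defined(__ANDROID__)
#include <android/log.h>
#endif

// Hypothetical helper: send one IPC log line to a backend that logcat (and
// therefore slog) can see. On Android the line goes through the NDK logging
// API; elsewhere it falls back to stderr, as the original logging did.
// Returns true if the backend accepted the line.
bool EmitIpcLogLine(const std::string& line) {
#if defined(__ANDROID__)
  // "IPC" is an illustrative tag; filter with e.g. `adb logcat -s IPC`.
  return __android_log_print(ANDROID_LOG_INFO, "IPC", "%s", line.c_str()) > 0;
#else
  return std::fprintf(stderr, "%s\n", line.c_str()) >= 0;
#endif
}
```

Passing the line through a `"%s"` format, rather than as the format string itself, keeps any `%` characters in message names from being interpreted.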
CAF was unable to reproduce this with IPC logging enabled. https://bugzilla.mozilla.org/show_bug.cgi?id=1046084#c35 has a patch that changes the timing less, making it more likely to reproduce this issue. If you pick up the cset from comment 35, make sure to take the one from comment 36 too. Everything I wrote in comment 35 applies here too.
Flags: needinfo?(ying.xu)
I hit this crash while running automation tests on a Dolphin device (512 MB).

@ Crash ID.
  - https://crash-stats.mozilla.com/report/index/7014ccde-822b-45eb-b61c-15aea2150506

@ Build information.
 - Gaia-Rev        5152e8366cb74b79d40a0b6ad7c97bf6d76ea778
 - Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g34_v2_1s/rev/10dd289cfb77
 - Build-ID        20150505001205
 - Version         34.0
 - Device-Name     scx15_sp7715ea
 - FW-Release      4.4.2
 - FW-Incremental  eng.cltbld.20150505.035921
 - FW-Date         Tue May  5 03:59:33 EDT 2015
Priority: -- → P4