Closed Bug 862097 Opened 11 years ago Closed 11 years ago

[Inari] crash in mozilla::layers::ShadowLayersChild::ActorDestroy with abort message: "ActorDestroy by IPC channel failure at ShadowLayersChild: file ../../../gecko/gfx/layers/ipc/ShadowLayersChild.cpp, line 68"

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:tef+, b2g18 fixed, b2g18-v1.0.1 fixed)

RESOLVED FIXED
blocking-b2g tef+
Tracking Status
b2g18 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: nhirata, Assigned: sotaro)

Details

(Keywords: crash, Whiteboard: c= , [b2g-crash])

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-0f606005-03dd-44c2-a19a-739bd2130415.
============================================================= 
Frame 	Module 	Signature 	Source
0 	libxul.so 	mozalloc_abort 	mozalloc_abort.cpp:30
1 	libxul.so 	NS_DebugBreak_P 	nsDebugImpl.cpp:423
2 	libxul.so 	mozilla::layers::ShadowLayersChild::ActorDestroy 	ShadowLayersChild.cpp:68
3 	libxul.so 	mozilla::layers::PLayersChild::DestroySubtree 	PLayersChild.cpp:673
4 	libxul.so 	mozilla::layers::PCompositorChild::DestroySubtree 	PCompositorChild.cpp:886
5 	libxul.so 	mozilla::layers::PCompositorChild::OnChannelError 	PCompositorChild.cpp:775
6 	libxul.so 	mozilla::ipc::AsyncChannel::NotifyMaybeChannelError 	AsyncChannel.cpp:549
7 	libxul.so 	mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError 	AsyncChannel.cpp:514
8 	libxul.so 	RunnableMethod<IPC::ChannelProxy::Context, void , Tuple0>::Run 	tuple.h:383
9 	libxul.so 	MessageLoop::RunTask 	message_loop.cc:334
10 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	message_loop.cc:342
11 	libxul.so 	MessageLoop::DoWork 	message_loop.cc:442
12 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	MessagePump.cpp:42
13 	libxul.so 	nsThread::ProcessNextEvent 	nsThread.cpp:620
14 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:237
15 	libxul.so 	mozilla::ipc::MessagePump::Run 	MessagePump.cpp:82
16 	libxul.so 	mozilla::ipc::MessagePumpForChildProcess::Run 	MessagePump.cpp:231
17 	libxul.so 	MessageLoop::RunInternal 	message_loop.cc:216
18 	libxul.so 	MessageLoop::Run 	message_loop.cc:209
19 	libxul.so 	nsBaseAppShell::Run 	nsBaseAppShell.cpp:163
20 	libxul.so 	XRE_RunAppShell 	nsEmbedFunctions.cpp:646
21 	libxul.so 	mozilla::ipc::MessagePumpForChildProcess::Run 	MessagePump.cpp:198
22 	libxul.so 	MessageLoop::RunInternal 	message_loop.cc:216
23 	libxul.so 	MessageLoop::Run 	message_loop.cc:209
24 	libxul.so 	XRE_InitChildProcess 	nsEmbedFunctions.cpp:485
25 	plugin-container 	main 	ipc/app/MozillaRuntimeMain.cpp:48
26 	libc.so 	__libc_init 	libc_init_dynamic.c:114
27 		@0xb0001dc5

More reports:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Alayers%3A%3AShadowLayersChild%3A%3AActorDestroy

STR: 
1. launch browser app
2. go to http://bit.ly/css3d-mol
3. select any of the links at the bottom of the page (Caffeine, Salt, Buckyball)
4. repeat 3 several times.

Expected: no crash
Actual: browser content crash
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b44e2c40cc1
Gaia   c79e761bae4d92f329154c64159f4f5c8eb49c9e
BuildID 20130415070202
Version 18.0
Inari
Component: Gaia::Browser → General
This abort comes from Bug 860892.
> 
> More reports:
> https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Alayers%3A%3AShadowLayersChild%3A%3AActorDestroy
> 
> STR: 
> 1. launch browser app
> 2. go to http://bit.ly/css3d-mol
> 3. select any of the links at the bottom of the page (Caffeine, Salt,
> Buckyball)
> 4. repeat 3 several times.
> 
> Expected: no crash
> Actual: browser content crash

This might be a dupe of Bug 851664. The STR is what I wanted! I am going to check it tomorrow.
Summary: [Inari] crash in [@ mozalloc_abort | NS_DebugBreak_P | mozilla::layers::ShadowLayersChild::ActorDestroy] → [Inari] crash in mozilla::layers::ShadowLayersChild::ActorDestroy with abort message: "ActorDestroy by IPC channel failure at ShadowLayersChild: file ../../../gecko/gfx/layers/ipc/ShadowLayersChild.cpp, line 68"
Depends on: 858926
No longer depends on: 858926
Blocks: 851664
Depends on: 860892
blocking-b2g: --- → tef?
The crash happens because the b2g process hits its file descriptor rlimit. I confirmed this from adb shell logcat on unagi.
The log says that the b2g process is killed because of a genlock_create_lock failure. The child process then detects that the IPC connection is disconnected.
On a custom ROM that increases the file descriptor rlimit to 4048, the crash does not happen. The page is rendered correctly when following the STR.
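
A minimal sketch of how this kind of file descriptor pressure could be checked, assuming a Linux host with Python and a mounted /proc. The helper name `fd_usage` is hypothetical and is not part of Gecko or Gonk; it only compares the number of descriptors the current process has open with that process's own RLIMIT_NOFILE values.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open descriptor (Linux only).
    # The count includes the descriptor used to list the directory itself.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print(f"open={open_fds} soft={soft} hard={hard}")
```

When the open count gets close to the soft limit, any further allocation that needs a new descriptor can be expected to fail.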
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> On custom ROM that increase file descriptor's rlimit to 4048. The crash does
> not happen. The page is rendered correctly on the STR.

The current Firefox OS architecture requires a lot of file descriptors for gralloc buffers. The buffers are used as Layer buffers, and all of them are delivered to the b2g process for rendering. If a web page has a lot of layers, it allocates a lot of gralloc buffers, and there is no way to limit that number.
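
The failure mode can be illustrated outside of Gecko. The sketch below is not b2g code; it assumes a Linux host with Python, and the function name and the limit of 64 are made up for illustration. It temporarily lowers the soft RLIMIT_NOFILE of the current process and keeps opening /dev/null until the kernel refuses with EMFILE. In the scenario described here, the gralloc buffer handles play the role of the /dev/null descriptors.

```python
import errno
import os
import resource


def open_until_exhausted(soft_limit=64):
    """Open descriptors until the soft limit is hit.

    Returns (number_opened, errno_of_failure).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    failure = None
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # EMFILE means "too many open files" for this process.
        failure = exc.errno
    finally:
        for fd in opened:
            os.close(fd)
        # Put the original soft limit back.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(opened), failure


if __name__ == "__main__":
    count, failure = open_until_exhausted()
    print(count, errno.errorcode.get(failure))
```

Because nothing bounds the number of layers a page may create, the same exhaustion can be reached by content alone.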
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> On custom ROM that increase file descriptor's rlimit to 4048. The crash does
> not happen. The page is rendered correctly on the STR.

By using the following command, the number of file descriptors in the b2g process increased to about 2200.
Before considering upping the number of file descriptors, can you help us understand how the current limit was chosen? We don't want to introduce new bugs, if this bug is already difficult to run into.
The current file descriptor rlimit on v1.0.1 is Linux's default value.
  - soft limit: 1024
  - hard limit: 4096
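
For illustration only, here is how a process could raise its own soft limit toward the hard limit. This is a sketch in Python, not the patch attached to this bug, and `raise_nofile_soft_limit` is a hypothetical name. With the values above (soft 1024, hard 4096), it would be expected to leave the process with a soft limit of 4096.

```python
import resource


def raise_nofile_soft_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE up to target, capped by the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may not exceed its hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Never lower a soft limit that is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

Raising the soft limit only moves the point of exhaustion; it does not bound how many descriptors content can cause to be allocated.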
This blocks bug 851664 so should be tef+.  If we fix this by increasing the limits (and fix one other yet-to-be-filed bug), it will likely fix bug 851664.
blocking-b2g: tef? → tef+
On v1.1, the file descriptor rlimit was changed to the following in Bug 853977.
  - soft limit: 2048
  - hard limit: 4096

Starting with v1.1, Android's media server is used to handle the codec and camera hardware.
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
> 
> From v1.1, android's media server is used to handle codec and camera hw in
> it.

Android's media server needs more than 1024 file descriptors.
Assignee: nobody → sotaro.ikeda.g
Depends on: 862397
It took longer, but I was still able to reproduce the issue on master. I ended up tapping on at least 15 different molecular structures on that web site and placing the device into landscape view.

Build:2013-04-16-03-10-11
"mozilla-central" revision="1d9c510b3742"
"integration/gaia-central" revision="972f3504af31"
"gecko.git" revision="95bf443385bbd8ee611be1586ec975e609acd006"
"gaia.git" revision="d10e892eb6887967803a68e8c441a7c456446437"
Unagi

I think increasing will delay the crashing; I do not believe that it will stop the crashing...
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #15)
> I think increasing will delay the crashing; I do not believe that it will
> stop the crashing...

How much did you increase the rlimit?

Anyway, Bug 862397 fixes the crash. Qcom already fixed the crash in recent source. We just need to apply that fix to our source.
Flags: needinfo?(nhirata.bugzilla)
\o/

Once I get a build with the fix, I will try looking at it again.  If you have a build already made with the patch, I will retest with it.
Flags: needinfo?(nhirata.bugzilla)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #17)
> \o/
> 
> Once I get a build with the fix, I will try looking at it again.  If you
> have a build already made with the patch, I will retest with it.

I already confirmed the patch works locally.
Confirmed that the patch fixes the crash on unagi.
update comment.
Attachment #738689 - Attachment is obsolete: true
Attachment #738696 - Flags: review?(mwu)
Attachment #738696 - Attachment description: patch v2 - set RLIMIT_NOFILE of b2g/b2g-container processes to 4096 → patch v2 - set RLIMIT_NOFILE of b2g/content processes to 4096
Comment on attachment 738696 [details] [diff] [review]
patch v2 - set RLIMIT_NOFILE of b2g/content processes to 4096

I don't think the comments are necessary here. Looks good otherwise.
Attachment #738696 - Flags: review?(mwu) → review+
Merged in https://github.com/mozilla-b2g/gonk-misc/pull/82 . Uplift to follow.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
We also need to update the b2g-manifest of v1-train and v1.0.1.
jhford, can you update the b2g-manifest of v1-train and v1.0.1 as follows?
 - for all devices that already pin a specific gonk-misc revision, update that revision to the most recent one.
Flags: needinfo?(jhford)
To github.com:mozilla-b2g/b2g-manifest.git
   8589ac6..9635c27  v1-train -> v1-train
   88be926..bc902bf  v1.0.1 -> v1.0.1
Flags: needinfo?(jhford)
On a buri device partner build, I found a problem displaying http://bit.ly/css3d-mol. It is a different problem.

It seems that there is no fallback to ashmem, and that this badly affects http://bit.ly/css3d-mol from Bug 862097. When following the STR in Bug 862097, "Well, this is embarrassing. We tried to display this Web page, but it's not responding" is shown.
Inari Build: 20130429070204
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/45aa5ba0ed53
Gaia   cf2d4136f0ebc66039637fdbeb72ed184dfbc0f2
Kernel:  Feb 21st

On the build above for the Inari device, I was able to switch between the three links (Salt, Caffeine and Buckyball) just fine, with no problems seen. However, going to other links not mentioned in the bug (Graphite, YBCO) did cause the browser to crash. These were not the steps in the bug, so I am not sure whether this is a different issue that warrants a separate bug.

Leo Build: 20130426070204
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/6c2493de1441
Gaia   c9046a7acef33328977840176ff5574720d2c74c
Kernel: March 15th

On the Leo build above, when tapping on the Salt link, the browser went to a white screen and the device could not be recovered. Had to pull the battery to get the device to recover. Once the device rebooted, I was presented with a Just Crashed message which I sent on to Mozilla.

Let me know if you need anything more from me here. Logs or screenshots or bugs.
Whiteboard: [b2g-crash] → c=performance [b2g-crash]
On Inari, when I tap on any of the molecules in the last row (Buckyball, Graphite, YBCO superconductor or Salt), the browser either crashes with a black screen (3/5 times) or shows a screen where the selected molecule is highlighted but its 3D structure is not displayed.

Inari Build ID: 20130515070203
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/0b6bcb1f4175
Gaia   9648799c2e45917ff150fa9eef8aeac79a9ac008

On Leo, tapping on any of the molecules either does not bring up its 3D form (2/5 times) or prompts for a crash report to be submitted.
Leo Build ID: 20130515070208  
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/d06cfe7d67c2
Gaia   0ddb515f15cbc6b74fc2742b7599d6ae74c6413f
Deepa, can you create a new bug for it? And can you attach an adb log?
Sotaro Ikeda:

Please see the newly created bug: https://bugzilla.mozilla.org/show_bug.cgi?id=877495
Whiteboard: c=performance [b2g-crash] → c= , [b2g-crash]