Closed Bug 970008 Opened 10 years ago Closed 10 years ago

[tarako]monkey test crash at libxul.so!mozilla::dom::TabChild::RecvRealTouchEvent

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME
blocking-b2g -
Tracking Status
b2g-v1.3T --- affected

People

(Reporter: james.zhang, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash])

Attachments

(5 files, 2 obsolete files)

[FFOS minidump: mtlog-now-sp6821a_gonk-98-custom_hudson-jameszhangubtpc-1402080859/dump_parse (the top 10 stack info)]
0  libxul.so!mozilla::dom::TabChild::RecvRealTouchEvent(mozilla::WidgetTouchEvent const&, mozilla::layers::ScrollableLayerGuid const&) [nsCOMPtr.h : 554 + 0x2]
1  libxul.so!mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&) [PBrowserChild.cpp : 2080 + 0xd]
2  libxul.so!mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) [PContentChild.cpp : 3170 + 0x7]
3  libxul.so!mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) [MessageChannel.cpp : 1126 + 0x5]
4  libxul.so!mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) [MessageChannel.cpp : 1044 + 0x3]
5  libxul.so!mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [MessageChannel.cpp : 1027 + 0x3]
6  libxul.so!RunnableMethod<WebCore::ReverbConvolver, void (WebCore::ReverbConvolver::*)(), Tuple0>::Run() [tuple.h : 383 + 0x5]
7  libxul.so!mozilla::ipc::MessageChannel::DequeueTask::Run() [MessageChannel.h : 371 + 0x9]
8  libxul.so!MessageLoop::RunTask(Task*) [message_loop.cc : 340 + 0x5]
9  libxul.so!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) [message_loop.cc : 348 + 0x5]
blocking-b2g: --- → 1.3T?
Hi James, is this a high-occurrence bug?
Can it be easily reproduced, and is there a script you can share?
Thanks
Flags: needinfo?(james.zhang)
(In reply to Joe Cheng [:jcheng] from comment #1)
> Hi James, is this a high-occurrence bug?
> Can it be easily reproduced, and is there a script you can share?
> Thanks

Yes. We can catch this crash every two days.

I have given the script to ttsai, but it's based on our Hudson daily build. You should modify the script or use run-6821-local.sh to capture the minidump and parse the backtrace.
Flags: needinfo?(james.zhang)
Attached file monkey.test.tar.gz
cd test-config
./run-6821-local.sh

Otherwise, your QA should configure the same Hudson build as ours.
Flags: needinfo?(ahuang)
(In reply to James Zhang from comment #0)
> Created attachment 8372966 [details]
> mtlog-now-sp6821a_gonk-98-custom_hudson-jameszhangubtpc-1402080859.tar.bz2
> 
> [FFOS minidump:
> mtlog-now-sp6821a_gonk-98-custom_hudson-jameszhangubtpc-1402080859/
> dump_parse (the top 10 stack info)]
> 0 
> libxul.so!mozilla::dom::TabChild::RecvRealTouchEvent(mozilla::
> WidgetTouchEvent const&, mozilla::layers::ScrollableLayerGuid const&)
> [nsCOMPtr.h : 554 + 0x2]
> 1  libxul.so!mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message
> const&) [PBrowserChild.cpp : 2080 + 0xd]

Hi James,

The backtrace looks really weird. mozilla::dom::TabChild::RecvRealTouchEvent should be in TabChild.cpp. Are you using an optimized build? If you are not sure, can you attach the "config.status" file from the objdir-gecko folder? Thanks.
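As a rough illustration of the check described above, here is a minimal Python sketch that parses frames in the dump_parse format shown in comment 0 and flags frames whose reported source location is a header file. The helper names (`parse_frame`, `looks_inlined`) are hypothetical, and treating a `.h` location as a sign of inlined code from an optimized build is only a heuristic assumption, not something the minidump tooling reports.

```python
import re

# Matches lines such as:
#   0  libxul.so!ns::Class::Method(args) [file.cpp : 123 + 0x2]
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<module>[^!\s]+)!(?P<symbol>.+?)\s+"
    r"\[(?P<file>[^\s:\]]+)\s*:\s*(?P<line>\d+)\s*\+\s*"
    r"(?P<offset>0x[0-9a-fA-F]+)\]\s*$"
)


def parse_frame(text):
    """Return a dict describing one stack frame, or None if it does not parse."""
    match = FRAME_RE.match(text)
    if match is None:
        return None
    frame = match.groupdict()
    frame["index"] = int(frame["index"])
    frame["line"] = int(frame["line"])
    return frame


def looks_inlined(frame):
    """Heuristic (assumption): a header location hints at inlined code."""
    return frame["file"].endswith(".h")


if __name__ == "__main__":
    top = parse_frame(
        "0  libxul.so!mozilla::dom::TabChild::RecvRealTouchEvent("
        "mozilla::WidgetTouchEvent const&, "
        "mozilla::layers::ScrollableLayerGuid const&) [nsCOMPtr.h : 554 + 0x2]"
    )
    print(top["file"], top["line"], looks_inlined(top))
```

A frame flagged this way only suggests that the real crash site is somewhere inside the named function; it does not identify the exact line in TabChild.cpp.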
Flags: needinfo?(ahuang)
Attached file config.status
config.status
Flags: needinfo?(ahuang)
(In reply to Alan Huang [:ahuang|away 2/27-3/3] from comment #4)
> (In reply to James Zhang from comment #0)
> > Created attachment 8372966 [details]
> > mtlog-now-sp6821a_gonk-98-custom_hudson-jameszhangubtpc-1402080859.tar.bz2
> > 
> > [FFOS minidump:
> > mtlog-now-sp6821a_gonk-98-custom_hudson-jameszhangubtpc-1402080859/
> > dump_parse (the top 10 stack info)]
> > 0 
> > libxul.so!mozilla::dom::TabChild::RecvRealTouchEvent(mozilla::
> > WidgetTouchEvent const&, mozilla::layers::ScrollableLayerGuid const&)
> > [nsCOMPtr.h : 554 + 0x2]
> > 1  libxul.so!mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message
> > const&) [PBrowserChild.cpp : 2080 + 0xd]
> 
> Hi James,
> 
> The backtrace looks really weird. mozilla::dom::TabChild::RecvRealTouchEvent
> should be in TabChild.cpp. Are you using an optimized build? If you are not
> sure, can you attach the "config.status" file under objdir-gecko folder?
> Thanks.

Please NeedInfo me if you need more information.
    (''' MOZ_OPTIMIZE ''', r''' 1 '''),
    (''' MOZ_FRAMEPTR_FLAGS ''', r''' -fomit-frame-pointer '''),
    (''' MOZ_OPTIMIZE_FLAGS ''', r''' -Os -freorder-blocks -fno-reorder-functions '''),

Yeah, as I expected in comment 4
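For anyone who wants to repeat this check on another build, here is a minimal Python sketch that pulls entries of the form shown above out of a config.status file and reports whether MOZ_OPTIMIZE is set. The function names (`parse_substs`, `is_optimized`) are hypothetical, and the regular expression assumes every entry uses the same triple-quoted tuple layout as the three lines quoted here.

```python
import re

# One entry looks like:  (''' NAME ''', r''' value '''),
ENTRY_RE = re.compile(r"\(\s*'''\s*(\w+)\s*'''\s*,\s*r?'''(.*?)'''\s*\)")


def parse_substs(text):
    """Map each substitution name to its whitespace-stripped value."""
    return {key: value.strip() for key, value in ENTRY_RE.findall(text)}


def is_optimized(substs):
    """True when MOZ_OPTIMIZE is recorded as 1."""
    return substs.get("MOZ_OPTIMIZE") == "1"


if __name__ == "__main__":
    sample = """
    (''' MOZ_OPTIMIZE ''', r''' 1 '''),
    (''' MOZ_FRAMEPTR_FLAGS ''', r''' -fomit-frame-pointer '''),
    (''' MOZ_OPTIMIZE_FLAGS ''', r''' -Os -freorder-blocks -fno-reorder-functions '''),
    """
    substs = parse_substs(sample)
    print(is_optimized(substs), substs["MOZ_OPTIMIZE_FLAGS"])
```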
Flags: needinfo?(ahuang)
blocking-b2g: 1.3T? → 1.3T+
Whiteboard: [SI-testing-blocker]
Steven, can you help us understand why this was made a blocker? What are the next steps here?
Flags: needinfo?(styang)
(In reply to bhavana bajaj [:bajaj] from comment #8)
> Steven, can you help us understand why this was made a blocker? What are the
> next steps here?

We need to pass the stability test before shipping, and this crash happened often during the test.

James, we need you to run the test again using the optimized build. Flagging this as 1.3T? for monitoring.
Flags: needinfo?(styang) → needinfo?(james.zhang)
blocking-b2g: 1.3T+ → 1.3T?
(In reply to Steven Yang [:styang] from comment #9)
> (In reply to bhavana bajaj [:bajaj] from comment #8)
> > Steven, can you help us understand why this was made a blocker? What are the
> > next steps here?
> 
> We need to pass the stability test before shipping, and this crash happened
> often during the test.
> 
> James, we need you to run the test again using the optimized build. Flagging
> this as 1.3T? for monitoring.

We use an optimized build and you use an unoptimized build, right?
We can still catch this crash in the 0305 daily build.
Flags: needinfo?(james.zhang)
Is there any chance to attach the .sc file that triggered this so we can re-run it?
(In reply to Andreas Gal :gal from comment #11)
> Is there any chance to attach the .sc file that triggered this so we can
> re-run it?

No. The monkey test runs for 12 to 24 hours and the content on the SD card differs each time, so we can't reproduce this issue with a .sc file.

Andreas, can we add gnasp when a minidump is generated? Spreadtrum slog can capture a snapshot when a native crash occurs.
For example, in this monkey test log, we can find the minidump log in the minidump folder, and we can also find the tombstone in slog_external/20230601042625/misc/tombstones.
We can see the tombstone screenshot and confirm that the crash was caused by a video thumbnail crash. But for the minidump we only have the backtrace and no screenshot, so we don't know how to analyze it.
Flags: needinfo?(gal)
gnasp?
Flags: needinfo?(gal)
Attached image screenshot_053612.jpg (obsolete)
Attached image screenshot_174308.jpg (obsolete)
James, the 2nd screenshot seems to be video playback, not thumbnail preview.
James, is comment 13 the same crash cause? Because that looks like a mediaserver crash.
(In reply to Andreas Gal :gal from comment #17)
> James, the 2nd screenshot seems to be video playback, not thumbnail preview.

Andreas, it's just an example; that bug was on our side and we have fixed it.
I think Gecko should capture a screenshot when a minidump is generated, so that we can get more information.

commit 8f6662d002f2a7084f2c79af369965b51c26e017
Author: ming.li <ming.li@spreadtrum.com>
Date:   Thu Mar 6 17:00:04 2014 +0800

    Bug #286278 mediaserver crash
    
    [bug number  ] 286278
    [root cause  ] failed to malloc mem from ion heap
    [changes     ] protect when failed
    [side effects]
    [reviewers   ]
    
    Change-Id: I84f860ef4a570dfca32e03256c0e7bba5d9ae313
(In reply to Andreas Gal :gal from comment #14)
> gnasp?

gnasp is a tool for capturing screenshots; you can also write code to capture a screenshot.
We need to capture more information when a minidump is generated; the log is not enough.
Attachment #8388023 - Attachment is obsolete: true
Attachment #8388024 - Attachment is obsolete: true
Ok, thanks. Sorry for my confusion. How do you trigger the screenshot? Can you link me to the gnasp tool? I will check out our crash dump handler.
(In reply to Andreas Gal :gal from comment #21)
> Ok, thanks. Sorry for my confusion. How do you trigger the screenshot? Can
> you link me to the gnasp tool? I will check out our crash dump handler.

If you have a tarako or fugu, you can use "adb shell gsnap /data/1.jpg /dev/graphics/fb0" to capture the screenshot, and use "adb pull /data/1.jpg" to get it from the phone.

gsnap is one of the busybox tools.
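The two commands above could be wrapped in a small host-side script so a test harness can call them in one step. This is only a sketch: the function names (`gsnap_commands`, `capture_screenshot`) are hypothetical, the default paths are the ones quoted in this comment, and it assumes `adb` is on the PATH and that `gsnap` exists on the device.

```python
import subprocess


def gsnap_commands(remote_path="/data/1.jpg", framebuffer="/dev/graphics/fb0"):
    """Return the adb command lines: capture on the device, then pull the file."""
    return [
        ["adb", "shell", "gsnap", remote_path, framebuffer],
        ["adb", "pull", remote_path],
    ]


def capture_screenshot(run=subprocess.check_call, **kwargs):
    """Run both commands in order; `run` can be swapped out for a dry run."""
    for argv in gsnap_commands(**kwargs):
        run(argv)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    capture_screenshot(run=lambda argv: print(" ".join(argv)))
```

Passing `run=print` style callables keeps the script usable on a machine without a device attached.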
Ok thanks. We can hack up something to save fb0. It's probably not safe in production, but we can add an env variable.
Attached patch patch
This is completely untested and likely doesn't even compile but this shows how we should be able to save a ppm during a crash. James, do you guys want to give this a try and see if you can apply this? I will try to find someone on our end Monday if you don't make progress.
This might not work when the child process dies in case the child process has no access to the FB or when the child process isn't visible anyway.
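To make the idea of dumping the framebuffer as a ppm concrete, here is a minimal Python sketch of the conversion step only. It is not the attached patch: the function name `rgb565_to_ppm` is hypothetical, and it assumes the raw framebuffer bytes are little-endian RGB565 with no row padding, which may not match the actual device format.

```python
import struct


def rgb565_to_ppm(raw, width, height):
    """Convert raw little-endian RGB565 pixels into a binary (P6) PPM image."""
    if len(raw) != width * height * 2:
        raise ValueError("framebuffer size does not match width * height * 2")
    pixels = bytearray()
    for (value,) in struct.iter_unpack("<H", raw):
        r = (value >> 11) & 0x1F
        g = (value >> 5) & 0x3F
        b = value & 0x1F
        # Expand the 5/6/5-bit channels to 8 bits by replicating the high bits.
        pixels += bytes(
            ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))
        )
    header = b"P6\n%d %d\n255\n" % (width, height)
    return header + bytes(pixels)


if __name__ == "__main__":
    # One pure-red pixel followed by one white pixel.
    raw = struct.pack("<HH", 0xF800, 0xFFFF)
    print(rgb565_to_ppm(raw, 2, 1))
```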
Very quick patch! Our monkey test engineer will integrate it into the monkey test version.
Hi Andreas, how do we open the ppm file or convert it to jpg or png?
(In reply to Andreas Gal :gal from comment #24)
> Created attachment 8388039 [details] [diff] [review]
> patch
> 
> This is completely untested and likely doesn't even compile but this shows
> how we should be able to save a ppm during a crash. James, do you guys want
> to give this a try and see if you can apply this? I will try to find someone
> on our end Monday if you don't make progress.

Lianxiang will take this patch.
Flags: needinfo?(lianxiang.zhou)
http://stackoverflow.com/questions/4012889/converting-ppm-to-png

Keep in mind that the patch is totally untested and uncompiled. It's just a sketch, but something like it should work. On Mac and Linux, ImageMagick's convert should do the trick to read the ppm file.
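If convert is not available on the test machine, the conversion can also be done with the Python standard library alone. The sketch below is an assumption-laden illustration, not part of the patch: `ppm_to_png` is a hypothetical helper that handles only binary (P6) files with a maxval of 255 and no header comments.

```python
import re
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, payload):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def ppm_to_png(data):
    """Convert a binary P6 PPM (maxval 255, no comments) to PNG bytes."""
    match = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if match is None:
        raise ValueError("not a binary P6 PPM file")
    width, height, maxval = (int(group) for group in match.groups())
    if maxval != 255:
        raise ValueError("only maxval 255 is supported")
    raster = data[match.end():]
    stride = width * 3
    if len(raster) < stride * height:
        raise ValueError("pixel data is truncated")
    # Each PNG scanline starts with a filter-type byte; 0 means no filtering.
    scanlines = b"".join(
        b"\x00" + raster[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR: 8 bits per channel, color type 2 (RGB), no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanlines))
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    png = ppm_to_png(b"P6\n1 1\n255\n\xff\x00\x00")
    with open("pixel.png", "wb") as handle:
        handle.write(png)
```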
Comment on attachment 8388039 [details] [diff] [review]
patch

bsmedberg, does something like this make sense? Will this run for the child as well? Can we somehow make sure this always runs in the parent? Even when a child crashes? (so we have access to dev/fb).
Attachment #8388039 - Flags: feedback?(benjamin)
Steven/James, is this still a true blocker for partner stability testing? how easy is it to reproduce in your stability testing? thanks
blocking-b2g: 1.3T? → 1.3T+
Flags: needinfo?(styang)
This looks reasonable as debugging code behind an envvar or pref or something. If we wanted it as production code we'd need to add additional code to clean up the screenshots along with the other crash data.

As written it will only take a screenshot if the main process crashes. If you wanted to take a screen shot of content process crashes, you'd need to add something similar to OnChildProcessDumpRequested here: http://hg.mozilla.org/mozilla-central/annotate/c8bea55437c1/toolkit/crashreporter/nsExceptionHandler.cpp#l2401

Presumably this should be #ifdef MOZ_B2G or something like that...
Attachment #8388039 - Flags: feedback?(benjamin)
(In reply to Joe Cheng [:jcheng] from comment #31)
> Steven/James, is this still a true blocker for partner stability testing?
> how easy is it to reproduce in your stability testing? thanks

Maybe this crash was caused by the media server crash; I no longer see it after applying the media server crash fix.
Lianxiang, please keep tracking this.
Component: General → Graphics: Layers
Product: Firefox OS → Core
Blocks: 970007
James, since you are no longer seeing this crash, we will minus for now
blocking-b2g: 1.3T+ → -
Whiteboard: [SI-testing-blocker]
Flags: needinfo?(styang)
James, if you are no longer seeing this crash, can we close this out?
Flags: needinfo?(james.zhang)
Keywords: crash
Whiteboard: [b2g-crash]
Yes, you can.
Flags: needinfo?(james.zhang)
Per comment 36, closing this bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(lianxiang.zhou)