crash in mozilla::layers::AsyncCompositionManager::TransformScrollableLayer @ libEGL_MRVL.so

RESOLVED FIXED in Firefox 28

Status

()

Core
Graphics: Layers
--
critical
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: Scoobidiver (away), Assigned: BenWa)

Tracking

({crash, topcrash-android-armv7})

unspecified
mozilla29
ARM
Android
crash, topcrash-android-armv7
Points:
---

Firefox Tracking Flags

(firefox23 wontfix, firefox24+ wontfix, firefox25+ wontfix, firefox26+ wontfix, firefox27+ wontfix, firefox28+ fixed, firefox29 fixed, fennec-)

Details

(Whiteboard: [native-crash][fixed in 28+ by bug 925608], crash signature)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
It's similar to bug 863313 and bug 845867 but with a different stack trace.
With combined signatures, it's the #16 crasher in 22.0 and #27 in 23.0b8.

Signature 	libEGL_MRVL.so@0x69e4
UUID 	44148022-b1cf-4c0f-b8bb-266072130731
Date Processed	2013-07-31 03:31:33.063326
Uptime	88
Install Age 	88 since version was first installed.
Install Time 	2013-07-31 03:29:51
Product 	FennecAndroid
Version 	23.0
Build ID 	20130723230815
Release Channel 	beta
OS 	Android
OS Version 	0.0.0 Linux 3.4.5-1219080-user #1 SMP PREEMPT Sun Jun 2 21:52:23 KST 2013 armv7l samsung/lt02wifiue/
Build Architecture 	arm
Build Architecture Info 	ARMv0 | 2
Crash Reason 	SIGSEGV
Crash Address 	0x7362437e
App Notes 	
AdapterDescription: 'Vivante Corporation -- GC1000 core -- OpenGL ES 2.0 -- Model: SM-T210R, Product: lt02wifiue, Manufacturer: samsung, Hardware: pxa988'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+ 
samsung SM-T210R
samsung/lt02wifiue/lt02wifi:4.1.2/JZO54K/T210RUEAMF1:user/release-keys

Frame 	Module 	Signature 	Source
0 	libEGL_MRVL.so 	libEGL_MRVL.so@0x69e4 	
1 	libxul.so 	mozilla::layers::AsyncCompositionManager::TransformScrollableLayer(mozilla::layers::Layer*, gfx3DMatrix const&) 	gfx/layers/composite/AsyncCompositionManager.cpp
2 	libxul.so 	mozilla::layers::AsyncCompositionManager::TransformShadowTree(mozilla::TimeStamp) 	obj-firefox/dist/include/nsTArray.h
3 	libxul.so 	mozilla::layers::CompositorParent::Composite() 	gfx/layers/ipc/CompositorParent.cpp
4 	libxul.so 	mozilla::layers::CompositorParent::ResumeComposition() 	gfx/layers/ipc/CompositorParent.cpp
5 	libxul.so 	RunnableMethod<mozilla::ipc::AsyncChannel, void (mozilla::ipc::AsyncChannel::*)(mozilla::ipc::AsyncChannel*, mozilla::ipc::AsyncChannel::Side), Tuple2<mozilla::ipc::AsyncChannel*, mozilla::ipc::AsyncChannel::Side> >::Run() 	ipc/chromium/src/base/tuple.h
6 	libxul.so 	MessageLoop::RunTask(Task*) 	ipc/chromium/src/base/message_loop.cc
7 	libxul.so 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) 	ipc/chromium/src/base/message_loop.cc
8 	libxul.so 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc
9 	libxul.so 	base::MessagePumpDefault::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_default.cc
10 	libxul.so 	MessageLoop::RunInternal() 	ipc/chromium/src/base/message_loop.cc
11 	libxul.so 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
12 	libxul.so 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc
13 	libxul.so 	ThreadFunc 	ipc/chromium/src/base/platform_thread_posix.cc
14 	libc.so 	libc.so@0x12e02 	
15 	libc.so 	libc.so@0x1255a 	
16 	libEGL.so 	libEGL.so@0x22c3a

More reports at:
https://crash-stats.mozilla.com/query/?product=FennecAndroid&query_search=signature&query_type=contains&query=libEGL_MRVL.so
(Reporter)

Comment 1

5 years ago
In aggregate, it's #5 top crasher in 23.0.

Not sure if there's much more to do than what was done in bug 863313 and bug 845867.
tracking-fennec: --- → ?
Crash Signature: [@ libEGL_MRVL.so@0x69e4] [@ libEGL_MRVL.so@0x69ec] [@ libEGL_MRVL.so@0x642c] [@ libEGL_MRVL.so@0x6c58] [@ libEGL_MRVL.so@0x65ec] → [@ libEGL_MRVL.so@0x69e4] [@ libEGL_MRVL.so@0x69ec] [@ libEGL_MRVL.so@0x642c] [@ libEGL_MRVL.so@0x6c58] [@ libEGL_MRVL.so@0x65ec] [@ libEGL_MRVL.so@0x6c60]
tracking-firefox24: --- → ?
Keywords: topcrash
kats, :jgilbert: since this is a different stack trace, is there anything different here that may help with the investigation?
status-firefox24: --- → affected
Keywords: needURLs, steps-wanted
Kats, looks like this has dropped off significantly in 25. Do you think your compositor pause/resume refactor fixed this? If so, I think this winds up being won't fix.
Flags: needinfo?(bugmail.mozilla)
No, my compositor pause/resume refactor was a while ago. I don't think I did anything related in 25. Based on the stack in comment 0 it looks like it's related to fixed-position code so maybe one of roc's and/or Cwiiis' changes fixed it.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> No, my compositor pause/resume refactor was a while ago. I don't think I did
> anything related in 25. Based on the stack in comment 0 it looks like it's
> related to fixed-position code so maybe one of roc's and/or Cwiiis' changes
> fixed it.

Tracking as this is a top-crasher and needinfo'ing :roc, Chris Lord to help with your above comment.
status-firefox23: --- → affected
tracking-firefox24: ? → +
Flags: needinfo?(roc)
Flags: needinfo?(chrislord.net)
I don't think I've touched this since comment #0.
Flags: needinfo?(roc)

Comment 7

5 years ago
Bug 876542 did land in 25, which completely rewrote this code, so there's a good chance it could have altered the frequency of this crash.

From the comments, it sounds like this isn't an issue anymore and we're just talking about likely candidates for having fixed this?
Flags: needinfo?(chrislord.net)
So, sounds like the tiles refactor fixed this. I don't think we should uplift that, so tracking minus and marking won't fix for 23 and 24
tracking-fennec: ? → -
status-firefox23: affected → wontfix
status-firefox24: affected → wontfix
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #9)
> So, sounds like the tiles refactor fixed this. I don't think we should
> uplift that, so tracking minus and marking won't fix for 23 and 24

The fact that we fixed it in a different way in a later release doesn't mean we should wontfix it for all earlier releases. Are we at a loss as to what could be going wrong here?
status-firefox24: wontfix → affected
status-firefox25: --- → affected
tracking-firefox25: --- → +
Flags: needinfo?(blassey.bugs)
Just to clarify, I think a wontfix 2-3 weeks ago was premature.
Flags: needinfo?(kairo)

Updated

4 years ago
Flags: needinfo?(kairo)

Comment 12

4 years ago
libEGL_MRVL.so@0x69ec is #4 in 23 and #8 in 24 in yesterday's data (so this is quite high-volume), and https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69ec says that it's even seen to some degree in 25 and 26 (see products section of signature summary), so I wonder what makes you so sure it's been fixed by the rewrite in 25.
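For anyone who wants to check the other signatures the same way, here is a minimal sketch that builds the per-signature report-list URLs. It assumes the `report/list?signature=` URL shape quoted above and uses the signatures from the Crash Signature field of this bug; the helper name `report_url` is hypothetical.

```python
from urllib.parse import quote

# Base taken from the report-list URL quoted in this bug.
BASE = "https://crash-stats.mozilla.com/report/list"

# Signatures listed in the Crash Signature field of this bug.
SIGNATURES = [
    "libEGL_MRVL.so@0x69e4",
    "libEGL_MRVL.so@0x69ec",
    "libEGL_MRVL.so@0x642c",
    "libEGL_MRVL.so@0x6c58",
    "libEGL_MRVL.so@0x65ec",
    "libEGL_MRVL.so@0x6c60",
]


def report_url(signature: str) -> str:
    """Hypothetical helper: percent-encode the signature ('@' becomes %40)."""
    return f"{BASE}?signature={quote(signature, safe='')}"


if __name__ == "__main__":
    for sig in SIGNATURES:
        print(report_url(sig))
```

The products section of each signature summary then has to be read by hand; this only saves typing the encoded URLs.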
(In reply to Alex Keybl [:akeybl] from comment #11)
> Just to clarify, I think a wontfix 2-3 weeks ago was premature.

I'm unaware of any theory as to what's going wrong here. If we get STR (it's already marked as steps-wanted) then I think we should take another look at fixing it for 24.
Flags: needinfo?(blassey.bugs)

Comment 14

4 years ago
Devices for the two main signatures (see signature summary tabs of those URLs):

https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69ec
Manufacturer 	Model 	API Version 	CPU ABI 	Report Count 	Percentage
samsung 	SM-T211	16 (REL) 	armeabi-v7a 	1790 	37.495 %
samsung 	SM-T210	16 (REL) 	armeabi-v7a 	1706 	35.735 %
samsung 	SM-T210R	16 (REL) 	armeabi-v7a 	1277 	26.749 %
samsung 	SM-T2105	16 (REL) 	armeabi-v7a 	1 	0.021 %

https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69e4
Manufacturer 	Model 	API Version 	CPU ABI 	Report Count 	Percentage
samsung 	SM-T210	16 (REL) 	armeabi-v7a 	1295 	91.584 %
samsung 	SM-T210R	16 (REL) 	armeabi-v7a 	81 	5.728 %
samsung 	SM-T211	16 (REL) 	armeabi-v7a 	38 	2.687 %


So this might only affect a small number of devices, but it ranks pretty high up in our crash stats.
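As a cross-check on the percentages, a minimal sketch that recomputes each model's share per signature and the combined report count per model. The counts are copied from the two tables above; the function names `shares` and `combined` are hypothetical.

```python
from collections import Counter

# (model, report count) rows copied from the two tables in this comment.
REPORTS = {
    "libEGL_MRVL.so@0x69ec": [
        ("SM-T211", 1790),
        ("SM-T210", 1706),
        ("SM-T210R", 1277),
        ("SM-T2105", 1),
    ],
    "libEGL_MRVL.so@0x69e4": [
        ("SM-T210", 1295),
        ("SM-T210R", 81),
        ("SM-T211", 38),
    ],
}


def shares(rows):
    """Percentage of reports per model for one signature, to 3 decimals."""
    total = sum(count for _, count in rows)
    return {model: round(100.0 * count / total, 3) for model, count in rows}


def combined(reports):
    """Report count per model, summed across all signatures."""
    totals = Counter()
    for rows in reports.values():
        for model, count in rows:
            totals[model] += count
    return totals


if __name__ == "__main__":
    for signature, rows in REPORTS.items():
        print(signature, shares(rows))
    print(combined(REPORTS).most_common())
```

Summed across both signatures, every report in these tables comes from one of four SM-T21x models.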
This only happens on the Galaxy Tab 3 8.0 and 7.0. These devices are not at all like the Galaxy Tab 3 10.0 which is an Intel x86 device. The 8.0 and 7.0 are ARMv7 devices.

http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%287.0%29
http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%288.0%29
Placed a service now request REQ0018914.
I've looked at this device trying some of the URLs from the crash reports. I was able to crash about three times over a few hours visiting www.investopedia.com with this signature.

Device is sitting in front of my monitor in MTV if anyone needs to look at it. The nearest I have been able to come to STR is to navigate back and forth between pages of investopedia and rotate the device. Occasionally you will crash.

Comment 18

4 years ago
Thanks!
(In reply to Kevin Brosnan [:kbrosnan] from comment #17)
> I've looked at this device trying some of the URLs from the crash reports. I
> was able to crash about three times over a few hours visiting
> www.investopedia.com with this signature.
> 
> Device is sitting in front of my monitor in MTV if anyone needs to look at
> it. The nearest I have been able to come to STR is to navigate back and
> forth between pages of investopedia and rotate the device. Occasionally you
> will crash.

blassey, looks like we found those STR :)
Assignee: nobody → blassey.bugs

Updated

4 years ago
Keywords: steps-wanted
The STR is so reminiscent of bug 900020 (rotate the device), but that was uplifted Sep 25th, and we still seem to have these crashes with the October builds. Still, Benoit, does anything look interesting in the stack?
Flags: needinfo?(bjacob)
Assignee: blassey.bugs → nobody
Anyone that could look at this?
Flags: needinfo?(milan)
Jeff, could you bring the device along to the work week, so that we can look at it next week?
Flags: needinfo?(milan) → needinfo?(jgilbert)
Yeah, just bring the device by/to my desk in MTV. (3015 near Very Good Very Mighty)
Flags: needinfo?(jgilbert)
Dropped off. Near as I can tell this only happens when rotating the device during page load. It has been a low frequency crash for me.
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrash → topcrash-android-armv7

Updated

4 years ago
status-firefox24: affected → wontfix
status-firefox25: affected → wontfix
The stack looks really different from what I've seen elsewhere; still it'll be interesting to see if the renew-surface-on-resume fixes will affect this. Unfortunately, with the work week and all, I haven't been able to work on this. For reference: bug 925608.
Flags: needinfo?(bjacob)
Assigning to Jeff, but only because he has the device right now (correct?)  Let's keep an eye on bug 925608 that Benoit is working on.
Assignee: nobody → jgilbert
status-firefox26: --- → affected
status-firefox27: --- → affected
status-firefox28: --- → affected
tracking-firefox26: --- → +
tracking-firefox27: --- → +
tracking-firefox28: --- → +
Jeff - do you have the device? Is this reproducing for you and is there any update on a potential fix here?
Flags: needinfo?(jgilbert)
I passed the device off to jrmuizel in Paris for reassignment.
Assignee: jgilbert → jmuizelaar
Flags: needinfo?(jgilbert)
And I passed it on to bjacob
Assignee: jmuizelaar → bjacob
We're now past the point of taking speculative fixes on FF26 so this will be wontfix again. Benoit - you have the device, where are you at with this and where is this on your priorities list for FF27 - do we need another assignee here/pass on the device one more time?  Please don't assign this bug to 'nobody', let's find someone who can take it on in the next 6 weeks.
status-firefox26: affected → wontfix
Flags: needinfo?(bjacob)
I haven't really spent any time specifically on this bug. Instead, we were hoping that fixing bug 925608 would fix several of this kind of Android bugs at once, possibly including the present one.

Bug 925608 landed a week ago and I've been dealing with fallout from it for the past week (bug 834243). Now (since today's Nightly build) we seem to be finally in good shape, so it finally looks like it's going to stick.

I'm OK to spend some time this week checking if I can reproduce the present bug if we think that that's what I should do.

It sounds like the device is a "Galaxy Tab 3":

(In reply to Kevin Brosnan [:kbrosnan] from comment #15)
> This only happens on the Galaxy Tab 3 8.0 and 7.0. These devices are not at
> all like the Galaxy Tab 3 10.0 which is an Intel x86 device. The 8.0 and 7.0
> are ARMv7 devices.
> 
> http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%287.0%29
> http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%288.0%29

I don't really remember getting that device, but people give me devices all the time so that doesn't mean anything. I'll try looking for it in the Toronto office.
Flags: needinfo?(bjacob)
NI on :bjacob to see if he had a chance to investigate this and help with next steps. Overall android crash-rates have been significantly high in the past few releases, is there anything we can do in terms of this bug to avoid shipping with this top-crasher in Fx27 ?
Flags: needinfo?(bjacob)
Note AaronMT picked up the device to see if it is possible to find STR better than the occasional crash I hit in comment 17 and comment 24.
I believe I have solid steps to reproduce here. I have hit this crash about four times now on this demo canvas URL: http://www.smashcat.org/av/canvas_test/ . I let the device idle for a minute or two after running the demo and it crashes.
Following the steps in comment 35, I immediately got a crash, but of a different nature than the one discussed here: filed bug 958256.
Flags: needinfo?(bjacob)
I retried in a plain Nightly, and got the bug 958256 crash immediately too.
I tried reproducing for another half hour, testing a wide variety of URLs with different types of content. I also tried all the features of about:home, since that is the top URL in comment 8. I could not reproduce any crash, once I worked around bug 958256 by disabling Skia/GL which should be unrelated.

I don't remember how I got magically assigned this bug, but I shouldn't have, and when I got assigned it, I should have mentioned immediately that there was no reason to assign to me: I have never worked on anything remotely related to AsyncCompositionManager::TransformScrollableLayer.

Since this bug seems so difficult to reproduce, if you want anything to get done about it, maybe the best you can do is find a developer who knows about that code, and get them to add various assertions or other crash-report annotations so that we can get more useful information about these crashes.
Assignee: bjacob → nobody
Ni on :milan to help with assignee here.

Milan, this is a top-crasher, QA has a reproducible device in Toronto, can you please help with an assignee who can help with urgent investigation here ?
Flags: needinfo?(milan)
CJ, do you have access to Galaxy Tab 3 8.0 and 7.0 (ARM v7)?
Flags: needinfo?(milan) → needinfo?(cku)
FWIW:

 1. The Toronto office's Galaxy Tab 3 device is on my desk (Toronto 5029).

 2. The file where AsyncCompositionManager::TransformScrollableLayer is defined is

      gfx/layers/composite/AsyncCompositionManager.cpp

 so I suppose that a starting point to find an assignee could involve hg log / hg annotate on that file.

Comment 42

4 years ago
Milan, we don't have that device at TPE side.
Flags: needinfo?(cku)
Either BenWa or Botond will pick this up, later this week, depending on who finishes the 1.3 APZC bugs first.
(Assignee)

Comment 44

4 years ago
Looking at the bug. libEGL_MRVL.so is stripped so I can't trivially find the crashing function.
(Assignee)

Comment 45

4 years ago
Created attachment 8361907
Tentative patch

Tentative patch, since I can't reproduce the issue locally. This makes us check for bad behavior ourselves instead of relying on the driver to return an error code.
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attachment #8361907 - Flags: review?(jmuizelaar)
(Assignee)

Comment 46

4 years ago
I found a bug with HTML5 video while testing this. Filed as bug 961228.
Attachment #8361907 - Flags: review?(jmuizelaar) → review+

Comment 48

4 years ago
If this helps, I'd love to see this go into beta as soon as we can, as we only really see this with the more extensive population we have on beta and release.
https://hg.mozilla.org/mozilla-central/rev/cd4a9095e25c
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
NI on :BenWa to see if this is ready for our second-to-last mobile beta, which is going to build tomorrow. I think if this is safe enough we should uplift and see if it helps.

Also AaronMT can help verify this based on comment #35.
Flags: needinfo?(bgirard)
(Assignee)

Comment 51

4 years ago
I don't think putting this in Beta is wise. Let's move it up to aurora (soon beta) instead?
Flags: needinfo?(bgirard)
(In reply to Benoit Girard (:BenWa) from comment #51)
> I don't think putting this in Beta is wise. Let's move it up to aurora (soon
> beta) instead?

Makes sense if you deem it risky; let's get it on aurora once we are comfortable with the m-c bake time.

Updated

4 years ago
status-firefox27: affected → wontfix
status-firefox28: affected → fixed
status-firefox29: --- → fixed

Comment 53

4 years ago
Kevin, why did you mark this fixed on 28? I see no indication that this landed there.
Flags: needinfo?(kbrosnan)

Comment 55

4 years ago
Thanks, that's awesome!
Depends on: 925608
Whiteboard: [native-crash] → [native-crash][fixed in 28+ by bug 925608]