Closed Bug 900033 Opened 8 years ago Closed 8 years ago
crash in mozilla::layers::AsyncCompositionManager::TransformScrollableLayer @ libEGL_MRVL.so
It's similar to bug 863313 and bug 845867 but with a different stack trace. With combined signatures, it's #16 crasher in 22.0 and #27 in 23.0b8.

Signature: libEGL_MRVL.so@0x69e4
UUID: 44148022-b1cf-4c0f-b8bb-266072130731
Date Processed: 2013-07-31 03:31:33.063326
Uptime: 88
Install Age: 88 since version was first installed.
Install Time: 2013-07-31 03:29:51
Product: FennecAndroid
Version: 23.0
Build ID: 20130723230815
Release Channel: beta
OS: Android
OS Version: 0.0.0 Linux 3.4.5-1219080-user #1 SMP PREEMPT Sun Jun 2 21:52:23 KST 2013 armv7l samsung/lt02wifiue/
Build Architecture: arm
Build Architecture Info: ARMv0 | 2
Crash Reason: SIGSEGV
Crash Address: 0x7362437e
App Notes: AdapterDescription: 'Vivante Corporation -- GC1000 core -- OpenGL ES 2.0 -- Model: SM-T210R, Product: lt02wifiue, Manufacturer: samsung, Hardware: pxa988'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+
samsung SM-T210R
samsung/lt02wifiue/lt02wifi:4.1.2/JZO54K/T210RUEAMF1:user/release-keys

Frame | Module | Signature | Source
0 | libEGL_MRVL.so | libEGL_MRVL.so@0x69e4 |
1 | libxul.so | mozilla::layers::AsyncCompositionManager::TransformScrollableLayer(mozilla::layers::Layer*, gfx3DMatrix const&) | gfx/layers/composite/AsyncCompositionManager.cpp
2 | libxul.so | mozilla::layers::AsyncCompositionManager::TransformShadowTree(mozilla::TimeStamp) | obj-firefox/dist/include/nsTArray.h
3 | libxul.so | mozilla::layers::CompositorParent::Composite() | gfx/layers/ipc/CompositorParent.cpp
4 | libxul.so | mozilla::layers::CompositorParent::ResumeComposition() | gfx/layers/ipc/CompositorParent.cpp
5 | libxul.so | RunnableMethod<mozilla::ipc::AsyncChannel, void (mozilla::ipc::AsyncChannel::*)(mozilla::ipc::AsyncChannel*, mozilla::ipc::AsyncChannel::Side), Tuple2<mozilla::ipc::AsyncChannel*, mozilla::ipc::AsyncChannel::Side> >::Run() | ipc/chromium/src/base/tuple.h
6 | libxul.so | MessageLoop::RunTask(Task*) | ipc/chromium/src/base/message_loop.cc
7 | libxul.so | MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) | ipc/chromium/src/base/message_loop.cc
8 | libxul.so | MessageLoop::DoWork() | ipc/chromium/src/base/message_loop.cc
9 | libxul.so | base::MessagePumpDefault::Run(base::MessagePump::Delegate*) | ipc/chromium/src/base/message_pump_default.cc
10 | libxul.so | MessageLoop::RunInternal() | ipc/chromium/src/base/message_loop.cc
11 | libxul.so | MessageLoop::Run() | ipc/chromium/src/base/message_loop.cc
12 | libxul.so | base::Thread::ThreadMain() | ipc/chromium/src/base/thread.cc
13 | libxul.so | ThreadFunc | ipc/chromium/src/base/platform_thread_posix.cc
14 | libc.so | libc.so@0x12e02 |
15 | libc.so | libc.so@0x1255a |
16 | libEGL.so | libEGL.so@0x22c3a |

More reports at: https://crash-stats.mozilla.com/query/?product=FennecAndroid&query_search=signature&query_type=contains&query=libEGL_MRVL.so
In aggregate, it's #5 top crasher in 23.0. Not sure if there's much more to do than what was done in bug 863313 and bug 845867.
tracking-fennec: --- → ?
Crash Signature: [@ libEGL_MRVL.so@0x69e4] [@ libEGL_MRVL.so@0x69ec] [@ libEGL_MRVL.so@0x642c] [@ libEGL_MRVL.so@0x6c58] [@ libEGL_MRVL.so@0x65ec] → [@ libEGL_MRVL.so@0x69e4] [@ libEGL_MRVL.so@0x69ec] [@ libEGL_MRVL.so@0x642c] [@ libEGL_MRVL.so@0x6c58] [@ libEGL_MRVL.so@0x65ec] [@ libEGL_MRVL.so@0x6c60]
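For anyone combining these signatures by hand, the field above can be split into module and offset pairs. The following is only a minimal sketch: the `Signature` struct and `ParseSignatures` function are hypothetical names invented here, and the `[@ module@0xOFFSET]` layout is assumed from the entries in this bug rather than taken from any crash-stats specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type: one entry of the "Crash Signature" field.
struct Signature {
    std::string module;    // e.g. "libEGL_MRVL.so"
    std::uint32_t offset;  // e.g. 0x69e4
};

// Parses a field of the form "[@ module@0xOFFSET] [@ module@0xOFFSET] ...".
// Entries that do not end in "@0x<hex>" are skipped rather than guessed at.
inline std::vector<Signature> ParseSignatures(const std::string& field) {
    std::vector<Signature> out;
    std::size_t pos = 0;
    while ((pos = field.find("[@ ", pos)) != std::string::npos) {
        const std::size_t start = pos + 3;
        const std::size_t end = field.find(']', start);
        if (end == std::string::npos) break;  // unterminated entry: stop
        const std::string body = field.substr(start, end - start);
        const std::size_t at = body.rfind("@0x");
        if (at != std::string::npos) {
            try {
                const unsigned long off = std::stoul(body.substr(at + 3), nullptr, 16);
                out.push_back({body.substr(0, at), static_cast<std::uint32_t>(off)});
            } catch (const std::exception&) {
                // Not a hex offset after "@0x": ignore this entry.
            }
        }
        pos = end + 1;
    }
    return out;
}
```

Grouping the parsed entries by `module` is then enough to treat all six offsets as a single libEGL_MRVL.so crash for ranking purposes.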
kats, :jgilbert: since this is a different stack trace, is there anything different here that may help with the investigation?
Kats, looks like this has dropped off significantly in 25. Do you think your compositor pause/resume refactor fixed this? If so, I think this winds up being won't fix.
No, my compositor pause/resume refactor was a while ago. I don't think I did anything related in 25. Based on the stack in comment 0 it looks like it's related to fixed-position code so maybe one of roc's and/or Cwiiis' changes fixed it.
(In reply to Kartikaya Gupta (email:email@example.com) from comment #4)
> No, my compositor pause/resume refactor was a while ago. I don't think I did
> anything related in 25. Based on the stack in comment 0 it looks like it's
> related to fixed-position code so maybe one of roc's and/or Cwiiis' changes
> fixed it.
Tracking as this is a top-crasher and needinfo'ing :roc, Chris Lord to help with your above comment.
I don't think I've touched this since comment #0.
Bug 876542 did land in 25, which completely rewrote this code, so there's a good chance it could have altered the frequency of this crash. From the comments, it sounds like this isn't an issue anymore and we're just talking about likely candidates for having fixed this?
Total Count | URL
15 | about:home
3 | about:blank
1 | https://settings.adobe.com/flashplayer/mobile/
1 | http://www.pusch-wohnwagen.at/main-menu/mietprogramm/wohnwagen.html?id=5
1 | http://apps.337.com/tr/bombom/
1 | http://www.rts.rs/page/sport/sr/story/2026/Ludo+i+brzo/1378594/Ludo+i+brzo,+47.+deo.html
1 | https://m.facebook.com/r.php?refid=9
1 | https://addons.mozilla.org/en-US/android/addon/full-screen-252573/?src=api
1 | file:///storage/extSdCard/Downloads/bookmarks.html
1 | http://www.philstar.com/
1 | http://www.filipinochannels.net/
1 | https://www.directv.com.co/midirectv/inicio-pospago
1 | http://www.sturt.nsw.edu.au/mobile/
1 | http://www.google.de/
1 | http://www.freelang.net/dictionary/index.php
1 | http://www.thefriedmans.net/jewgle/
So, sounds like the tiles refactor fixed this. I don't think we should uplift that, so tracking minus and marking won't fix for 23 and 24
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #9)
> So, sounds like the tiles refactor fixed this. I don't think we should
> uplift that, so tracking minus and marking won't fix for 23 and 24
The fact that we fixed it in a different way in a later release doesn't mean we should wontfix it for all earlier releases. Are we at a loss as to what could be going wrong here?
Just to clarify, I think a wontfix 2-3 weeks ago was premature.
libEGL_MRVL.so@0x69ec is #4 in 23 and #8 in 24 in yesterday's data (so this is quite high-volume), and https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69ec says that it's even seen to some degree in 25 and 26 (see products section of signature summary), so I wonder what makes you so sure it's been fixed by the rewrite in 25.
(In reply to Alex Keybl [:akeybl] from comment #11)
> Just to clarify, I think a wontfix 2-3 weeks ago was premature.
I'm unaware of any theory as to what's going wrong here. If we get STR (it's already marked as steps-wanted) then I think we should take another look at fixing it for 24.
Devices for the two main signatures (see signature summary tabs of those URLs):

https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69ec
Manufacturer | Model | API Version | CPU ABI | Report Count | Percentage
samsung | SM-T211 | 16 (REL) | armeabi-v7a | 1790 | 37.495 %
samsung | SM-T210 | 16 (REL) | armeabi-v7a | 1706 | 35.735 %
samsung | SM-T210R | 16 (REL) | armeabi-v7a | 1277 | 26.749 %
samsung | SM-T2105 | 16 (REL) | armeabi-v7a | 1 | 0.021 %

https://crash-stats.mozilla.com/report/list?signature=libEGL_MRVL.so%400x69e4
Manufacturer | Model | API Version | CPU ABI | Report Count | Percentage
samsung | SM-T210 | 16 (REL) | armeabi-v7a | 1295 | 91.584 %
samsung | SM-T210R | 16 (REL) | armeabi-v7a | 81 | 5.728 %
samsung | SM-T211 | 16 (REL) | armeabi-v7a | 38 | 2.687 %

So this might only affect a small number of devices, but it ranks pretty high up in our crash stats.
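Since the two signatures are tracked together in this bug, it can help to sum the per-device counts across them and recompute the shares. This is only a rough sketch of that arithmetic. The names `DeviceCounts`, `CombineByModel`, `TotalReports` and `SharePercent` are made up for illustration and are not part of crash-stats.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one signature's device table:
// (model name, report count) pairs.
using DeviceCounts = std::vector<std::pair<std::string, long>>;

// Sums report counts per device model across several signatures.
inline std::map<std::string, long> CombineByModel(
        const std::vector<DeviceCounts>& perSignature) {
    std::map<std::string, long> combined;
    for (const auto& signature : perSignature) {
        for (const auto& entry : signature) {
            combined[entry.first] += entry.second;
        }
    }
    return combined;
}

// Total number of reports over all models.
inline long TotalReports(const std::map<std::string, long>& combined) {
    long total = 0;
    for (const auto& entry : combined) {
        total += entry.second;
    }
    return total;
}

// Share of `count` in `total`, as a percentage; 0 when there are no reports.
inline double SharePercent(long count, long total) {
    if (total == 0) return 0.0;
    return 100.0 * static_cast<double>(count) / static_cast<double>(total);
}
```

Fed with the counts listed in this comment, the combined per-model totals come to 3001 for SM-T210, 1828 for SM-T211, 1358 for SM-T210R and 1 for SM-T2105, out of 6188 reports. Every report across both signatures comes from an SM-T21x model.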
This only happens on the Galaxy Tab 3 8.0 and 7.0. These devices are not at all like the Galaxy Tab 3 10.0 which is an Intel x86 device. The 8.0 and 7.0 are ARMv7 devices. http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%287.0%29 http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%288.0%29
Placed a service now request REQ0018914.
I've looked at this device, trying some of the URLs from the crash reports. I was able to crash about three times over a few hours visiting www.investopedia.com with this signature. Device is sitting in front of my monitor in MTV if anyone needs to look at it. The nearest I have been able to come to STR is to navigate back and forth between pages of investopedia and rotate the device. Occasionally you will crash.
(In reply to Kevin Brosnan [:kbrosnan] from comment #17)
> I've looked at this device trying some of the URLs from the crash reports. I
> was able to crash about three times over a few hours visiting
> www.investopedia.com with this signature.
>
> Device is sitting in front of my monitor in MTV if anyone needs to look at
> it. Near as I have been able to come to STR is navigate back and forth
> between page pages of investopedia and rotate the device. Occasionally you
> will crash.
blassey, looks like we found those STR :)
Assignee: nobody → blassey.bugs
The STR is very reminiscent of bug 900020 (rotate the device), but that was uplifted Sep 25th, and we seem to still have these crashes with the October builds. Still, Benoit, does anything look interesting in the stack?
Anyone that could look at this?
Jeff, could you bring the device along to the work week, so that we can look at it next week?
Flags: needinfo?(milan) → needinfo?(jgilbert)
Yeah, just bring the device by/to my desk in MTV. (3015 near Very Good Very Mighty)
Dropped off. Near as I can tell this only happens when rotating the device during page load. It has been a low frequency crash for me.
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
The stack looks really different from what I've seen elsewhere; still it'll be interesting to see if the renew-surface-on-resume fixes will affect this. Unfortunately, with the work week and all, I haven't been able to work on this. For reference: bug 925608.
Assigning to Jeff, but only because he has the device right now (correct?) Let's keep an eye on bug 925608 that Benoit is working on.
Assignee: nobody → jgilbert
Jeff - do you have the device? Is this reproducing for you and is there any update on a potential fix here?
I passed the device off to jrmuizel in Paris for reassignment.
Assignee: jgilbert → jmuizelaar
And I passed it on to bjacob
Assignee: jmuizelaar → bjacob
We're now past the point of taking speculative fixes on FF26 so this will be wontfix again. Benoit - you have the device, where are you at with this and where is this on your priorities list for FF27 - do we need another assignee here/pass on the device one more time? Please don't assign this bug to 'nobody', let's find someone who can take it on in the next 6 weeks.
I haven't really spent any time specifically on this bug. Instead, we were hoping that fixing bug 925608 would fix several Android bugs of this kind at once, possibly including the present one. Bug 925608 landed a week ago and I've been dealing with fallout from it for the past week (bug 834243). Now (since today's Nightly build) we seem to be finally in good shape, so it finally looks like it's going to stick. I'm OK to spend some time this week checking if I can reproduce the present bug if we think that that's what I should do. It sounds like the device is a "Galaxy Tab 3":
(In reply to Kevin Brosnan [:kbrosnan] from comment #15)
> This only happens on the Galaxy Tab 3 8.0 and 7.0. These devices are not at
> all like the Galaxy Tab 3 10.0 which is an Intel x86 device. The 8.0 and 7.0
> are ARMv7 devices.
>
> http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%287.0%29
> http://en.wikipedia.org/wiki/Samsung_Galaxy_Tab_3_%288.0%29
I don't really remember getting that device, but people give me devices all the time so that doesn't mean anything. I'll try looking for it in the Toronto office.
NI on :bjacob to see if he had a chance to investigate this and help with next steps. Overall android crash-rates have been significantly high in the past few releases, is there anything we can do in terms of this bug to avoid shipping with this top-crasher in Fx27 ?
Note AaronMT picked up the device to see if it is possible to find better STR than the occasional crash I hit in comment 17 and comment 24.
I believe I have solid steps to reproduce here, I have hit this crash about four times now on this demo canvas URL: http://www.smashcat.org/av/canvas_test/, I let the device idle for a minute or two after running the demo and it crashes.
Following the steps in comment 35, I immediately got a crash, but of a different nature than the one discussed here: filed bug 958256.
I retried in a plain Nightly, and got the bug 958256 crash immediately too.
I tried reproducing for another half hour, testing a wide variety of URLs with different types of content. I also tried all the features of about:home, since that is the top URL in comment 8. I could not reproduce any crash, once I worked around bug 958256 by disabling Skia/GL which should be unrelated. I don't remember how I got magically assigned this bug, but I shouldn't have, and when I got assigned it, I should have mentioned immediately that there was no reason to assign to me: I have never worked on anything remotely related to AsyncCompositionManager::TransformScrollableLayer. Since this bug seems so difficult to reproduce, if you want anything to get done about it, maybe the best you can do is find a developer who knows about that code, and get him to add various assertions or otherwise crash-report-annotations so that we can get more useful information about these crashes.
Assignee: bjacob → nobody
Ni on :milan to help with assignee here. Milan, this is a top-crasher, QA has a reproducible device in Toronto, can you please help with an assignee who can help with urgent investigation here ?
CJ, do you have access to Galaxy Tab 3 8.0 and 7.0 (ARM v7)?
Flags: needinfo?(milan) → needinfo?(cku)
FWIW:
1. The Toronto office's Galaxy Tab 3 device is on my desk (Toronto 5029).
2. The file where AsyncCompositionManager::TransformScrollableLayer is defined is gfx/layers/composite/AsyncCompositionManager.cpp so I suppose that a starting point to find an assignee could involve hg log / hg annotate on that file.
Milan, we don't have that device at TPE side.
Either BenWa or Botond will pick this up, later this week, depending on who finishes the 1.3 APZC bugs first.
Looking at the bug. libEGL_MRVL.so is stripped so I can't trivially find the crashing function.
Tentative patch, since I can't reproduce the issue locally. Rather than relying on the driver to return an error code for bad behavior, this makes us check for it ourselves.
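The attached patch is not reproduced in this thread, so the following is only a generic sketch of the "check it ourselves" pattern described above, not the actual change. Everything in it is hypothetical: `FakeSurface`, `FakeDriverSwap`, `CheckSurface` and `SafeSwap` are stand-ins invented for illustration and do not correspond to real EGL or Gecko APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for a driver-owned surface; not a real EGL type.
struct FakeSurface {
    bool valid;           // false once the surface is destroyed or lost
    std::int32_t width;
    std::int32_t height;
};

// Hypothetical stand-in for a driver entry point. A buggy driver may crash on
// bad input instead of returning an error, so the caller should not depend on
// its return value to detect misuse.
inline bool FakeDriverSwap(const FakeSurface& surface) {
    return surface.valid;
}

// Returns a description of the problem if the surface must not be handed to
// the driver, or nullptr if it looks safe to use.
inline const char* CheckSurface(const FakeSurface* surface) {
    if (surface == nullptr) return "null surface";
    if (!surface->valid) return "surface was destroyed or never created";
    if (surface->width <= 0 || surface->height <= 0) return "surface has empty size";
    return nullptr;
}

// Validates on our side first and calls into the driver only when the
// preconditions hold, so bad state never reaches driver code.
inline bool SafeSwap(const FakeSurface* surface) {
    if (CheckSurface(surface) != nullptr) {
        return false;  // bail out early; a real caller might log or annotate here
    }
    return FakeDriverSwap(*surface);
}
```

The point of the pattern is that the failure becomes an ordinary early return in our own code, where it can be logged, instead of a SIGSEGV inside a stripped vendor library.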
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attachment #8361907 - Flags: review?(jmuizelaar)
I found a bug with HTML5 video while testing this. Filed as bug 961228.
Attachment #8361907 - Flags: review?(jmuizelaar) → review+
If this helps, I'd love to see this go into beta as soon as we can, as we only really see this with the more extensive population we have on beta and release.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
NI on :Benwa to see if this is ready for our second-to-last mobile beta, going to build tomorrow. I think if this is safe enough we should uplift and see if it helps. Also Aaront can help verify this based on comment #35
I don't think putting this in Beta is wise. Let's move it up to aurora (soon beta) instead?
(In reply to Benoit Girard (:BenWa) from comment #51)
> I don't think putting this in Beta is wise. Lets move it up to aurora (soon
> beta) instead?
Makes sense if you deem it risky, let's get it on aurora once we are comfortable with the m-c bake time.
Kevin, why did you mark this fixed on 28? I see no indication that this landed there.
bug 925608 fixed this, it was fixed in Firefox 28. The crash does not show up in 28 beta 1 at all. https://crash-stats.mozilla.com/query/?product=FennecAndroid&version=FennecAndroid%3A28.0b1&range_value=1&range_unit=weeks&date=02%2F12%2F2014+17%3A00%3A00&query_search=signature&query_type=contains&query=libEGL_MRVL.so%400x6&reason=&release_channels=&build_id=&process_type=any&hang_type=any
Thanks, that's awesome!
Depends on: 925608
Whiteboard: [native-crash] → [native-crash][fixed in 28+ by bug 925608]