863313 - crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS because gecko is too busy in reflow

Reporter

Description

•

12 years ago

It's #19 crasher in 22.0a2. It first showed in 22.0a1/20130223, was discontinuous across builds and stopped after 23.0a1/20130406.

Signature libEGL_VIVANTE.so@0x5d0c
UUID 85103878-bf84-481e-91ab-437bb2130417
Date Processed 2013-04-17 10:15:21
Uptime 527
Install Age 23.2 hours since version was first installed.
Install Time 2013-04-16 11:04:32
Product FennecAndroid
Version 22.0a2
Build ID 20130416004017
Release Channel aurora
OS Android
OS Version 0.0.0 Linux 3.0.8+ #497 PREEMPT Thu Aug 30 12:01:31 CST 2012 armv7l Android/rk29sdk/rk29sdk:4.0.4/IMM76D/20120823.112042:user/release-keys
Build Architecture arm
Build Architecture Info
Crash Reason SIGSEGV
Crash Address 0x5fef0b50
App Notes AdapterDescription: 'Vivante Corporation -- GC800 core -- OpenGL ES 2.0 -- Model: Full AOSP on Rk29sdk, Product: rk29sdk, Manufacturer: unknown, Hardware: rk29board'
EGL? EGL+
GL Context? GL Context+
GL Layers? GL Layers+
Stagefright? Stagefright-
unknown Full AOSP on Rk29sdk
Android/rk29sdk/rk29sdk:4.0.4/IMM76D/20120823.112042:user/release-keys
Processor Notes sp-processor09.phx1.mozilla.com_17411:2012; exploitability tool failed: 127
EMCheckCompatibility True
Adapter Vendor ID Vivante Corporation
Adapter Device ID GC800 core
Device unknown Full AOSP on Rk29sdk
Android API Version 15 (REL)
Android CPU ABI armeabi-v7a

Frame Module Signature Source
0 libEGL_VIVANTE.so libEGL_VIVANTE.so@0x5d0c
1 libEGL.so libEGL.so@0x230a6
2 libEGL.so libEGL.so@0xb137
3 libEGL.so libEGL.so@0x23056
4 libEGL_VIVANTE.so libEGL_VIVANTE.so@0x5c86
5 libEGL.so libEGL.so@0xd1bb
6 libEGL.so libEGL.so@0xd08f
7 libxul.so mozilla::gl::GLContextEGL::MakeCurrentImpl gfx/gl/GLLibraryEGL.h:164
8 libxul.so mozilla::layers::LayerManagerOGL::MakeCurrent obj-firefox/dist/include/GLContext.h:185
9 libxul.so mozilla::layers::LayerManagerOGL::Render gfx/layers/opengl/LayerManagerOGL.cpp:1075

More reports at: https://crash-stats.mozilla.com/report/list?signature=libEGL_VIVANTE.so%400x5d0c

Scoobidiver (away)

Reporter

Comment 1

•

12 years ago

With combined signatures, it's #4 top crasher in 22.0a2 and #22 in 23.0a1. More reports at: https://crash-stats.mozilla.com/query/query?product=FennecAndroid&query_search=signature&query_type=contains&query=libEGL_VIVANTE.so%400x5&do_query=1

Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] → [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN…

status-firefox23: unaffected → affected

tracking-firefox22: --- → ?

Keywords: topcrash

Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5d0c with Vivante GC800 core and rk29board hw running ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5... with Vivante GC400 or GC800 core running ICS

Scoobidiver (away)

Reporter

Updated

•

12 years ago

Crash Signature: libEGL_VIVANTE.so@0x5ee8] → libEGL_VIVANTE.so@0x5ee8] [@ libGAL.so@0x262fc]

bhavana bajaj [:bajaj]

Updated

•

12 years ago

Keywords: needURLs, regressionwindow-wanted, steps-wanted

bhavana bajaj [:bajaj]

Comment 2

•

12 years ago

Adding needinfo on :Kairo for URL correlations, which can help QA.

Flags: needinfo?(kairo)

Aaron Train [:aaronmt]

Comment 3

•

12 years ago

Device names?

Robert Kaiser

Comment 4

•

12 years ago

URLs: none interesting, some about:home, some about:blank

Devices:

last week of Nightly:
libEGL_VIVANTE.so@0x5be8 6
Unknown AN9G2I

last week of Aurora:
libEGL_VIVANTE.so@0x5414 2
Unknown NAM805HCX 2

previous week of Nightly:
libEGL_VIVANTE.so@0x5ee8 3
Penta Penta WS802C 3

previous week of Aurora:
libEGL_VIVANTE.so@0x5d0c 4
3-Q RC9716B 4
libEGL_VIVANTE.so@0x5ed4 2
HUAWEI MediaPad 7 Lite 2
libEGL_VIVANTE.so@0x5481 1
HUAWEI MediaPad 10 FHD 1

All in all, I suspect that this may not really be worth investigating too much.

Flags: needinfo?(kairo)

Kevin Brosnan [Ex-Mozilla]

Comment 5

•

12 years ago

These look to be Huawei and small market player Android tablet devices. If this is fixed on trunk I would let the fix just ride the trains. Acquiring affected devices will likely prove difficult.

tracking-fennec: --- → ?

Keywords: needURLs

Scoobidiver (away)

Reporter

Comment 6

•

12 years ago

(In reply to Kevin Brosnan [:kbrosnan] from comment #5)
> If this is fixed on trunk

That's the point. It's not fixed in the trunk.

Kevin Brosnan [Ex-Mozilla]

Comment 7

•

12 years ago

I agree information is incomplete. The only crashes we have on FxA 23a1 over the last 4 weeks are from the beginning of the month. This suggests that the issue may have been fixed by some other code base change. We would know more when 23 goes to Aurora in the second week of May.

Scoobidiver (away)

Reporter

Comment 8

•

12 years ago

(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> The only crashes we have on FxA 23a1 over the last 4 weeks are from the beginning of
> the month.

It's wrong. There are 30 crashes in 23.0a1 over the last four weeks and the latest one happened in the April 28 build.

Kevin Brosnan [Ex-Mozilla]

Comment 9

•

12 years ago

Ah I was looking at the @0x5d0c signature from comment 0.

Brad Lassey [:blassey] (use needinfo?)

Comment 10

•

12 years ago

Alex, I think we should just black list these devices and be done with it. What is the process for that now-a-days?

tracking-fennec: ? → 22+

Flags: needinfo?(akeybl)

Alex Keybl [:akeybl]

Comment 11

•

12 years ago

(In reply to Brad Lassey [:blassey] from comment #10)
> Alex, I think we should just black list these devices and be done with it.
> What is the process for that now-a-days?

gfx or device?

Flags: needinfo?(akeybl) → needinfo?(blassey.bugs)

Brad Lassey [:blassey] (use needinfo?)

Comment 12

•

12 years ago

I was thinking that we should black list the devices in the play store.

Flags: needinfo?(blassey.bugs)

Scoobidiver (away)

Reporter

Comment 13

•

12 years ago

(In reply to Brad Lassey [:blassey] from comment #12)
> I was thinking that we should black list the devices in the play store.

Before doing that, we should wait for 22.0 to go to Beta, as Aurora and Nightly are not representative for device-specific crashes.

Alex Keybl [:akeybl]

Comment 14

•

12 years ago

Even still, it's not clear that we should block the device (the device should be pretty unusable to block).

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 15

•

12 years ago

Scoobi, beyond the devices, is this similar to bug 863307? The other one looks like ICS as well. I think I might be missing something.

Scoobidiver (away)

Reporter

Comment 16

•

12 years ago

(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #15)
> Scoobi beyond the devices, is this the similar bug 863307 ? The other one
> looks like ICS as well. I think I might be missing something.

They are similar and likely related but not necessarily duplicates, because a 0x2000 address shift in libEGL_VIVANTE.so is significant and can't be explained by the library version breakdown. This one is 100% correlated to Vivante GC400/GC800 GPUs.

Scoobidiver (away)

Reporter

Comment 17

•

12 years ago

It might be related to bug 848810.

Alex Keybl [:akeybl]

Updated

•

12 years ago

Whiteboard: [native-crash] → [native-crash][waiting on followup to comment 13 before tracking]

Scoobidiver (away)

Reporter

Updated

•

12 years ago

status-firefox24: --- → affected

Kevin Brosnan [Ex-Mozilla]

Comment 18

•

12 years ago

Since we are going to go with blocking the devices, kairo, can you get a list of devices that crash on libEGL_VIVANTE.so@0x5?

Flags: needinfo?(kairo)

Keywords: regressionwindow-wanted

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 19

•

12 years ago

VIMICRO MID
HUAWEI MediaPad 7 Lite
Mediacom Xteam Smartpad 810c
Unknown AN7DG3
ViewSonic ViewPad 10e
HUAWEI U9508
AIRIS OnePAD 970
AOC MW0710
Archos 97 CARBON
HKC P771A
MSI Enjoy 10 PLUS
Penta Penta WS802C
Unknown 720F
Unknown AN10BG3
Unknown CTC07SO
Unknown DEM752HCF
Unknown Full AOSP on Rk29sdk
Unknown HFM752HCF
Unknown INTELLIPAD
Unknown K8GT_H
Unknown MID
Unknown MW0711
Unknown NEXT
Unknown PMP3370B
Unknown PMP5080CPRO
Unknown PMP5097CPRO
Unknown Q3M752HC
Unknown QPAD C-0700111
Unknown S800
Unknown STM1007HD
Unknown TR720F
Unknown miTab FUNK
WEXLER TAB 7i 3G

Kevin Brosnan [Ex-Mozilla]

Comment 20

•

12 years ago

Aaron how does the above device list match devices listed in the Play Store? Block them right away?

Flags: needinfo?(kairo) → needinfo?(aaron.train)

Benjamin Smedberg

Comment 21

•

12 years ago

blassey, is this block a permanent solution and we think it's ok because these are not an important target? Or should there be a separate followup bug on actually fixing this and reenabling these devices?

Flags: needinfo?(blassey.bugs)

Scoobidiver (away)

Reporter

Comment 22

•

12 years ago

Follow-up of comment 13: It's #5 top crasher in 22.0b1, #16 in 23.0a2, and #15 in 24.0a1.

Aaron Train [:aaronmt]

Comment 23

•

12 years ago

(In reply to Kevin Brosnan [:kbrosnan] from comment #20)
> Aaron how does the above device list match devices listed in the Play Store?
> Block them right away?

The following can be blocked on Google Play directly:

> HUAWEI MediaPad 7 Lite
> HUAWEI U9508
> AOC MW0710
> Archos 97 CARBON
> HKC P771A
> Unknown MID
> Unknown MW0711
> Unknown PMP3370B
> Unknown PMP5080CPRO
> Unknown PMP5097CPRO

Flags: needinfo?(aaron.train)

Brad Lassey [:blassey] (use needinfo?)

Comment 24

•

12 years ago

My understanding was that these devices are unusable, which warranted blocking the play store. I asked Aaron to order one. need-info to him to tell us how usable the devices are.

Flags: needinfo?(blassey.bugs) → needinfo?(aaron.train)

Benjamin Smedberg

Comment 25

•

12 years ago

You mean they're unusable because of this crash? I don't think we know that because we know that while it's a common crash, we don't know whether it always crashes or just sometimes.

Aaron Train [:aaronmt]

Comment 26

•

12 years ago

(In reply to Aaron Train [:aaronmt] from comment #23)
> > HUAWEI MediaPad 7 Lite

On order RITM0016641.

Flags: needinfo?(aaron.train)

Aaron Train [:aaronmt]

Updated

•

12 years ago

QA Contact: aaron.train

Alex Keybl [:akeybl]

Comment 27

•

12 years ago

We'll wait for the results of Aaron's repro (if possible to repro).

tracking-firefox22: ? → +

Scoobidiver (away)

Reporter

Updated

•

12 years ago

Crash Signature: libEGL_VIVANTE.so@0x5ee8] [@ libGAL.so@0x262fc] → libEGL_VIVANTE.so@0x5ee8] [@ libEGL_VIVANTE.so@0x53f8] [@ libGAL.so@0x262fc]

Aaron Train [:aaronmt]

Comment 30

•

12 years ago

I have the MediaPad 7 Lite now (RK29board), investigating.

Aaron Train [:aaronmt]

Comment 31

•

12 years ago

Attached file Rawlog (Firefox 22, MediaPad 7 Lite) — Details

I have hit this crash once but am unsure what exactly I did. Attempts to reproduce again are yielding the following in my attachment. So far, Firefox is certainly usable on these devices.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 32

•

12 years ago

The log from comment 31 shows that the gecko thread is busy when the Java UI thread goes to send it a synchronous message. This belongs to a class of bugs we've seen before, and for which I would like bug 863777 to be landed to get more data (specifically to get a gecko stack when this happens).

Depends on: 863777

Aaron Train [:aaronmt]

Comment 33

•

12 years ago

I can reproduce this crash now; it involves output from above by loading a busy page while heading into the AwesomeScreen

bp-27d5787d-2168-4ef5-984f-180f32130606

Scoobidiver (away)

Reporter

Updated

•

12 years ago

Keywords: steps-wanted → reproducible

Whiteboard: [native-crash][waiting on followup to comment 13 before tracking] → [native-crash]

Tracy Walker [:tracy]

Comment 34

•

12 years ago

only

Total Count URL
2 http://www.google.com.pk/

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 35

•

12 years ago

Attached file Gecko traces from sendEventToGeckoSync failures — Details

I reproduced the problem using Aaron's instructions on his tablet and grabbed a handful of gecko stack traces from when it happens. They seem to be in various different parts of the code. It seems like the page (online.wsj.com) is very stressful for Gecko and so it often takes 4+ seconds before it can respond to events from the Java UI thread.

Also to clarify what I said in comment 32, once we hit this condition (sendEventToGeckoSync failing due to timeout), the Java code and compositor are no longer in sync with respect to the state of the surface, and so GL-related crashes are not entirely unexpected.

The alternative here is to make sendEventToGeckoSync not time out, in which case we will get an Android ANR after 5 seconds. I don't know if that is preferable or not. See https://bugzilla.mozilla.org/show_bug.cgi?id=835356#c12 and onwards for an earlier discussion on this.
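
The timeout behaviour described in this comment can be illustrated with a minimal sketch. It is written in Python rather than the actual Fennec Java code, and every name in it (send_event_sync, simulate, the "surface" values) is hypothetical and only stands in for the real sendEventToGeckoSync path. The assumption is simply that the caller carries on after an abandoned wait, so its view of the surface state no longer matches what the busy thread has processed.

```python
import queue
import threading
import time


def send_event_sync(events, event, timeout_s):
    """Post `event` and wait up to `timeout_s` for the worker to acknowledge it.

    Returns True if the worker acknowledged in time, False if the wait
    was abandoned (hypothetical stand-in for a timed synchronous send).
    """
    done = threading.Event()
    events.put((event, done))
    return done.wait(timeout_s)


def simulate(busy_s, timeout_s):
    """Return (acked, ui_state, worker_state_seen_after_the_wait)."""
    events = queue.Queue()
    worker_state = {"surface": "old"}

    def worker():
        time.sleep(busy_s)              # stands in for a long reflow/paint
        event, done = events.get()      # only now does the event get handled
        worker_state["surface"] = event
        done.set()

    t = threading.Thread(target=worker)
    t.start()
    acked = send_event_sync(events, "new", timeout_s)
    ui_state = "new"                    # the caller carries on regardless
    seen = worker_state["surface"]      # snapshot taken right after the wait
    t.join()
    return acked, ui_state, seen
```

With a busy worker and a short timeout, for example simulate(busy_s=0.3, timeout_s=0.05), the wait is abandoned and the two sides disagree about the surface state at that moment. Without a timeout the caller would instead block for the full busy period, which on Android is what would lead to an ANR.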

(inactive) Jim Chen [:jchen] [:darchons]

Comment 36

•

12 years ago

If needed, here are the Nightly builds that spit out a profile to logcat when Gecko event sync is taking too long. The output is in JSON and because logcat has a line length limit, the JSON is split into 2000 char blocks.

https://tbpl.mozilla.org/?tree=Try&rev=0b075022183c
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nchen@mozilla.com-0b075022183c/
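
For anyone reading such a dump, a rough sketch of stitching the blocks back together might look like the following. It is only an illustration: the log tag passed in by the caller and the assumption that each block is one logcat line in the "brief" format (level/tag( pid): payload) are guesses, not details confirmed for these builds.

```python
import json
import re

# Assumed logcat "brief" line shape: "<level>/<tag>( <pid>): <payload>".
LINE_RE = re.compile(r"^[A-Z]/(?P<tag>[^(]+)\(\s*\d+\): (?P<payload>.*)$")


def reassemble_profile(lines, tag):
    """Join the payloads of all lines logged under `tag` and parse them as JSON.

    `tag` is whatever tag the build logs the profile under (hypothetical here).
    Lines from other tags, or lines that do not match the format, are skipped.
    """
    chunks = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("tag").strip() == tag:
            chunks.append(m.group("payload"))
    return json.loads("".join(chunks))
```

This relies on the blocks appearing in order and unmodified in the log; if logcat drops or reorders lines, json.loads will raise an error rather than return a partial profile.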

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 37

•

12 years ago

Attached file Profile dump from jchen's build — Details

Using a modified version of the build jchen created I got the attached profile dump. I'm not sure how to interpret it though.

Alex Keybl [:akeybl]

Comment 38

•

11 years ago

Kats - please set status-firefox22 back to affected if you think there's any chance we'll fix this before Monday's final Beta build.

Assignee: nobody → bugmail.mozilla

status-firefox22: affected → wontfix

tracking-firefox23: --- → +

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 39

•

11 years ago

That is unlikely, leaving as wontfix. jchen, do you know how to interpret the dump in comment 37? Is there some way to read it in cleopatra?

Flags: needinfo?(nchen)

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

11 years ago

Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVANTE.so@0x34b6] [@ libEGL_VIVANTE.so@0x3274] [@ libEGL_VIVANTE.so@0x320e] [@ libEGL_VIVAN…

(inactive) Jim Chen [:jchen] [:darchons]

Comment 40

•

11 years ago

Attached file Prettified JSON profile dump — Details

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #39)
> That is unlikely, leaving as wontfix.
>
> jchen, do you know how to interpret the dump in comment 37? Is there some
> way to read it in cleopatra?

That build only takes one profile sample, and splits the JSON profile into lines, before logging the lines through logcat. Here's the prettified JSON. You can see the Gecko thread stack there. I don't know how useful this is, but we can make the build take a multi-sample profile too if that's more useful.

Flags: needinfo?(nchen)

Scoobidiver (away)

Reporter

Updated

•

11 years ago

Crash Signature: [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVANTE.so@0x34b6] [@ libEGL_VIVANTE.so@0x3274] [@ libEGL_VIVANTE.so@0x320e] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x3414] [@ libEGL_VIVANTE.so@0x34d0] [@ libEGL_VIVANTE.so@0x3442] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVAN…

Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5... with Vivante GC400 or GC800 core running ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 41

•

11 years ago

Ok, thanks. More samples would be helpful but in this case it shows the same thing as the logs from comment 35. In a nutshell: the page is taking a long time to reflow and paint, causing the gecko thread to be busy for long periods of time. This means when the Java UI thread needs to set up GL state (which blocks on the gecko thread) it can't do it in a timely manner. The UI thread aborts the wait after 4 seconds to prevent Android from ANR'ing the app, which results in invalid GL state and eventually this crash.

Assignee: bugmail.mozilla → nobody

Component: Graphics: Layers → Layout

Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS because gecko is too busy in reflow

Whiteboard: [native-crash] → [native-crash][summary in comment 41]

Brad Lassey [:blassey] (use needinfo?)

Updated

•

11 years ago

Assignee: nobody → bugmail.mozilla

Scoobidiver (away)

Reporter

Comment 42

•

11 years ago

It's #2 top crasher in the first hours of 22.0 (all devices, not only ARMv6 devices) and accounts for 6.7% of all crashes. This bug and bug 845867 which is likely a dupe account for 14.3% of all crashes.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #41)
> In a nutshell: the page is taking a long time to reflow and paint

Reflow is not new so which feature is causing that in 22.0 and above?

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 43

•

11 years ago

I looked over this again and don't really have much to add at this point. I agree that the spike in crashes in 22 is surprising but given the info we have so far I can't explain why that is. It'll be interesting to see if bug 887097 impacts this at all.

Depends on: 887097

Flags: needinfo?(bugmail.mozilla)

Scoobidiver (away)

Reporter

Comment 44

•

11 years ago

(In reply to Aaron Train [:aaronmt] from comment #33)
> I can reproduce this crash now; it involves output from above by loading a
> busy page while heading into the AwesomeScreen

Can you find out the regression range?

Keywords: regressionwindow-wanted

Scoobidiver (away)

Reporter

Updated

•

11 years ago

status-firefox25: --- → affected

Lukas Blakk [:lsblakk] use ?needinfo

Comment 45

•

11 years ago

Aaron: anything new on comment 44? Will be checking on this bug frequently as it's a suspected dupe for bug 845867 which is a topcrasher for ARMv6 22.0

Flags: needinfo?(aaron.train)

Aaron Train [:aaronmt]

Comment 46

•

11 years ago

No. Triggering this crash with the associated signatures is not 100% reproducible; what I reported was actually a bit of a struggle to tickle the conditions to crash with this signature.

Flags: needinfo?(aaron.train)

Scoobidiver (away)

Reporter

Updated

•

11 years ago

Crash Signature: libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVANTE.so@0x5ee8] [@ libEGL_VIVANTE.so@0x53f8] [@ libEGL_VIVANTE.so@0x5414] [@ libGLES_rhea.so@0x8ed58] [@ libGLES_rhea.so… → libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVANTE.so@0x5ee8] [@ libEGL_VIVANTE.so@0x53f8] [@ libEGL_VIVANTE.so@0x5414] [@ libEGL_VIVANTE.so@0x3a4c] [@ libEGL_VIVANTE…

Robert Kaiser

Comment 47

•

11 years ago

If I see things correctly, then bug 887097 has landed in 23.0b2, but these signatures don't look like they have really diminished in that version.

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

status-firefox23: affected → wontfix

Kevin Brosnan [Ex-Mozilla]

Comment 48

•

11 years ago

22+ ship has sailed. Need to re-triage this.

tracking-fennec: 22+ → ?

Brad Lassey [:blassey] (use needinfo?)

Updated

•

11 years ago

tracking-fennec: ? → +

Alex Keybl [:akeybl]

Comment 49

•

11 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #41)
> Ok, thanks. More samples would be helpful but in this case it shows the same
> thing as the logs from comment 35.
>
> In a nutshell: the page is taking a long time to reflow and paint, causing
> the gecko thread to be busy for long periods of time. This means when the
> Java UI thread needs to set up GL state (which blocks on the gecko thread)
> it can't do it in a timely manner. The UI thread aborts the wait after 4
> seconds to prevent Android from ANR'ing the app, which results in invalid GL
> state and eventually this crash.

Still a topcrash on Release, a little lower on Beta. bug 845867 is very similar and also remains a huge topcrash on ARMv6. Triage's understanding is that the root cause is difficult to identify. Is there anything we can do to recover and not crash? For instance, stop reflow or painting when a certain condition is met?

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 50

•

11 years ago

As Kairo said in comment 47, bug 887097 has landed but this crash is still happening which is somewhat unexpected to me. I would have expected the ANR to trigger if we get into a scenario where gecko is busy and the UI thread is blocked on it for 5+ seconds. And even if it doesn't, we no longer abort the gecko event sync so the compositor thread state should never get out of sync with the gecko thread state. I think we need to go back to the device and try to reproduce this again on a recent nightly and see if the observable behaviour is the same. I would expect not - the ANR dialog should pop up instead. If it still crashes, the logging I recently added to nightly for bug 884047 should provide some additional insight into the problem. Aaron, would you mind trying to repro this again on the Mediapad device with a recent nightly and attaching the logcat?

Flags: needinfo?(bugmail.mozilla)

Tracy Walker [:tracy]

Comment 51

•

11 years ago

topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3

Keywords: topcrash → topcrash-android-armv7

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 52

•

11 years ago

For comment 50

Flags: needinfo?(aaron.train)

Aaron Train [:aaronmt]

Comment 53

•

11 years ago

I have not hit a crash, but I am running into the following which attempts to write out a traces.txt but fails. Any idea how to correct this?

D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
e to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
I/InputDispatcher( 679): Application is not responding: Window{41b9dbc8 org.mozilla.fennec/org.mozilla.fennec.App paused=false}. 5001.9ms since event, 5001.6ms since wait started
I/WindowManager( 679): Input event dispatching timed out sending to org.mozilla.fennec/org.mozilla.fennec.App
I/SystemProperties( 679): get key=events.cpu
I/SystemProperties( 679): get key=dalvik.vm.stack-trace-file,def=null
I/Process ( 679): Sending signal. PID: 2958 SIG: 3
I/dalvikvm( 2958): threadid=3: reacting to signal 3
I/dalvikvm( 2958): Wrote stack traces to '/data/anr/traces.txt'
I/Process ( 679): Sending signal. PID: 679 SIG: 3
I/dalvikvm( 679): threadid=3: reacting to signal 3
I/dalvikvm( 679): Wrote stack traces to '/data/anr/traces.txt'
I/Process ( 679): Sending signal. PID: 751 SIG: 3
I/dalvikvm( 751): threadid=3: reacting to signal 3
I/dalvikvm( 751): Wrote stack traces to '/data/anr/traces.txt'
I/Process ( 679): Sending signal. PID: 840 SIG: 3
I/dalvikvm( 840): threadid=3: reacting to signal 3
I/dalvikvm( 840): Wrote stack traces to '/data/anr/traces.txt'
I/Process ( 679): Sending signal. PID: 850 SIG: 3
I/dalvikvm( 850): threadid=3: reacting to signal 3
I/dalvikvm( 850): Wrote stack traces to '/data/anr/traces.txt'
I/Process ( 679): Sending signal. PID: 865 SIG: 3
I/dalvikvm( 865): threadid=3: reacting to signal 3
I/dalvikvm( 865): Wrote stack traces to '/data/anr/traces.txt'
D/dalvikvm( 679): GC_EXPLICIT freed 765K, 24% free 11545K/15011K, paused 11ms+8ms
I/Lights ( 679): >>> Enter set_buttons_light
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496

How can I pull traces.txt?

E/ActivityManager( 679): Error reading /data/anr/traces.txt
E/ActivityManager( 679): java.io.FileNotFoundException: /data/anr/traces.txt: open failed: ENOENT (No such file or directory)

Flags: needinfo?(aaron.train)

Aaron Train [:aaronmt]

Comment 54

•

11 years ago

Attached file traces_org.mozilla.fennec.txt — Details

Actually it looks like it was written to traces_org.mozilla.fennec.txt - hopefully something helpful in here?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 55

•

11 years ago

In that logcat it looks like the Android main thread is at:

at com.google.android.gles_jni.EGLImpl.eglSwapBuffers(Native Method)
at android.view.HardwareRenderer$GlRenderer.draw(HardwareRenderer.java:875)

which should be a pretty fast operation. I'm not sure why it would ANR there. :(

As for the "Aborting draw due to resolution change" messages - mostly that can be ignored but usually there shouldn't be a ton of them getting printed like that. NI to Cwiiis; maybe he can provide some insight on it.

Flags: needinfo?(chrislord.net)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 56

•

11 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #55)
> In that logcat it looks the android main thread is at:
>
> at com.google.android.gles_jni.EGLImpl.eglSwapBuffers(Native Method)
> at android.view.HardwareRenderer$GlRenderer.draw(HardwareRenderer.java:875)
>
> which should be a pretty fast operation. I'm not sure why it would ANR
> there. :(

I saw a hang similar to this in bug 935676. It was waiting on some mutex inside libEGL, but I don't see EGL appear anywhere else in this trace. The Compositor thread isn't listed for some reason?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 57

•

11 years ago

Hm, good point. Maybe the compositor thread died somehow and so the main thread is left waiting for the mutex that will never be released?

Chris Lord [:cwiiis]

Comment 58

•

11 years ago

The number of aborts is because on the first paint, we ignore if the front-end tries to abort - so that's likely just the number of transactions to do the paint and the front-end is trying to abort on each one. The other reason is that the zoom front-end side is incorrect on first-paint because setFirstPaintViewport isn't called *after* the corresponding paint (I think that was it anyway).

Flags: needinfo?(chrislord.net)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 59

•

11 years ago

This crash seems to have dropped significantly in 28, possibly because of bug 925608. There's still a handful I see on crash-stats but I think we can take off the topcrash status?

Depends on: 925608

Flags: needinfo?(kairo)

Robert Kaiser

Comment 60

•

11 years ago

Yes, that is as expected and it's really awesome, 28 looks like it will be one of the most stable releases we shipped so far. I'd mark this crash as a dupe of bug 925608, but feel free to resolve in other ways. :)

Flags: needinfo?(kairo)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 61

•

11 years ago

With pleasure :)

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Kevin Brosnan [Ex-Mozilla]

Updated

•

10 years ago

Keywords: regressionwindow-wanted

Rawlog (Firefox 22, MediaPad 7 Lite) 12 years ago Aaron Train [:aaronmt] 10.03 KB, text/plain		Details
Gecko traces from sendEventToGeckoSync failures 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 76.73 KB, text/plain		Details
Profile dump from jchen's build 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 9.62 KB, text/plain		Details
Prettified JSON profile dump 11 years ago (inactive) Jim Chen [:jchen] [:darchons] 11.98 KB, text/plain		Details
traces_org.mozilla.fennec.txt 11 years ago Aaron Train [:aaronmt] 66.38 KB, text/plain		Details