crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS because gecko is too busy in reflow

RESOLVED DUPLICATE of bug 925608

Status


--
critical
5 years ago
4 years ago

People

(Reporter: scoobidiver, Assigned: kats)

Tracking

(4 keywords)

22 Branch
ARM
Android
crash, regression, reproducible, topcrash-android-armv7
Points:
---

Firefox Tracking Flags

(firefox21 unaffected, firefox22+ wontfix, firefox23+ wontfix, firefox24 affected, firefox25 affected, fennec+)

Details

(Whiteboard: [native-crash][summary in comment 41], crash signature)

Attachments

(5 attachments)

(Reporter)

Description

5 years ago
It's the #19 crasher in 22.0a2.
It first showed up in 22.0a1/20130223, was discontinuous across builds, and stopped after 23.0a1/20130406.

Signature 	libEGL_VIVANTE.so@0x5d0c
UUID	85103878-bf84-481e-91ab-437bb2130417
Date Processed	2013-04-17 10:15:21
Uptime	527
Install Age	23.2 hours since version was first installed.
Install Time	2013-04-16 11:04:32
Product	FennecAndroid
Version	22.0a2
Build ID	20130416004017
Release Channel	aurora
OS	Android
OS Version	0.0.0 Linux 3.0.8+ #497 PREEMPT Thu Aug 30 12:01:31 CST 2012 armv7l Android/rk29sdk/rk29sdk:4.0.4/IMM76D/20120823.112042:user/release-keys
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x5fef0b50
App Notes 	
AdapterDescription: 'Vivante Corporation -- GC800 core -- OpenGL ES 2.0 -- Model: Full AOSP on Rk29sdk, Product: rk29sdk, Manufacturer: unknown, Hardware: rk29board'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ Stagefright? Stagefright- 
unknown Full AOSP on Rk29sdk
Android/rk29sdk/rk29sdk:4.0.4/IMM76D/20120823.112042:user/release-keys
Processor Notes 	sp-processor09.phx1.mozilla.com_17411:2012; exploitability tool failed: 127
EMCheckCompatibility	True
Adapter Vendor ID	Vivante Corporation
Adapter Device ID	GC800 core
Device	unknown Full AOSP on Rk29sdk
Android API Version	15 (REL)
Android CPU ABI	armeabi-v7a

Frame 	Module 	Signature 	Source
0 	libEGL_VIVANTE.so 	libEGL_VIVANTE.so@0x5d0c 	
1 	libEGL.so 	libEGL.so@0x230a6 	
2 	libEGL.so 	libEGL.so@0xb137 	
3 	libEGL.so 	libEGL.so@0x23056 	
4 	libEGL_VIVANTE.so 	libEGL_VIVANTE.so@0x5c86 	
5 	libEGL.so 	libEGL.so@0xd1bb 	
6 	libEGL.so 	libEGL.so@0xd08f 	
7 	libxul.so 	mozilla::gl::GLContextEGL::MakeCurrentImpl 	gfx/gl/GLLibraryEGL.h:164
8 	libxul.so 	mozilla::layers::LayerManagerOGL::MakeCurrent 	obj-firefox/dist/include/GLContext.h:185
9 	libxul.so 	mozilla::layers::LayerManagerOGL::Render 	gfx/layers/opengl/LayerManagerOGL.cpp:1075

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libEGL_VIVANTE.so%400x5d0c
(Reporter)

Comment 1

5 years ago
With combined signatures, it's the #4 top crasher in 22.0a2 and #22 in 23.0a1.

More reports at:
https://crash-stats.mozilla.com/query/query?product=FennecAndroid&query_search=signature&query_type=contains&query=libEGL_VIVANTE.so%400x5&do_query=1
Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] → [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN…
status-firefox23: unaffected → affected
tracking-firefox22: --- → ?
Keywords: topcrash
Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5d0c with Vivante GC800 core and rk29board hw running ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5... with Vivante GC400 or GC800 core running ICS
(Reporter)

Updated

5 years ago
Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN…

Updated

5 years ago
Keywords: needURLs, regressionwindow-wanted, steps-wanted
Adding needinfo on :Kairo for URL correlations, which can help QA.
Flags: needinfo?(kairo)
Device names?

Comment 4

5 years ago
URLs: none interesting, some about:home, some about:blank

Devices:
last week of Nightly:
libEGL_VIVANTE.so@0x5be8 	6
Unknown AN9G2I
last week of Aurora:
libEGL_VIVANTE.so@0x5414 	2
Unknown NAM805HCX 	2
previous week of Nightly:
libEGL_VIVANTE.so@0x5ee8 	3
Penta Penta WS802C 	3
previous week of Aurora:
libEGL_VIVANTE.so@0x5d0c 	4
3-Q RC9716B 	4
libEGL_VIVANTE.so@0x5ed4 	2
HUAWEI MediaPad 7 Lite 	2
libEGL_VIVANTE.so@0x5481 	1
HUAWEI MediaPad 10 FHD 	1


All in all, I suspect that this may not really be worth investigating too much.
Flags: needinfo?(kairo)
These look to be Huawei and small market player Android tablet devices. If this is fixed on trunk I would let the fix just ride the trains. Acquiring affected devices will likely prove difficult.
tracking-fennec: --- → ?
Keywords: needURLs
(Reporter)

Comment 6

5 years ago
(In reply to Kevin Brosnan [:kbrosnan] from comment #5)
> If this is fixed on trunk
That's the point. It's not fixed in the trunk.
I agree information is incomplete. The only crashes we have on FxA 23a1 over the last 4 weeks are from the beginning of the month. This suggests that the issue may have been fixed by some other code base change. We would know more when 23 goes to Aurora in the second week of May.
(Reporter)

Comment 8

5 years ago
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> The only crashes we have on FxA 23a1 over the last 4 weeks are from the beginning of
> the month.
That's wrong. There are 30 crashes in 23.0a1 over the last four weeks, and the latest one happened in the April 28 build.
Ah I was looking at the @0x5d0c signature from comment 0.
Alex, I think we should just black list these devices and be done with it. What is the process for that now-a-days?
tracking-fennec: ? → 22+
Flags: needinfo?(akeybl)
(In reply to Brad Lassey [:blassey] from comment #10)
> Alex, I think we should just black list these devices and be done with it.
> What is the process for that now-a-days?

gfx or device?
Flags: needinfo?(akeybl) → needinfo?(blassey.bugs)
I was thinking that we should black list the devices in the play store.
Flags: needinfo?(blassey.bugs)
(Reporter)

Comment 13

5 years ago
(In reply to Brad Lassey [:blassey] from comment #12)
> I was thinking that we should black list the devices in the play store.
Before doing that, we should wait for 22.0 to go to Beta, as Aurora and Nightly are not representative for device-specific crashes.
Even so, it's not clear that we should block the devices (a device should be pretty much unusable before we block it).
Scoobi, beyond the devices, is this similar to bug 863307? The other one looks like ICS as well. I think I might be missing something.
(Reporter)

Comment 16

5 years ago
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #15)
> Scoobi beyond the devices, is this the similar bug 863307 ?  The other one
> looks like ICS as well.  I think I might be missing something.
They are similar and likely related but not necessarily duplicates, because a 0x2000 address shift in libEGL_VIVANTE.so is significant and can't be explained by the library version breakdown.
This one is 100% correlated to Vivante GC400/GC800 GPUs.
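
The address-shift argument can be checked mechanically. The following is a minimal Python sketch, not part of any crash-stats tooling: it parses a signature field in the `[@ module@0xNNNN]` form used in this bug and buckets the offsets into 0x1000-sized blocks. The helper names are hypothetical, and the sample combines a few of the 0x5... offsets listed in this bug with a few of the 0x3... offsets that were added to its signature field later on.

```python
import re
from collections import defaultdict

# Entries copied from this bug's Crash Signature field (truncated ones omitted).
SIGNATURE_FIELD = (
    "[@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] "
    "[@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] "
    "[@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] "
    "[@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3274]"
)

SIG_RE = re.compile(r"\[@ (?P<module>[^@\]\s]+)@0x(?P<offset>[0-9a-fA-F]+)\]")


def parse_signatures(field):
    """Return (module, offset) pairs for every '[@ module@0xNNNN]' entry."""
    return [(m.group("module"), int(m.group("offset"), 16))
            for m in SIG_RE.finditer(field)]


def bucket_by_block(signatures, block_size=0x1000):
    """Group offsets by the block_size-aligned block they fall into."""
    buckets = defaultdict(list)
    for module, offset in signatures:
        base = offset // block_size * block_size
        buckets[(module, base)].append(offset)
    return dict(buckets)


buckets = bucket_by_block(parse_signatures(SIGNATURE_FIELD))
for (module, base), offsets in sorted(buckets.items()):
    print(f"{module} block 0x{base:x}: {[hex(o) for o in sorted(offsets)]}")
```

For this sample the offsets fall into two blocks whose bases are 0x2000 apart. That only shows that the signatures cluster; it says nothing about why the gap exists.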
(Reporter)

Comment 17

5 years ago
It might be related to bug 848810.

Updated

5 years ago
Whiteboard: [native-crash] → [native-crash][waiting on followup to comment 13 before tracking]
(Reporter)

Updated

5 years ago
status-firefox24: --- → affected
Since we are going to go with blocking the devices, kairo, can you get a list of devices that crash on libEGL_VIVANTE.so@0x5?
Flags: needinfo?(kairo)
Keywords: regressionwindow-wanted
VIMICRO MID
HUAWEI MediaPad 7 Lite
Mediacom Xteam Smartpad 810c
Unknown AN7DG3
ViewSonic ViewPad 10e
HUAWEI U9508
AIRIS OnePAD 970
AOC MW0710
Archos 97 CARBON
HKC P771A
MSI Enjoy 10 PLUS
Penta Penta WS802C
Unknown 720F
Unknown AN10BG3
Unknown CTC07SO
Unknown DEM752HCF
Unknown Full AOSP on Rk29sdk
Unknown HFM752HCF
Unknown INTELLIPAD
Unknown K8GT_H
Unknown MID
Unknown MW0711
Unknown NEXT
Unknown PMP3370B
Unknown PMP5080CPRO
Unknown PMP5097CPRO
Unknown Q3M752HC
Unknown QPAD C-0700111
Unknown S800
Unknown STM1007HD
Unknown TR720F
Unknown miTab FUNK
WEXLER TAB 7i 3G
Aaron how does the above device list match devices listed in the Play Store? Block them right away?
Flags: needinfo?(kairo) → needinfo?(aaron.train)

Comment 21

5 years ago
blassey, is this block a permanent solution and we think it's ok because these are not an important target? Or should there be a separate followup bug on actually fixing this and reenabling these devices?
Flags: needinfo?(blassey.bugs)
(Reporter)

Comment 22

5 years ago
Follow-up of comment 13:
It's the #5 top crasher in 22.0b1, #16 in 23.0a2, and #15 in 24.0a1.
(In reply to Kevin Brosnan [:kbrosnan] from comment #20)
> Aaron how does the above device list match devices listed in the Play Store?
> Block them right away?

The following can be blocked on Google Play directly:

> HUAWEI MediaPad 7 Lite
> HUAWEI U9508
> AOC MW0710
> Archos 97 CARBON
> HKC P771A
> Unknown MID
> Unknown MW0711
> Unknown PMP3370B
> Unknown PMP5080CPRO
> Unknown PMP5097CPRO
Flags: needinfo?(aaron.train)
My understanding was that these devices are unusable, which warranted blocking them in the Play Store. I asked Aaron to order one. Needinfo to him to tell us how usable the devices are.
Flags: needinfo?(blassey.bugs) → needinfo?(aaron.train)

Comment 25

5 years ago
You mean they're unusable because of this crash? I don't think we know that: while it's a common crash, we don't know whether it always crashes or just sometimes.
(In reply to Aaron Train [:aaronmt] from comment #23)
> > HUAWEI MediaPad 7 Lite

On order RITM0016641.
Flags: needinfo?(aaron.train)

Updated

5 years ago
QA Contact: aaron.train
We'll wait for the results of Aaron's repro (if possible to repro).
tracking-firefox22: ? → +
Duplicate of this bug: 863307
Duplicate of this bug: 877254
(Reporter)

Updated

5 years ago
Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN…
I have the MediaPad 7 Lite now (RK29board), investigating.
Created attachment 758091 [details]
Rawlog (Firefox 22, MediaPad 7 Lite)

I have hit this crash once but am unsure what exactly I did. Attempts to reproduce again are yielding the following in my attachment.

So far, Firefox is certainly usable on these devices.
The log from comment 31 shows that the gecko thread is busy when the Java UI thread goes to send it a synchronous message. This belongs to a class of bugs we've seen before, and for which I would like bug 863777 to be landed to get more data (specifically to get a gecko stack when this happens).
Depends on: 863777
I can reproduce this crash now; it involves output from above by loading a busy page while heading into the AwesomeScreen 

bp-27d5787d-2168-4ef5-984f-180f32130606
(Reporter)

Updated

5 years ago
Keywords: steps-wanted → reproducible
Whiteboard: [native-crash][waiting on followup to comment 13 before tracking] → [native-crash]
only
Total Count 	URL
2 	http://www.google.com.pk/
Created attachment 759361 [details]
Gecko traces from sendEventToGeckoSync failures

I reproduced the problem using Aaron's instructions on his tablet and grabbed a handful of gecko stack traces from when it happens. They seem to be in various different parts of the code. It seems like the page (online.wsj.com) is very stressful for Gecko and so it often takes 4+ seconds before it can respond to events from the Java UI thread.

Also to clarify what I said in comment 32, once we hit this condition (sendEventToGeckoSync failing due to timeout), the Java code and compositor are no longer in sync with respect to the state of the surface, and so GL-related crashes are not entirely unexpected. The alternative here is to make sendEventToGeckoSync not time out, in which case we will get an Android ANR after 5 seconds. I don't know if that is preferable or not. See https://bugzilla.mozilla.org/show_bug.cgi?id=835356#c12 and onwards for an earlier discussion on this.
If needed, here are the Nightly builds that spit out a profile to logcat when Gecko event sync is taking too long. The output is in JSON and because logcat has a line length limit, the JSON is split into 2000 char blocks.
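
For anyone who wants to read such a dump, the blocks have to be stitched back together before the JSON parses. Below is a minimal Python sketch of that step. The logcat tag `GeckoProfile` and the exact line layout are assumptions made for illustration (this bug does not show what the build actually prints), and the function names are hypothetical.

```python
import json
import re

CHUNK_SIZE = 2000  # block size mentioned in the comment above

# Hypothetical logcat layout "<level>/<tag>( <pid>): <payload>". The tag name
# is an assumption; the real build's tag is not given in this bug.
LINE_RE = re.compile(r"^[A-Z]/GeckoProfile\(\s*\d+\): (.*)$")


def split_for_logcat(text, size=CHUNK_SIZE):
    """Split text into consecutive blocks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def reassemble(log_lines):
    """Concatenate the payloads of matching lines, in order, and parse as JSON."""
    payloads = []
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            payloads.append(match.group(1))
    return json.loads("".join(payloads))


# Round trip with a made-up profile to show that block boundaries are harmless
# and that unrelated log lines are skipped.
profile = {"threads": [{"name": "Gecko",
                        "samples": ["frame%d" % i for i in range(600)]}]}
chunks = split_for_logcat(json.dumps(profile))
lines = ["I/GeckoProfile( 2958): " + chunk for chunk in chunks]
lines.insert(1, "D/GeckoLayerClient( 2958): unrelated line")
restored = reassemble(lines)
print(len(chunks), "blocks;", "round trip ok" if restored == profile else "mismatch")
```

The sketch relies on the blocks arriving in order and on logcat not altering the payload; if either does not hold for a real capture, the concatenated text will fail to parse and `json.loads` will raise.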

https://tbpl.mozilla.org/?tree=Try&rev=0b075022183c

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nchen@mozilla.com-0b075022183c/
Created attachment 759443 [details]
Profile dump from jchen's build

Using a modified version of the build jchen created I got the attached profile dump. I'm not sure how to interpret it though.
Kats - please set status-firefox22 back to affected if you think there's any chance we'll fix this before Monday's final Beta build.
Assignee: nobody → bugmail.mozilla
status-firefox22: affected → wontfix
tracking-firefox23: --- → +
That is unlikely, leaving as wontfix.

jchen, do you know how to interpret the dump in comment 37? Is there some way to read it in cleopatra?
Flags: needinfo?(nchen)
Crash Signature: [@ libEGL_VIVANTE.so@0x5d0c] [@ libEGL_VIVANTE.so@0x57c8] [@ libEGL_VIVANTE.so@0x5be8] [@ libEGL_VIVANTE.so@0x5384] [@ libEGL_VIVANTE.so@0x57bc] [@ libEGL_VIVANTE.so@0x5ed4] [@ libEGL_VIVANTE.so@0x5720] [@ libEGL_VIVANTE.so@0x5d20] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVANTE.so@0x34b6] [@ libEGL_VIVANTE.so@0x3274] [@ libEGL_VIVANTE.so@0x320e] [@ libEGL_VIVAN…
Created attachment 763677 [details]
Prettified JSON profile dump

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #39)
> That is unlikely, leaving as wontfix.
> 
> jchen, do you know how to interpret the dump in comment 37? Is there some
> way to read it in cleopatra?

That build only takes one profile sample, and splits the JSON profile into lines, before logging the lines through logcat.

Here's the prettified JSON. You can see the Gecko thread stack there. I don't know how useful this is, but we can make the build take a multi-sample profile too if that's more useful.
Flags: needinfo?(nchen)
(Reporter)

Updated

5 years ago
Crash Signature: [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVANTE.so@0x34b6] [@ libEGL_VIVANTE.so@0x3274] [@ libEGL_VIVANTE.so@0x320e] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x3414] [@ libEGL_VIVANTE.so@0x34d0] [@ libEGL_VIVANTE.so@0x3442] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVAN…
Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE.so@0x5... with Vivante GC400 or GC800 core running ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS
Ok, thanks. More samples would be helpful but in this case it shows the same thing as the logs from comment 35.

In a nutshell: the page is taking a long time to reflow and paint, causing the gecko thread to be busy for long periods of time. This means when the Java UI thread needs to set up GL state (which blocks on the gecko thread) it can't do it in a timely manner. The UI thread aborts the wait after 4 seconds to prevent Android from ANR'ing the app, which results in invalid GL state and eventually this crash.
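
The mechanism described here can be illustrated with a toy model. The Python sketch below is not Fennec code: the class, its method names, and the state strings are all hypothetical, and the durations are scaled down from the 4-second timeout mentioned above. It only shows how a caller that gives up on a timed wait and proceeds anyway ends up disagreeing with a worker that has not yet handled the request.

```python
import queue
import threading
import time


class ToySurfaceSync:
    """Toy model: a 'UI' thread asks a busy 'gecko' thread to apply a state change."""

    def __init__(self):
        self.ui_state = "surface-v1"      # what the UI side believes
        self.gecko_state = "surface-v1"   # what the worker has actually applied
        self._requests = queue.Queue()

    def gecko_loop(self, busy_seconds):
        """Stay busy (standing in for reflow and paint), then serve one request."""
        time.sleep(busy_seconds)
        new_state, done = self._requests.get()  # blocks until a request exists
        self.gecko_state = new_state
        done.set()

    def send_sync(self, new_state, timeout):
        """Queue a request and wait at most `timeout` seconds for the worker."""
        done = threading.Event()
        self._requests.put((new_state, done))
        acknowledged = done.wait(timeout)
        # The caller proceeds either way. When the wait was abandoned, this is
        # the step that lets the two sides disagree.
        self.ui_state = new_state
        return acknowledged


def run(busy_seconds, timeout):
    """Return (acknowledged, states_agree_when_send_sync_returns)."""
    sync = ToySurfaceSync()
    worker = threading.Thread(target=sync.gecko_loop, args=(busy_seconds,))
    worker.start()
    acknowledged = sync.send_sync("surface-v2", timeout)
    in_sync = sync.ui_state == sync.gecko_state
    worker.join()
    return acknowledged, in_sync


print("worker busy longer than timeout:", run(busy_seconds=0.3, timeout=0.05))
print("worker answers within timeout:  ", run(busy_seconds=0.02, timeout=1.0))
```

Waiting without a timeout would keep the two states consistent in this model, at the cost of blocking the caller for as long as the worker is busy, which corresponds to the ANR trade-off discussed in this bug.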
Assignee: bugmail.mozilla → nobody
Component: Graphics: Layers → Layout
Summary: crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS → crash in mozilla::gl::GLContextEGL::MakeCurrentImpl @ libEGL_VIVANTE or libGLES_rhea on ICS because gecko is too busy in reflow
Whiteboard: [native-crash] → [native-crash][summary in comment 41]
Assignee: nobody → bugmail.mozilla
(Reporter)

Comment 42

5 years ago
It's the #2 top crasher in the first hours of 22.0 (all devices, not only ARMv6 devices) and accounts for 6.7% of all crashes.

This bug and bug 845867, which is likely a dupe, together account for 14.3% of all crashes.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #41)
> In a nutshell: the page is taking a long time to reflow and paint
Reflow is not new so which feature is causing that in 22.0 and above?
Flags: needinfo?(bugmail.mozilla)
I looked over this again and don't really have much to add at this point. I agree that the spike in crashes in 22 is surprising but given the info we have so far I can't explain why that is. It'll be interesting to see if bug 887097 impacts this at all.
Depends on: 887097
Flags: needinfo?(bugmail.mozilla)
(Reporter)

Comment 44

5 years ago
(In reply to Aaron Train [:aaronmt] from comment #33)
> I can reproduce this crash now; it involves output from above by loading a
> busy page while heading into the AwesomeScreen 
Can you find out the regression range?
Keywords: regressionwindow-wanted
(Reporter)

Updated

5 years ago
status-firefox25: --- → affected
Aaron: anything new on comment 44?  Will be checking on this bug frequently as it's a suspected dupe for bug 845867 which is a topcrasher for ARMv6 22.0
Flags: needinfo?(aaron.train)
No. Triggering this crash with the associated signatures is not 100% reproducible; what I reported was actually a bit of a struggle to tickle the conditions to crash with this signature.
Flags: needinfo?(aaron.train)
(Reporter)

Updated

5 years ago
Crash Signature: [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x3414] [@ libEGL_VIVANTE.so@0x34d0] [@ libEGL_VIVANTE.so@0x3442] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVAN… → [@ libEGL_VIVANTE.so@0x33fe] [@ libEGL_VIVANTE.so@0x3402] [@ libEGL_VIVANTE.so@0x3414] [@ libEGL_VIVANTE.so@0x34d0] [@ libEGL_VIVANTE.so@0x3442] [@ libEGL_VIVANTE.so@0x336e] [@ libEGL_VIVANTE.so@0x3374] [@ libEGL_VIVANTE.so@0x337c] [@ libEGL_VIVAN…

Comment 47

5 years ago
If I see things correctly, then bug 887097 has landed in 23.0b2, but these signatures don't look like they have really diminished in that version.
status-firefox23: affected → wontfix
22+ ship has sailed. Need to re-triage this.
tracking-fennec: 22+ → ?
tracking-fennec: ? → +
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #41)
> Ok, thanks. More samples would be helpful but in this case it shows the same
> thing as the logs from comment 35.
> 
> In a nutshell: the page is taking a long time to reflow and paint, causing
> the gecko thread to be busy for long periods of time. This means when the
> Java UI thread needs to set up GL state (which blocks on the gecko thread)
> it can't do it in a timely manner. The UI thread aborts the wait after 4
> seconds to prevent Android from ANR'ing the app, which results in invalid GL
> state and eventually this crash.

Still a topcrash on Release, a little lower on Beta. bug 845867 is very similar and also remains a huge topcrash on ARMv6.

Triage's understanding is that the root cause is difficult to identify. Is there anything we can do to recover and not crash? For instance, stop reflow or painting when a certain condition is met?
Flags: needinfo?(bugmail.mozilla)
As Kairo said in comment 47, bug 887097 has landed but this crash is still happening which is somewhat unexpected to me. I would have expected the ANR to trigger if we get into a scenario where gecko is busy and the UI thread is blocked on it for 5+ seconds. And even if it doesn't, we no longer abort the gecko event sync so the compositor thread state should never get out of sync with the gecko thread state.

I think we need to go back to the device and try to reproduce this again on a recent nightly and see if the observable behaviour is the same. I would expect not - the ANR dialog should pop up instead. If it still crashes, the logging I recently added to nightly for bug 884047 should provide some additional insight into the problem.

Aaron, would you mind trying to repro this again on the Mediapad device with a recent nightly and attaching the logcat?
Flags: needinfo?(bugmail.mozilla)
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrash → topcrash-android-armv7
For comment 50
Flags: needinfo?(aaron.train)
I have not hit a crash, but I am running into the following which attempts to write out a traces.txt but fails. Any idea how to correct this?

D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
I/InputDispatcher(  679): Application is not responding: Window{41b9dbc8 org.mozilla.fennec/org.mozilla.fennec.App paused=false}.  5001.9ms since event, 5001.6ms since wait started
I/WindowManager(  679): Input event dispatching timed out sending to org.mozilla.fennec/org.mozilla.fennec.App
I/SystemProperties(  679): get key=events.cpu
I/SystemProperties(  679): get key=dalvik.vm.stack-trace-file,def=null
I/Process (  679): Sending signal. PID: 2958 SIG: 3
I/dalvikvm( 2958): threadid=3: reacting to signal 3
I/dalvikvm( 2958): Wrote stack traces to '/data/anr/traces.txt'
I/Process (  679): Sending signal. PID: 679 SIG: 3
I/dalvikvm(  679): threadid=3: reacting to signal 3
I/dalvikvm(  679): Wrote stack traces to '/data/anr/traces.txt'
I/Process (  679): Sending signal. PID: 751 SIG: 3
I/dalvikvm(  751): threadid=3: reacting to signal 3
I/dalvikvm(  751): Wrote stack traces to '/data/anr/traces.txt'
I/Process (  679): Sending signal. PID: 840 SIG: 3
I/dalvikvm(  840): threadid=3: reacting to signal 3
I/dalvikvm(  840): Wrote stack traces to '/data/anr/traces.txt'
I/Process (  679): Sending signal. PID: 850 SIG: 3
I/dalvikvm(  850): threadid=3: reacting to signal 3
I/dalvikvm(  850): Wrote stack traces to '/data/anr/traces.txt'
I/Process (  679): Sending signal. PID: 865 SIG: 3
I/dalvikvm(  865): threadid=3: reacting to signal 3
I/dalvikvm(  865): Wrote stack traces to '/data/anr/traces.txt'
D/dalvikvm(  679): GC_EXPLICIT freed 765K, 24% free 11545K/15011K, paused 11ms+8ms
I/Lights  (  679): >>> Enter set_buttons_light
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496

How can I pull traces.txt?

E/ActivityManager(  679): Error reading /data/anr/traces.txt
E/ActivityManager(  679): java.io.FileNotFoundException: /data/anr/traces.txt: open failed: ENOENT (No such file or directory)
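
Long logcat pastes like the one above are easier to compare across runs once they are summarized. The following is a minimal Python sketch, assuming the brief `level/tag( pid): message` layout seen in this paste. The function and pattern names are hypothetical, and the embedded sample is a short excerpt of lines from this comment.

```python
import re

# Excerpt of lines copied from the logcat in this comment.
LOGCAT = """\
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
D/GeckoLayerClient( 2958): Aborting draw due to resolution change: 1.0 != 0.61224496
I/InputDispatcher(  679): Application is not responding: Window{41b9dbc8 org.mozilla.fennec/org.mozilla.fennec.App paused=false}.  5001.9ms since event, 5001.6ms since wait started
I/Process (  679): Sending signal. PID: 2958 SIG: 3
I/dalvikvm( 2958): Wrote stack traces to '/data/anr/traces.txt'
E/ActivityManager(  679): Error reading /data/anr/traces.txt
"""

LINE_RE = re.compile(
    r"^(?P<level>[A-Z])/(?P<tag>[^(]+?)\s*\(\s*(?P<pid>\d+)\): (?P<msg>.*)$")
ANR_RE = re.compile(
    r"Application is not responding: Window\{\S+ (?P<component>\S+) "
    r".*?(?P<ms>[\d.]+)ms since event")
TRACE_RE = re.compile(r"Wrote stack traces to '([^']+)'")


def summarize(logcat_text):
    """Count aborted draws and collect ANR, trace-file and error information."""
    summary = {"aborted_draws": 0, "anr": None, "trace_paths": [], "errors": []}
    for line in logcat_text.splitlines():
        parsed = LINE_RE.match(line)
        if not parsed:
            continue  # skip truncated or wrapped lines
        msg = parsed.group("msg")
        if msg.startswith("Aborting draw due to resolution change"):
            summary["aborted_draws"] += 1
        anr = ANR_RE.search(msg)
        if anr:
            summary["anr"] = (anr.group("component"), float(anr.group("ms")))
        trace = TRACE_RE.search(msg)
        if trace and trace.group(1) not in summary["trace_paths"]:
            summary["trace_paths"].append(trace.group(1))
        if parsed.group("level") == "E":
            summary["errors"].append(msg)
    return summary


summary = summarize(LOGCAT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Feeding the full capture instead of the excerpt works the same way; lines that do not match the expected layout are ignored rather than guessed at.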
Flags: needinfo?(aaron.train)
Created attachment 8360463 [details]
traces_org.mozilla.fennec.txt

Actually it looks like it was written to traces_org.mozilla.fennec.txt - hopefully something helpful in here?
In that logcat it looks like the Android main thread is at:

 at com.google.android.gles_jni.EGLImpl.eglSwapBuffers(Native Method)
 at android.view.HardwareRenderer$GlRenderer.draw(HardwareRenderer.java:875)

which should be a pretty fast operation. I'm not sure why it would ANR there. :(

As for the "Aborting draw due to resolution change" messages - mostly that can be ignored but usually there shouldn't be a ton of them getting printed like that. NI to Cwiiis; maybe he can provide some insight on it.
Flags: needinfo?(chrislord.net)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #55)
> In that logcat it looks the android main thread is at:
> 
>  at com.google.android.gles_jni.EGLImpl.eglSwapBuffers(Native Method)
>  at android.view.HardwareRenderer$GlRenderer.draw(HardwareRenderer.java:875)
> 
> which should be a pretty fast operation. I'm not sure why it would ANR
> there. :(

I saw a hang similar to this in bug 935676. It was waiting on some mutex inside libEGL, but I don't see EGL appear anywhere else in this trace. The Compositor thread isn't listed for some reason?
Hm, good point. Maybe the compositor thread died somehow and so the main thread is left waiting for the mutex that will never be released?
The number of aborts is because on the first paint, we ignore if the front-end tries to abort - so that's likely just the number of transactions to do the paint and the front-end is trying to abort on each one. The other reason is that the zoom front-end side is incorrect on first-paint because setFirstPaintViewport isn't called *after* the corresponding paint (I think that was it anyway).
Flags: needinfo?(chrislord.net)
This crash seems to have dropped significantly in 28, possibly because of bug 925608. There's still a handful I see on crash-stats but I think we can take off the topcrash status?
Depends on: 925608
Flags: needinfo?(kairo)

Comment 60

5 years ago
Yes, that is as expected and it's really awesome; 28 looks like it will be one of the most stable releases we have shipped so far. I'd mark this crash as a dupe of bug 925608, but feel free to resolve in other ways. :)
Flags: needinfo?(kairo)
With pleasure :)
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 925608
Keywords: regressionwindow-wanted