(PTO June 19-July 7) Ryan VanderMeulen

Comment 3

7 years ago

This crash is currently #6 overall nightly on Fennec.

Comment 4

7 years ago

Almost 1500 crashes in the last week on 59.0.2. Is it possible for someone to take another look at this?

status-firefox59: --- → wontfix

status-firefox60: --- → fix-optional

status-firefox61: --- → affected

status-firefox-esr52: --- → unaffected

Flags: needinfo?(sdaswani)

Petru-Mugurel Lingurar [:petru]

Comment 5

7 years ago

Petru, can one of you look at this ASAP? I think it may land in 59 or 60 if a fix is found.

Flags: needinfo?(sdaswani) → needinfo?(petru.lingurar)

Whiteboard: [Leanplum][61]

Comment 6

7 years ago

Things I found:
- the crashes started appearing on December 26th, 2017 from [1], although no changes seem to have been made to that file shortly before the crashes.
- before the line where the crash occurs, the CreateContext() [2] method is executed, but that method hadn't been modified in more than a year prior to the crashes.
- from the gathered crash reports there doesn't seem to be a clear scenario in which this crash occurs, although a few situations appeared in 2-3 reports:
  - trying to do a search / tapping the address bar
  - opening the app in multi-window mode

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/d2e449c73dac/gfx/layers/opengl/CompositorOGL.cpp#l238
[2] https://hg.mozilla.org/releases/mozilla-release/annotate/d2e449c73dac/gfx/layers/opengl/CompositorOGL.cpp#l111

Flags: needinfo?(petru.lingurar)

(PTO June 19-July 7) Ryan VanderMeulen

Comment 7

7 years ago

Thanks Petru. Ryan, it looks like the crashes aren't related to a code change, per Petru's analysis. Do we have an idea if the crash is more prevalent on a set of devices or OS versions?

Flags: needinfo?(ryanvm)

Comment 8

7 years ago

Petru's analysis only goes back to the crash-data purge in late December. Note that the bug was filed a month earlier, in November.

Flags: needinfo?(ryanvm)

Petru-Mugurel Lingurar [:petru]

Comment 9

7 years ago

Ah, I wasn't aware of the 'purge'. Petru, can you spend some time trying to repro?

Flags: needinfo?(petru.lingurar)

Comment 10

7 years ago

Indeed, the first crash appeared in September [1], but the crashes have only become more prevalent in the last few months. I've been trying to reproduce based on the few comments in the crash reports (tapping in the address bar, multi-window), with no luck yet. [1] First crash on 57.0b3 - https://crash-stats.mozilla.com/report/index/1eaed915-4848-4e95-b324-488e10180120

Flags: needinfo?(petru.lingurar)

Updated

7 years ago

Whiteboard: [Leanplum][61] → --do_not_change--[priority:high]

Kevin Brosnan [Ex-Mozilla]

Comment 11

7 years ago

Marcia is this still a frequent crasher?

Flags: needinfo?(mozillamarcia.knous)

Comment 12

7 years ago

Looking at the affected devices, I am seeing mostly emulators. Common ways to detect them are known ARM devices reporting an x86 ABI, and device names that mention an emulator or VMware.

Manufacturer  Model                    Android version  CPU ABI      Crashes  Share
unknown       AOSP on ARM Emulator     18 (REL)         armeabi-v7a  36       5.5%
samsung       SM-G960F                 22 (REL)         x86          25       3.8%
zte           Z982                     22 (REL)         x86          24       3.7%
samsung       SM-G950F                 26 (REL)         armeabi-v7a  18       2.7%
gmbh          VirtualBox               19 (REL)         x86          16       2.4%
innotek       VirtualBox               19 (REL)         x86          16       2.4%
samsung       GT-P5210                 19 (REL)         x86          15       2.3%
oppo          A37f                     22 (REL)         x86          14       2.1%
lge           Nexus 5X                 27 (REL)         armeabi-v7a  13       2.0%
inc           VMware Virtual Platform  19 (REL)         x86          12       1.8%
vmware        VMware Virtual Platform  19 (REL)         x86          12       1.8%
samsung       SM-A520F                 22 (REL)         x86          11       1.7%
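The two emulator-detection heuristics from this comment can be written down as a small filter. This is a minimal sketch, not a crash-stats feature: the `DeviceRow` record, the marker strings, and the `KNOWN_ARM` model set are hypothetical inputs chosen for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class DeviceRow:
    """One row of a crash-stats device aggregation (hypothetical shape)."""
    manufacturer: str
    model: str
    cpu_abi: str


# Name fragments that point at a virtual machine or emulator image.
EMULATOR_MARKERS = ("emulator", "vmware", "virtualbox")


def looks_like_emulator(row: DeviceRow, known_arm_models: AbstractSet[str]) -> bool:
    """Heuristic: flag rows that are probably emulators rather than real phones.

    1. The manufacturer/model string mentions an emulator or VM product.
    2. A model known to ship only with an ARM SoC reports an x86 ABI.
    """
    name = f"{row.manufacturer} {row.model}".lower()
    if any(marker in name for marker in EMULATOR_MARKERS):
        return True
    return row.model in known_arm_models and row.cpu_abi.startswith("x86")


# Illustrative input only; a real list of ARM-only models would need curating.
KNOWN_ARM = frozenset({"SM-G960F", "Nexus 5X"})

rows = [
    DeviceRow("unknown", "AOSP on ARM Emulator", "armeabi-v7a"),
    DeviceRow("samsung", "SM-G960F", "x86"),
    DeviceRow("lge", "Nexus 5X", "armeabi-v7a"),
    DeviceRow("vmware", "VMware Virtual Platform", "x86"),
]
print([looks_like_emulator(r, KNOWN_ARM) for r in rows])  # [True, True, False, True]
```

Rule 2 is only as good as the model list it is given: an x86 ABI is legitimate on genuinely Intel-based Android devices, so the list must contain only models that never shipped with an x86 SoC.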

Comment 13

7 years ago

As Kevin notes, there is a mix of emulator devices in the recent 62 data (including betas). One of the top crashing devices is the SM-G965U, which is the Samsung Galaxy S9+. There aren't many URLs to try to reproduce with. I think we should wait and see how this plays out in 62 volume, since we just shipped, and reevaluate at a later time.

Flags: needinfo?(mozillamarcia.knous)

Comment 14

7 years ago

Volume in 62.0.1 is pretty low so far: 193 crashes.

Updated

7 years ago

Whiteboard: --do_not_change--[priority:high] → [priority:low]

Comment 15

7 years ago

Updating affected branches. While there are some emulator devices, it appears as if a fair amount of regular devices crash as well. Volume is relatively low on 62/63/64.

status-firefox62: --- → affected

status-firefox63: --- → affected

status-firefox64: --- → affected

Updated

7 years ago

Priority: -- → P1

Comment 16

6 years ago

Adding 65/66 as affected. On 66 Nightly this is currently in the top 6 crashes.

status-firefox65: --- → affected

status-firefox66: --- → affected

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

6 years ago

status-firefox67: --- → affected

Pascal Chevrel:pascalc

Comment 17

6 years ago

Tracking for 67 as it spiked on Nightly over the last few days. James, could we have somebody investigate what made the situation worse since Feb 22? Thanks.

status-firefox60: fix-optional → wontfix

status-firefox61: affected → wontfix

status-firefox62: affected → wontfix

status-firefox63: affected → wontfix

status-firefox64: affected → wontfix

tracking-firefox67: --- → +

Flags: needinfo?(snorp)

Comment 18

6 years ago

Recent crashes appear to be a MOZ_CRASH() where we fail to create a GLContext for the compositor. I found the following messages from several logcats:

02-26 10:26:27.200 17406 18095 I Gecko : Attempting load of libEGL.so
02-26 10:26:27.220 17406 18095 I Gecko : [GFX1]: Flushing glGetError still 0x40514048 after 100 calls.
...
02-26 10:26:27.510 17406 18173 I Gecko : [GFX1]: Flushing glGetError still 0x40514048 after 100 calls.
02-26 10:26:27.510 17406 18173 I Gecko : [GFX1-]: Failed to create EGLContext!

That error value makes no sense to me and looks a lot like a pointer address, so I'm not sure what's going on.
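The wording of the log lines above ("still 0x40514048 after 100 calls") suggests a bounded loop that keeps calling glGetError until it returns GL_NO_ERROR. The model below assumes exactly that; it is a sketch inferred from the log text, not Gecko's implementation, and `flush_gl_errors` is a hypothetical name. It shows why a healthy driver drains its errors in a few calls, while an entry point that returns the same junk value every time exhausts the cap.

```python
from typing import Callable, Tuple

GL_NO_ERROR = 0x0000


def flush_gl_errors(get_error: Callable[[], int], max_calls: int = 100) -> Tuple[int, int]:
    """Call get_error until it reports GL_NO_ERROR, giving up after max_calls.

    Returns (last_value, calls_made). A well-behaved driver drains its error
    flags within a few calls; a non-zero value after max_calls means the
    "error" never cleared.
    """
    err = GL_NO_ERROR
    calls = 0
    while calls < max_calls:
        err = get_error()
        calls += 1
        if err == GL_NO_ERROR:
            break
    return err, calls


# A sane driver with two pending errors, then a clean state.
pending = [0x0502, 0x0500]  # GL_INVALID_OPERATION, GL_INVALID_ENUM


def sane_get_error() -> int:
    return pending.pop(0) if pending else GL_NO_ERROR


print(flush_gl_errors(sane_get_error))  # (0, 3)


# A broken entry point that returns the same junk value on every call.
def broken_get_error() -> int:
    return 0x40514048


err, calls = flush_gl_errors(broken_get_error)
print(f"Flushing glGetError still {err:#x} after {calls} calls.")
# prints: Flushing glGetError still 0x40514048 after 100 calls.
```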

Flags: needinfo?(snorp) → needinfo?(jgilbert)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 19

6 years ago

Moving this to GFX since it's clearly GLContext/Compositor stuff.

Component: Widget: Android → Graphics

Kevin Brosnan [Ex-Mozilla]

Assignee

Comment 20

6 years ago

It's not a pointer:
https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/gfx/gl/GLContext.cpp#2766

(E)GL library loading changed recently, so this might be fallout from that: Bug 1528396

Snorp, were these local logcats? If so, on what device(s), and is there an STR?

Flags: needinfo?(jgilbert) → needinfo?(snorp)

Comment 21

6 years ago

A significant number of the devices are emulators; see comment 12.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 22

6 years ago

It doesn't happen locally for me, I got the logs from crash-stats: https://crash-stats.mozilla.com/report/index/adbf502b-212c-4d54-ac20-b9f520190226#tab-metadata

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 23

6 years ago

(In reply to Jeff Gilbert [:jgilbert] from comment #20)

It's not a pointer:
https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/gfx/gl/GLContext.cpp#2766

Yeah, but wtf is 0x40514048? That's not a known error AFAICT, and the value differs between reports.

Assignee

Comment 24

6 years ago

Huh, nooo idea. None of those numbers are GLenums.
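For reference, the values glGetError is specified to return form a small, fixed set, all between 0x0000 and 0x0507, which is why 0x40514048 cannot be one of them. The lookup below lists the standard error enums from the GL/GLES headers (not every one exists in every API version, and a few extensions define extra codes); `describe_gl_error` is a hypothetical helper for illustration.

```python
# Standard glGetError results from the OpenGL / OpenGL ES headers.
GL_ERROR_NAMES = {
    0x0000: "GL_NO_ERROR",
    0x0500: "GL_INVALID_ENUM",
    0x0501: "GL_INVALID_VALUE",
    0x0502: "GL_INVALID_OPERATION",
    0x0503: "GL_STACK_OVERFLOW",
    0x0504: "GL_STACK_UNDERFLOW",
    0x0505: "GL_OUT_OF_MEMORY",
    0x0506: "GL_INVALID_FRAMEBUFFER_OPERATION",
    0x0507: "GL_CONTEXT_LOST",
}


def describe_gl_error(value: int) -> str:
    """Return the enum name for a glGetError result, or flag it as unknown."""
    return GL_ERROR_NAMES.get(value, f"not a GL error enum ({value:#x})")


print(describe_gl_error(0x0502))      # GL_INVALID_OPERATION
print(describe_gl_error(0x40514048))  # not a GL error enum (0x40514048)
```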

Assignee

Comment 25

6 years ago

Here's the data isolated to the spiking build IDs:
https://crash-stats.mozilla.com/signature/?product=FennecAndroid&build_id=%3E%3D20190225102402&signature=mozilla%3A%3Alayers%3A%3ACompositorOGL%3A%3AInitialize&date=%3E%3D2019-02-19T23%3A01%3A00.000Z&date=%3C2019-02-26T23%3A01%3A00.000Z#aggregations

1   GT-I8552B                  25   6.25 %
2   SO-04E                     18   4.50 %
3   SHV-E300K                  17   4.25 %
4   Vodafone Smart Tab III 10  16   4.00 %
5   GT-I9100                   15   3.75 %
6   AOSP on ARM Emulator       12   3.00 %
7   M4 SS4040                  12   3.00 %
8   GT-P5110                   10   2.50 %
9   PSP5307DUO                 10   2.50 %
10  V865M                      10   2.50 %

There are multiple runs of users trying to restart the browser and accumulating crashes. (eesh, sorry all :( )

Interestingly that pointer-looking value changes across users, but is consistent across runs per-user.
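The shares in the table in this comment also bound the size of the spike. Since each share is count / total, dividing any row's count by its share recovers the window total. A quick check, with the values copied from the table: every row implies the same 400 crashes, and the ten listed devices account for 145 of them (about 36%), so most of the volume sits in a long tail of other devices.

```python
# (model, crashes, share in percent), copied from the aggregation table above.
ROWS = [
    ("GT-I8552B", 25, 6.25), ("SO-04E", 18, 4.50), ("SHV-E300K", 17, 4.25),
    ("Vodafone Smart Tab III 10", 16, 4.00), ("GT-I9100", 15, 3.75),
    ("AOSP on ARM Emulator", 12, 3.00), ("M4 SS4040", 12, 3.00),
    ("GT-P5110", 10, 2.50), ("PSP5307DUO", 10, 2.50), ("V865M", 10, 2.50),
]

# count / (share / 100) recovers the total number of crashes in the window.
implied_totals = {round(count * 100 / share) for _, count, share in ROWS}
assert len(implied_totals) == 1, "rows disagree about the window total"
total = next(iter(implied_totals))

top10 = sum(count for _, count, _ in ROWS)
print(total, top10)  # 400 145
```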

Assignee

Comment 26

6 years ago

It's crazy bizarre for glGetError to return any value like that. It's like we're hitting some shim and it's immediately returning to us, giving us a pointer-like value that happened to already be on the stack.

I've re-vetted our EGL loading code, and the only thing I can think of is that we try eglGetProcAddress before we try to load from the library, which is the opposite order of what we used to do. (and we no longer try to load from the process)

EGL <= 1.4:

eglGetProcAddress may not be queried for core (non-extension) functions in EGL or client APIs.

EGL >= 1.5:

eglGetProcAddress may be queried for all EGL and client API functions supported by the implementation (whether those functions are extensions or not, and whether they are supported by the current client API context or not).

If EGL 1.4's eglGetProcAddress is, on some hardware, giving us a pfn to a dummy thunk that logs an error somewhere and returns void, we might get this behavior.

The top three devices (all I checked) are all circa 2013, which does predate EGL 1.5 (2014).

Thing is, I think we want to prefer to use (egl/wgl)GetProcAddress first before trying to dlsym from the library, but maybe we can try to reverse this?

All told, this is a relatively small number of long-tail crashes.
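The hypothesis in this comment, and the ordering change that landed ("dlsym from lib before wsiGetProcAddress"), can be modelled without a real driver. In this minimal sketch, which is not Gecko's loader, a dict stands in for the library's exported symbols, a function stands in for eglGetProcAddress, and `legacy_get_proc_address` encodes the unconfirmed assumption that a pre-1.5 driver hands back a dummy stub even for core functions. All names here are hypothetical.

```python
from typing import Callable, Mapping, Optional

GLFunc = Callable[[], int]  # a resolved entry point, modelled as a callable


def resolve_symbol(
    name: str,
    lib_exports: Mapping[str, GLFunc],
    get_proc_address: Callable[[str], Optional[GLFunc]],
    dlsym_first: bool,
) -> Optional[GLFunc]:
    """Resolve `name` from two sources in the requested order.

    lib_exports stands in for dlsym() on the loaded GL/EGL library;
    get_proc_address stands in for eglGetProcAddress().
    """
    sources = [lambda: lib_exports.get(name), lambda: get_proc_address(name)]
    if not dlsym_first:
        sources.reverse()
    for lookup in sources:
        fn = lookup()
        if fn is not None:
            return fn
    return None


def real_get_error() -> int:
    return 0x0000  # GL_NO_ERROR


def dummy_thunk() -> int:
    return 0x40514048  # stands in for whatever a void-returning stub leaves behind


def legacy_get_proc_address(name: str) -> Optional[GLFunc]:
    # Hypothetical pre-1.5 driver: returns a non-NULL stub even for core functions.
    return dummy_thunk


lib = {"glGetError": real_get_error}
gpa_first = resolve_symbol("glGetError", lib, legacy_get_proc_address, dlsym_first=False)
lib_first = resolve_symbol("glGetError", lib, legacy_get_proc_address, dlsym_first=True)
print(hex(gpa_first()), hex(lib_first()))  # 0x40514048 0x0
```

With the library consulted first, core entry points come from its real exports, and eglGetProcAddress remains the fallback for names the library does not export, such as extension functions.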

Assignee: nobody → jgilbert

Severity: critical → major

Depends on: 1528396

https://hg.mozilla.org/mozilla-central/rev/b437ff8ed47c

Assignee

Comment 27

6 years ago

Attached file Bug 1420745 - dlsym from lib before wsiGetProcAddress.

Pulsebot

Comment 28

6 years ago

Pushed by jgilbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b437ff8ed47c dlsym from lib before wsiGetProcAddress. r=snorp

Cosmin Sabou [:CosminS]

Comment 29

6 years ago

bugherder

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox67: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla67

(PTO June 19-July 7) Ryan VanderMeulen

Assignee

Comment 30

6 years ago

I'll check back tomorrow, but 11 hours in, 1 crash from 1 reporter is encouraging!

Flags: needinfo?(jgilbert)

Comment 31

6 years ago

Is this something we should consider backporting to Beta for Fennec 66?

status-firefox65: affected → wontfix

status-firefox-esr60: --- → unaffected

Assignee

Comment 32

6 years ago

So we sort of commandeered this bug for this crash spike, which is actually a different bug that is 67-only.
I'll duplicate this bug so we keep tracking the low-volume crash bug.

Assignee

Updated

6 years ago

Blocks: 1532456

Assignee

Updated

6 years ago

Flags: needinfo?(jgilbert)

Summary: Crash in mozilla::layers::CompositorOGL::Initialize → (spike in 67 of) Crash in mozilla::layers::CompositorOGL::Initialize

Assignee

Updated

6 years ago

status-firefox66: affected → unaffected