Closed Bug 737949 Opened 8 years ago Closed 8 years ago

Remove GLThread, don't initialize EGL from Java code

Categories

(Firefox for Android :: General, defect, critical)

14 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 14

People

(Reporter: scoobidiver, Assigned: joe)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [native-crash][gfx][startupcrash])

Crash Data

Attachments

(1 file, 2 obsolete files)

It first appeared in 14.0a1/20120319 and occurs on Galaxy Nexus at startup.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e94edfdb1f5b&tochange=58a2cd0203ee

Signature 	libGLESv2_POWERVR_SGX540_120.so@0xb8ae More Reports Search
UUID	b2640624-477f-494e-9317-122952120321
Date Processed	2012-03-21 13:26:39
Uptime	2
Last Crash	2.5 hours before submission
Install Age	1.9 hours since version was first installed.
Install Time	2012-03-21 11:34:33
Product	FennecAndroid
Version	14.0a1
Build ID	20120321031151
Release Channel	nightly
OS	Linux
OS Version	0.0.0 Linux 3.0.8-g8e0f3bc #1 SMP PREEMPT Wed Dec 14 18:52:51 PST 2011 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x0
App Notes 	
EGL? EGL+ AdapterVendorID: tuna, AdapterDeviceID: Galaxy Nexus.
AdapterDescription: 'Android, Model: 'Galaxy Nexus', Product: 'yakju', Manufacturer: 'samsung', Hardware: 'tuna''.
samsung Galaxy Nexus
google/yakju/maguro:4.0.1/ITL41F/228551:user/release-keys
EMCheckCompatibility	True

Frame 	Module 	Signature [Expand] 	Source
0 	libc.so 	libc.so@0xdbac 	
1 	libGLESv2_POWERVR_SGX540_120.so 	libGLESv2_POWERVR_SGX540_120.so@0xb8ae 	
2 	libGLESv2_POWERVR_SGX540_120.so 	libGLESv2_POWERVR_SGX540_120.so@0xda66 	
3 	libGLESv2_POWERVR_SGX540_120.so 	libGLESv2_POWERVR_SGX540_120.so@0x5a38e 	
4 	libGLESv2_POWERVR_SGX540_120.so 	libGLESv2_POWERVR_SGX540_120.so@0xebc6 	
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libGLESv2_POWERVR_SGX540_120.so%400xb8ae
Bug 735353 is within libGLESv1_POWERVR_SGX540_120.so; this bug is wihin libGLESv2_POWERVR_SGX540_120.so
Crash Signature: [@ libGLESv2_POWERVR_SGX540_120.so@0xb8ae] → [@ libGLESv2_POWERVR_SGX540_120.so@0xb8ae] [@ memcpy | libGLESv2_POWERVR_SGX540_120.so@0x6df6 ] [@ libGLESv2_POWERVR_SGX540_120.so@0xb042] [@ memcpy | libGLESv2_POWERVR_SGX540_120.so@0x6e7e] [@ libGLESv2_POWERVR_SGX540_120.so@0xbefe] [@ libGLESv2_POW…
With all the dupes, it's no longer specific to Nexus.
Summary: crash in libGLESv2_POWERVR_SGX540_120 on Galaxy Nexus → crash in libGLESv2_POWERVR_SGX540_120
There are different versions of this DLL, but I don't think crashes in 0x1..., 0x6..., 0x7..., 0xa..., 0xb... are correlated.
If you think so, bug 736436 and bug 736666 are a dupe of this bug.
Depends on: 738188
libGLESv2_POWERVR_SGX540_120.so@0xebc6 is glDrawArrays, so this does indeed seem to be a dupe of Bug 736666 and Bug 737957.
(In reply to Ali Juma [:ajuma] from comment #7)
> libGLESv2_POWERVR_SGX540_120.so@0xebc6 is glDrawArrays
How do you know this so that we don't group unrelated signatures in a single bug or file duplicated bugs?
Crash Signature: [@ libGLESv2_POWERVR_SGX540_120.so@0xb8ae] [@ memcpy | libGLESv2_POWERVR_SGX540_120.so@0x6df6 ] [@ libGLESv2_POWERVR_SGX540_120.so@0xb042] [@ memcpy | libGLESv2_POWERVR_SGX540_120.so@0x6e7e] [@ libGLESv2_POWERVR_SGX540_120.so@0xbefe] [@ libGLESv2_POW… → [@ libGLESv2_POWERVR_SGX540_120.so@0xb8ae]
Summary: crash in libGLESv2_POWERVR_SGX540_120 → crash in libGLESv2_POWERVR_SGX540_120@0xb8ae
No longer depends on: 738188
(In reply to Scoobidiver from comment #8)
> (In reply to Ali Juma [:ajuma] from comment #7)
> > libGLESv2_POWERVR_SGX540_120.so@0xebc6 is glDrawArrays
> How do you know this so that we don't group unrelated signatures in a single
> bug or file duplicated bugs?

I have a local copy of drivers from some devices, and I'm using addr2line to map addresses to function names.
I get a similar stacktrace with this, I filed bug 736666 for it.
Keywords: topcrash
Assignee: nobody → joe
This is in glClear, apparently.

joe@ubuntu:~/moz-gdb/lib/0149A4550C01701E/system/lib$ ~/android-ndk-r5c/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-addr2line -C -f -e libGLESv2_POWERVR_SGX540_120.so 0xb8ae
glClear
??:0
Summary: crash in libGLESv2_POWERVR_SGX540_120@0xb8ae → crash in glClear
Summary: crash in glClear → startup crash in glClear called from java code
Ah, it's probably at some random address that happens to be near glClear - we have no such calls in our codebase. However, up on the callstack there's glDrawElements, which seems much more likely. However, we still don't have any calls to glDrawElements in our code. Maybe it's being called by general Android code?

STR desperately needed.
Summary: startup crash in glClear called from java code → startup crash under glDrawElements called from java code
Keywords: qawanted
In an effort to reduce the number of moving parts (and event queues to manage), I've tried removing GLThread, instead running all the GLController methods directly from within FlexibleGLSurfaceView. This has lead to a crash, unfortunately.

The compositor thread, when spinning up, asks the Java code through JNI for an EGLSurface (GLController::provideEGLSurface). From within this, eglCreateWindowSurface is throwing an error, with the following in adb lolcat:

E/GeckoGLController( 2190): provideEGLSurface
E/SurfaceTexture(  117): [SurfaceView] connect: already connected (cur=1, req=1)
E/libEGL  ( 2190): EGLNativeWindowType 0x1bc658 already connected to another API
E/libEGL  ( 2190): eglCreateWindowSurface:374 error 300b (EGL_BAD_NATIVE_WINDOW)

It seems that eglCreateWindowSurface doesn't want to create a surface for a native window that is already bound to a surface. However, I've verified that the call to provideEGLSurface happens after GLController::disposeGLContext is called, so I'm not sure what else needs to happen for the native window to be disconnected.
Does the fix I landed for bug 738188 fix this?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #14)
> Does the fix I landed for bug 738188 fix this?
There have been no crashes with this signature since 14.0a1/20120323031214, long before the patch in bug 738188 landed. So either it's fixed by another bug or Galaxy Nexus users gave up because of those startup crashes.
That may be also under other close libGLESv2_POWERVR_SGX540_120.so signatures for devices different from Galaxy Nexus.
Summary: startup crash under glDrawElements called from java code → Remove GLThread, don't initialize EGL from Java code
Attached patch initial stab at this (obsolete) — Splinter Review
This is an initial stab at this. It removes GLThread and stops initializing OpenGL from Java before Gecko starts up.
Attachment #610701 - Flags: feedback?(jmuizelaar)
Attachment #610701 - Flags: feedback?(bugmail.mozilla)
Attachment #610701 - Flags: feedback?(bgirard)
Attachment #610701 - Flags: feedback?(ajuma)
Comment on attachment 610701 [details] [diff] [review]
initial stab at this

Provided this doesn't regress much, I like it.
Attachment #610701 - Flags: feedback?(jmuizelaar) → feedback+
Comment on attachment 610701 [details] [diff] [review]
initial stab at this

Review of attachment 610701 [details] [diff] [review]:
-----------------------------------------------------------------

Hm, looks ok to me but I have no confidence in my ability to review this code. pcwalton would be a better reviewer, since he wrote the FlexibleGLSurfaceView.
Attachment #610701 - Flags: feedback?(bugmail.mozilla) → feedback+
Comment on attachment 610701 [details] [diff] [review]
initial stab at this

This makes our GL context handling much saner.
Attachment #610701 - Flags: feedback?(ajuma) → feedback+
Attachment #610701 - Flags: feedback?(bgirard) → feedback?(pwalton)
The reason that FlexibleGLSurfaceView works this way is so that it can render a pannable/zoomable screenshot maintained entirely in Java before Gecko comes up. This was a product directive from the beginning. If Gecko itself is responsible for bringing up GL in all cases, then this cannot be done.

I'm ok with removing that possibility personally (I don't think it helps much at this point), but I can't unilaterally approve it.
OK. Let me deal with product; if you can do feedback on the patch, or possibly an initial review, I'd appreciate it.
We do have Fennec drivers and Product approval for removing the code. We are communicating the decision outward. An initial feedback or review on the patch would be great.
Comment on attachment 610701 [details] [diff] [review]
initial stab at this

Review of attachment 610701 [details] [diff] [review]:
-----------------------------------------------------------------

f=me then.
Attachment #610701 - Flags: feedback?(pwalton) → feedback+
This also seems to fix the drawing of scrollbars and backgrounds.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #24)
> This also seems to fix the drawing of scrollbars and backgrounds.

This really shouldn't affect that I don't think... Are you sure about this? The scrollbars/backgrounds disappearing is intermittent - I sometimes see them, sometimes I don't, and it can depend on the page, scroll position, etc.
(In reply to Chris Lord [:cwiiis] from comment #25)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #24)
> > This also seems to fix the drawing of scrollbars and backgrounds.
> 
> This really shouldn't affect that I don't think... Are you sure about this?
> The scrollbars/backgrounds disappearing is intermittent - I sometimes see
> them, sometimes I don't, and it can depend on the page, scroll position, etc.

I am not sure.
Attached patch v1 (obsolete) — Splinter Review
Almost the same as the previous version of this patch, but with more unused code removed.
Attachment #610701 - Attachment is obsolete: true
Attachment #610935 - Flags: review?(pwalton)
Whiteboard: [native-crash][gfx][startupcrash] → [native-crash][gfx][startupcrash][autoland-try]
Whiteboard: [native-crash][gfx][startupcrash][autoland-try] → [native-crash][gfx][startupcrash][autoland-in-queue]
Comment on attachment 610935 [details] [diff] [review]
v1

Review of attachment 610935 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #610935 - Flags: review?(pwalton) → review+
Autoland Patchset:
	Patches: 610935
	Branch: mozilla-central => try
Patch 610935 could not be applied to mozilla-central.
patching file mobile/android/base/Makefile.in
Hunk #1 FAILED at 118
1 out of 1 hunks FAILED -- saving rejects to file mobile/android/base/Makefile.in.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir

Patch 610935 could not be applied to mozilla-central.
patching file mobile/android/base/Makefile.in
Hunk #1 FAILED at 118
1 out of 1 hunks FAILED -- saving rejects to file mobile/android/base/Makefile.in.rej
patching file mobile/android/base/gfx/FlexibleGLSurfaceView.java
Hunk #1 FAILED at 39
Hunk #2 FAILED at 56
Hunk #3 FAILED at 91
3 out of 3 hunks FAILED -- saving rejects to file mobile/android/base/gfx/FlexibleGLSurfaceView.java.rej
patching file mobile/android/base/gfx/GLController.java
Hunk #1 FAILED at 49
Hunk #2 FAILED at 74
Hunk #3 FAILED at 146
Hunk #4 FAILED at 236
4 out of 4 hunks FAILED -- saving rejects to file mobile/android/base/gfx/GLController.java.rej
unable to find 'mobile/android/base/gfx/GLThread.java' for patching
1 out of 1 hunks FAILED -- saving rejects to file mobile/android/base/gfx/GLThread.java.rej
patching file mobile/android/base/gfx/LayerView.java
Hunk #1 FAILED at 97
1 out of 1 hunks FAILED -- saving rejects to file mobile/android/base/gfx/LayerView.java.rej
patching file widget/android/AndroidBridge.cpp
Hunk #1 FAILED at 1104
1 out of 1 hunks FAILED -- saving rejects to file widget/android/AndroidBridge.cpp.rej
patching file widget/android/AndroidFlexViewWrapper.cpp
Hunk #1 FAILED at 45
Hunk #2 FAILED at 81
2 out of 2 hunks FAILED -- saving rejects to file widget/android/AndroidFlexViewWrapper.cpp.rej
patching file widget/android/AndroidFlexViewWrapper.h
Hunk #1 FAILED at 52
1 out of 1 hunks FAILED -- saving rejects to file widget/android/AndroidFlexViewWrapper.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir

Patchset could not be applied and pushed.
Whiteboard: [native-crash][gfx][startupcrash][autoland-in-queue] → [native-crash][gfx][startupcrash]
Blocks: 740898
Comment on attachment 610935 [details] [diff] [review]
v1

This makes it so the code from GLThread.renderFrame() is called unconditionally, as though mGLThread != null. We don't want to call it at all.
Attachment #610935 - Flags: review-
Attached patch v2Splinter Review
Jeff pointed out that I had moved the contents of GLThread.requestRender into FlexibleGLSurfaceView.requestRender, but it was normally used only before the C++ compositor started up (and mGLThread was nulled out). The refactored code would, however, always be run, regardless of whether the C++ compositor was started up.

Since we've made the decision to stop using OpenGL from Java on startup, we don't need the GLThread.requestRender code at all, so I've removed it.

This is also rebased on current m-c trunk.

Carrying forward r=pcwalton, and adding r=jrmuizel.
Attachment #610935 - Attachment is obsolete: true
Attachment #611037 - Flags: review+
Try run for 320256571f7a is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=320256571f7a
Results (out of 237 total builds):
    exception: 1
    success: 214
    warnings: 22
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jdrew@mozilla.com-320256571f7a
https://hg.mozilla.org/mozilla-central/rev/84463b2ba5ff
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14
Keywords: qawanted
Depends on: 741166
You need to log in before you can comment on or make changes to this bug.