Closed Bug 736123 Opened 8 years ago Closed 7 years ago

crash in delete_texture when using WebGL on Adreno GPU

Categories

(Firefox for Android :: General, defect, critical)

ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Firefox 15
Tracking Status
firefox14 --- verified
firefox15 --- verified
blocking-fennec1.0 --- +
fennec + ---

People

(Reporter: martijn.martijn, Assigned: bjacob)

References

()

Details

(4 keywords, Whiteboard: [gfx][native-crash][mobile-crash][sumo])

Crash Data

Attachments

(3 files, 2 obsolete files)

I was crashing on this WebGL demo site: http://www.ro.me/
Using the HTC Desire HD, using today's trunk build (with Maple merged in), Android 2.3.5.

Steps to reproduce:
- I visited the site from the link in this page: http://people.mozilla.org/~mwargers/tests/layout/layout_demos.htm
- I'm only seeing black and after a while, I get a 'not supported' page. In that case, I press the Android back button and tap again on the http://www.ro.me/ from the layout_demos page.

Usually, it crashes for me within 3 times on the HTC Desire HD.

This bug was filed from the Socorro interface and is 
report bp-cf2915be-4b88-408a-b7d6-2e4c92120315 .
============================================================= 
0 	libc.so 	__libc_android_abort 	
1 	libc.so 	dlfree 	
2 	libc.so 	free 	
3 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x6d9eb 	
4 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x65b27 	
5 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x65627 	
6 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x65b27 	
7 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x65b55 	
8 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x65b27 	
9 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x66dd3 	
10 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x1c380e 	
11 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x615cb 	
12 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x1c380e 	
13 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x6177b 	
14 	libGLESv2_adreno200.so 	libGLESv2_adreno200.so@0x640cf 	
15 	libEGL_adreno200.so 	libEGL_adreno200.so@0x1438e 	
16 	libEGL_adreno200.so 	libEGL_adreno200.so@0xedce 	
17 	libEGL_adreno200.so 	libEGL_adreno200.so@0x1438e 	
18 	libEGL_adreno200.so 	libEGL_adreno200.so@0x1438e 	
19 	libEGL_adreno200.so 	libEGL_adreno200.so@0xee56 	
20 	libEGL_adreno200.so 	libEGL_adreno200.so@0x1438e 	
21 	libEGL_adreno200.so 	libEGL_adreno200.so@0xef8a 	
22 	libgsl.so 	os_mutex_unlock 	
23 	libEGL_adreno200.so 	libEGL_adreno200.so@0xe85e 	
24 	libEGL_adreno200.so 	libEGL_adreno200.so@0x7746 	
25 	libEGL_adreno200.so 	libEGL_adreno200.so@0x1438e 	
26 	libEGL_adreno200.so 	libEGL_adreno200.so@0x10faa 	
27 	libEGL.so 	eglGetProcAddress 	
28 	libEGL_adreno200.so 	libEGL_adreno200.so@0x10fc6 	
29 	libEGL.so 	eglMakeCurrent 	
30 	libxul.so 	mozilla::gl::GLContextEGL::MakeCurrentImpl 	gfx/gl/GLContextProviderEGL.cpp:444
31 	libxul.so 	mozilla::gl::GLContext::MakeCurrent 	GLContext.h:620
32 	libxul.so 	mozilla::WebGLContext::DestroyResourcesAndContext 	content/canvas/src/WebGLContext.cpp:190
33 	libxul.so 	mozilla::WebGLContext::~WebGLContext 	content/canvas/src/WebGLContext.cpp:178
34 	libxul.so 	mozilla::WebGLContext::~WebGLContext 	content/canvas/src/WebGLContext.cpp:182
35 	libxul.so 	mozilla::WebGLContext::Release 	content/canvas/src/WebGLContext.cpp:1146
36 	libxul.so 	XPCJSRuntime::GCCallback 	js/xpconnect/src/XPCJSRuntime.cpp:622
37 	libxul.so 	js::GCSlice 	js/src/jsgc.cpp:3721
38 	libxul.so 	js::IncrementalGC 	js/src/jsfriendapi.cpp:158
39 	libxul.so 	nsXPConnect::Collect 	js/xpconnect/src/nsXPConnect.cpp:422
40 	libxul.so 	nsXPConnect::GarbageCollect 	js/xpconnect/src/nsXPConnect.cpp:432
41 	libxul.so 	nsJSContext::GarbageCollectNow 	dom/base/nsJSEnvironment.cpp:3059
42 	libxul.so 	GCTimerFired 	dom/base/nsJSEnvironment.cpp:3219
43 	libxul.so 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:508
44 	libxul.so 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:591
45 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
46 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:245
47 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
48 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:208
49 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:201
50 	libxul.so 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:189
Summary: crash in __libc_android_abort → crash in __libc_android_abort when using WebGL on adreno
tracking-fennec: --- → ?
blocking-fennec1.0: --- → ?
libGLESv2_adreno200.so@0x6d9eb is delete_texture.
Ok, I get this crash too while zooming in on the cube at http://get.webgl.org/ .
(In reply to Ali Juma [:ajuma] from comment #1)
> libGLESv2_adreno200.so@0x6d9eb is delete_texture.

Suggestion: check (using APItrace or a hacked MOZ_GL_DEBUG_VERBOSE) if we're deleting the same texture twice. OpenGL specifies that this should silently do nothing, but it's easy to guess that some drivers will crash in that case, doing a double free.
tracking-fennec: ? → +
blocking-fennec1.0: ? → beta+
Whiteboard: [gfx]
Benoit, we're either going to have to block WebGL on Adreno devices, or do some sort of workaround in which we change scripts' use of textures to only use 2^N and only use glTexImage (never glTexSubImage). I leave it up to you to decide which is more palatable!
Assignee: nobody → bjacob
The workarounds you describe sound awful; in particular they would require us to cache all texture images on our side, unless perhaps glGetTexImage is somehow available (and reliable) as a OpenGL ES extension on this driver. So, if the choice really boils down to either do such workarounds or block, then I would rather block.

However:
 1. please share the analysis leading you to think it's the only alternative
 2. If it comes to this, how about instead allowing WebGL but disabling glTexSubImage2D and rejecting NPOT textures? That would be non-conformant of course, but might be considered better than no WebGL at all. Perhaps if we know a list of WebGL demos/games we want to support, we could check if they would still run with this limited WebGL support.
(In reply to Benoit Jacob [:bjacob] (On vacation until March 18) from comment #5)
>  2. If it comes to this, how about instead allowing WebGL but disabling
> glTexSubImage2D and rejecting NPOT textures? That would be non-conformant of
> course, but might be considered better than no WebGL at all. Perhaps if we
> know a list of WebGL demos/games we want to support, we could check if they
> would still run with this limited WebGL support.

Ignore that part, sorry, that was stupid: it would break web pages that have a WebGL and a non-WebGL versions: these would rather fall back to the non-WebGL version, than have a non-correctly-rendering WebGL version.
Keywords: reproducible
Whiteboard: [gfx] → [gfx][native-crash]
Hardware: All → ARM
I was able to reproduce this bug on the latest Nightly build by going to http://webglsamples.googlecode.com/hg/aquarium/aquarium.html

Report:
https://crash-stats.mozilla.com/report/index/bp-c578c99a-3e4c-4620-919e-2fa112120316

--
Firefox 14.0a1 (2012-03-16)
Device: HTC Desire Z
OS: Android 2.3.3
Crash Signature: [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] → [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ]
Summary: crash in __libc_android_abort when using WebGL on adreno → crash in delete_texture when using WebGL on Adreno GPU
Joe, are you able to respond to #6?
We know that glTexSubImage2D and (non-POT textures in general) are broken on older Adreno drivers. Therefore, we can't let scripts use them, because they'll either cause crashes or expose random texture memory. As you say, we shouldn't just disable the broken parts, because sites already have to have fallbacks for not supporting WebGL.

Therefore, we should block WebGL on Adreno GPUs.

Does that seem reasonable? Am I missing something else?
That's reasonable, yes. What I was missing is that you already know for a fact that glTexSubImage2D is broken. That wasn't obvious from the title and first comments in this bug, which pointed more to texture deletion.
I'm not sure if glTexSubImage2D is broken in its own right, or if it's just SubImage uploads with NPOT textures.
(In reply to Joe Drew (:JOEDREW!) from comment #9)
> We know that glTexSubImage2D and (non-POT textures in general) are broken on
> older Adreno drivers.
> Therefore, we should block WebGL on Adreno GPUs.
Can't you use a WebGL blocklist in order that recent drivers don't be blocklisted, if you known the cutoff version/date?
(In reply to Scoobidiver from comment #12)
> Can't you use a WebGL blocklist in order that recent drivers don't be
> blocklisted, if you known the cutoff version/date?

We don't know the cutoff version and we can't get that information
(In reply to George Wright (:gw280) from comment #13)
> we can't get that information
We can collect that if the driver version is written in crash reports by monitoring crash stats during the Aurora and Beta phase as we did for Direct2D with the desktop version during 4.0pre.
(In reply to Scoobidiver from comment #14)
> (In reply to George Wright (:gw280) from comment #13)
> > we can't get that information
> We can collect that if the driver version is written in crash reports by
> monitoring crash stats during the Aurora and Beta phase as we did for
> Direct2D with the desktop version during 4.0pre.

We can whitelist specific versions, but we've been told there's no way for us to tell from a given version number if it's good or not (ie - there isn't a simple version number which we can check against to see if it's greater than for NPOT texture fixes).
Can we whitelist recent Adreno GPUs where we are sure they work?
We believe that the renderer string changes from (for example) "Adreno 205" to "Adreno (TM) 205" in newer versions of the driver, so AIUI we are implicitly whitelisting newer drivers.
blocking-fennec1.0: beta+ → +
Crash Signature: [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ] → [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ] [@ libEGL_adreno200.so@0xbbf4] [@ libEGL_adreno200.so@0xc240 ] [@ libEGL_adreno200.so@0xbc7c ] [@ li…
Interesting.  I believe I ran into this crash just by launching, then hitting the back button and going back into fennc/nightly.
So, it looks like the conclusion was to block older Adreno devices.  Do we move this to AMO for that?
No, we should do this in code.
Looping in product for awareness.
Keywords: relnote
Whiteboard: [gfx][native-crash] → [gfx][native-crash] sumo
I would like to do this, but, looking at
https://crash-stats.mozilla.com/report/index/bp-c578c99a-3e4c-4620-919e-2fa112120316
I don't see "Adreno" in the GfxInfo strings. Am I missing something? How are we supposed to check for these Adreno GPUs?
it just returns "Android" on my phone and in the crash reports here.
Attachment #618824 - Flags: review?(jmuizelaar)
Rather ugly ad-hoc blacklisting in WebGL initialization code, as we need the actual GL strings to do that blacklisting and we can't easily have them in GfxInfo.
Attachment #618825 - Flags: review?(joe)
Attachment #618825 - Flags: review?(joe) → review+
Whiteboard: [gfx][native-crash] sumo → [gfx][native-crash][mobile-crash][sumo]
Duplicate of this bug: 750215
Comment on attachment 618824 [details] [diff] [review]
kill GetVendor() and EGLUtils.h

Review of attachment 618824 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/GfxInfo.cpp
@@ +90,5 @@
>  
>  nsresult
>  GfxInfo::Init()
>  {
> +  mAdapterDescription.AssignLiteral("");

Add a comment about why this is here.
Attachment #618824 - Flags: review?(jmuizelaar) → review+
Comment on attachment 618824 [details] [diff] [review]
kill GetVendor() and EGLUtils.h

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: just to keep aurora consistent with central so that the subsequent patches that I want to land on it, apply cleanly.
Testing completed (on m-c, etc.): inbound
Risk to taking this patch (and alternatives if risky): no risk
String changes made by this patch: none
Attachment #618824 - Flags: approval-mozilla-aurora?
Comment on attachment 618825 [details] [diff] [review]
blacklist Adreno renderers for WebGL

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: crashes for users of adreno gpus
Testing completed (on m-c, etc.): inbound
Risk to taking this patch (and alternatives if risky): could get unhappy users if somehow the blacklisting is not needed for some driver version. Only affects WebGL.
String changes made by this patch: none
Attachment #618825 - Flags: approval-mozilla-aurora?
Target Milestone: Firefox 14 → Firefox 15
Whiteboard: [gfx][native-crash][mobile-crash][sumo] → [landed on inbound][gfx][native-crash][mobile-crash][sumo]
Attachment #618824 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #618825 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #31)
> I'm still crashing with an Aurora build that supposedly has this fix in it:
> ftp://ftp.mozilla.org/pub/mobile/tinderbox-builds/mozilla-aurora-android/
> 1335995418/
> Changeset here:
> https://hg.mozilla.org/releases/mozilla-aurora/rev/8a68244c078b
> 
> Crash id:
> https://crash-stats.mozilla.com/report/index/bp-c71dd6c9-8cb6-4734-8755-
> 940812120502
> This is on the HTC Desire HD, trying to load http://www.ro.me/

Strange, i ran the same device and same build as this, and i get no crashes.  instead, i get an expected WebGL not supported message in content.

http://i.imgur.com/Gplc6.png
In about:support on that Aurora build, it say WebGl Renderer false.

On regular trunk, I get Qualcomm -- Adreno 205 -- OpenGL ES 2.0 1403843
Anyone who is still crashing with these patches: please go to about:support and paste here the 'WebGL Renderer' line.

As Joe explained in Comment 17, newer driver versions report "Adreno (TM) 205" instead of "Adreno 205". We currently only check for the old strings, without the (TM). If you crash with a 'WebGL Renderer' string that has the (TM), that means that we have to blacklist these too. That will be a simple patch to gfx/gl/GLContext.cpp.
(In reply to Benoit Jacob [:bjacob] from comment #35)
> Anyone who is still crashing with these patches: please go to about:support
> and paste here the 'WebGL Renderer' line.

My HTC Desire HD shows "WebGL Renderer = false"   But i have not experienced the  crashing like comment 31 and comment 34.
(In reply to Tony Chung [:tchung] from comment #36)

> My HTC Desire HD shows "WebGL Renderer = false"   But i have not experienced
> the  crashing like comment 31 and comment 34.

I guess that it might be related to Bug 746330
Crash report from today's trunk build (which also contains the fix now):
https://crash-stats.mozilla.com/report/index/bp-deb191ac-bfd7-4fca-a5cf-5f71e2120503
0 	libc.so 	__libc_android_abort 	
1 	libc.so 	dlfree 	
2 	libc.so 	free 	
3 	libstdc++.so 	_ZdlPv 	
4 	libEGL.so 	_ZN7android33egl_get_image_for_current_contextEPv 	
5 	libEGL.so 	eglDestroyContext 	
6 	libEGL.so 	eglMakeCurrent 	
7 	libxul.so 	mozilla::gl::GLContextEGL::MakeCurrentImpl 	gfx/gl/GLLibraryEGL.h:148
8 	libxul.so 	mozilla::gl::GLContext::MakeCurrent 	GLContext.h:603
9 	libxul.so 	mozilla::gl::GLContextEGL::Init 	gfx/gl/GLContextProviderEGL.cpp:332
10 	libxul.so 	mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContext 	gfx/gl/GLContextProviderEGL.cpp:1735
11 	libxul.so 	mozilla::gl::GLContextProviderEGL::CreateOffscreen 	gfx/gl/GLContextProviderEGL.cpp:1912
12 	libxul.so 	mozilla::WebGLContext::SetDimensions 	content/canvas/src/WebGLContext.cpp:527
13 	libxul.so 	nsHTMLCanvasElement::UpdateContext 	content/html/content/src/nsHTMLCanvasElement.cpp:652
14 	libxul.so 	nsHTMLCanvasElement::GetContext 	content/html/content/src/nsHTMLCanvasElement.cpp:573
15 	libxul.so 	nsIDOMHTMLCanvasElement_GetContext 	obj-firefox/js/xpconnect/src/dom_quickstubs.cpp:18012
etc...
Comment on attachment 618824 [details] [diff] [review]
kill GetVendor() and EGLUtils.h

Review of attachment 618824 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/GfxInfo.cpp
@@ +90,5 @@
>  
>  nsresult
>  GfxInfo::Init()
>  {
> +  mAdapterDescription.AssignLiteral("");

And make it Truncate()
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The only way I see to make progress is for someone who knows how to use GDB to get a hold of a Adreno device. Do we have one in the Toronto office?
This is just a stopgap fix. It works, except that it relies on having a global GLContext, which we currently do have, but BenWa tells me that we may soon stop having. Just uploading it in the unlikely case we would fail to find a better solution for 14.
So, this seems to be working on the Adreno phone I'm using ("HTC Desire", a deliciously cynical name for a rectangular piece of plastic and other polluting materials).

The key is that the already landed blacklist seems (in my experience) to always allow us to survice the _first_ WebGL context creation. We crash during the second WebGL context creation, specifically in the first MakeCurrent on the newly created OpenGL context.

So this patch takes advantage of this, during the first WebGL context creation, when it blacklists Adreno, it now also sets a global flag disallowing further WebGL context creation. Since it's a global variable, the only assumption I'm making here is that we run all content in the same process, which should be safe enough for now. I can't crash my "Desire" anymore with that patch.

I don't know why the crashes only occur with WebGL's OpenGL contexts, not with the layermanager's.

Long term, this bug shows that we really REALLY need a way to get renderer info without creating a OpenGL context. If Android provides no API for this, maybe we should investigate doing something like we currently do on GLX with the glxtest process.
Attachment #621836 - Attachment is obsolete: true
Attachment #621844 - Flags: review?(joe)
Comment on attachment 621844 [details] [diff] [review]
hopefully better patch: keep webgl adreno blacklist and globally disallow further webgl context creation once it's been triggered

Still crashing on http://spidergl.org/example.php?id=1
Attachment #621844 - Attachment is obsolete: true
Attachment #621844 - Flags: review?(joe)
Alright, this is a polished version of the earlier 'stopgap fix'. It really works, but it's not future-proof as it relies on the global GL context that's going away soon. But it probably won't go away in Aurora 14, so this should at least allow us to ship Fennec 14.

Longer term, we need to do some message passing so we can get the LayerManager's OpenGL context's renderer id, from the OMTC thread.
Attachment #622078 - Flags: review?(bgirard)
Attachment #622078 - Flags: review?(bgirard) → review+
Comment on attachment 622078 [details] [diff] [review]
short-term fix: blacklist adreno using the global GLContext to get renderer info

[Approval Request Comment]
Regression caused by (bug #): Not a regression, just a driver crashing on us
User impact if declined: WebGL consistently crashes in Fennec on Android phones with Adreno GPUs
Testing completed (on m-c, etc.): m-i
Risk to taking this patch (and alternatives if risky): not risky
String changes made by this patch: none
Attachment #622078 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/be8ed36e1a20
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Whiteboard: [landed on inbound][gfx][native-crash][mobile-crash][sumo] → [gfx][native-crash][mobile-crash][sumo]
BenWa tells me that after all, the global GLContext is here to stay, therefore, the present fix may actually be our medium-term fix.

Long-term, we still want to have better blacklisting tools on Android. For example, the present approach wouldn't work in situations where the crash occurs during the creation of the global GLContext.
We are leaving all non-beta+ bugs nominated for Aurora approval in the queue until FN14 Beta 1 is signed off on by QA.
This issue is still reproducible on Beta:
Firefox Native 14.0b1 build2 (2012-05-09)
Device: HTC Desire Z
OS: Android 2.3.3
Yup, the real fix hasn't received aurora approval yet.
Comment on attachment 622078 [details] [diff] [review]
short-term fix: blacklist adreno using the global GLContext to get renderer info

we've got sign off on beta 1 so please go ahead and land this now.
Attachment #622078 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Is bp-5059dece-cb72-4c95-897b-5ff062120511 that happened in 15.0a1/20120511 a different issue?
(In reply to Scoobidiver from comment #55)
> Is bp-5059dece-cb72-4c95-897b-5ff062120511 that happened in 15.0a1/20120511
> a different issue?

This one doesn't seem involve WebGL: there is no trace of WebGL in the call stack. That also explains why my patch here, implementing the blocking in the WebGL implementation, does not avoid it.

This is worth a separate follow-up bug.
Verified on 2012-05-17 Nightly, Aurora and 14 beta 2. HTC Desire Z/G2, Sony Xperia HD (both Adreno 205) and HTC Desire (Adreno 200).
Status: RESOLVED → VERIFIED
This issue is still reproducible on Beta:
Firefox Native 15.0b6 build1 (2012-08-22)
Device: HTC Desire Z (Adreno 205)
OS: Android 2.3.3

STR:
1. Load http://get.webgl.org/
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Please file a new bug.
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
I filed Bug 784672 regarding this issue.
You need to log in before you can comment on or make changes to this bug.