Closed Bug 736123 Opened 13 years ago Closed 13 years ago

crash in delete_texture when using WebGL on Adreno GPU

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

(firefox14 verified, firefox15 verified, blocking-fennec1.0 +, fennec+)

VERIFIED FIXED
Firefox 15
Tracking Status
firefox14 --- verified
firefox15 --- verified
blocking-fennec1.0 --- +
fennec + ---

People

(Reporter: martijn.martijn, Assigned: bjacob)

References

()

Details

(4 keywords, Whiteboard: [gfx][native-crash][mobile-crash][sumo])

Crash Data

Attachments

(3 files, 2 obsolete files)

I was crashing on this WebGL demo site: http://www.ro.me/ Using the HTC Desire HD, using today's trunk build (with Maple merged in), Android 2.3.5. Steps to reproduce: - I visited the site from the link in this page: http://people.mozilla.org/~mwargers/tests/layout/layout_demos.htm - I'm only seeing black and after a while, I get a 'not supported' page. In that case, I press the Android back button and tap again on the http://www.ro.me/ from the layout_demos page. Usually, it crashes for me within 3 times on the HTC Desire HD. This bug was filed from the Socorro interface and is report bp-cf2915be-4b88-408a-b7d6-2e4c92120315 . ============================================================= 0 libc.so __libc_android_abort 1 libc.so dlfree 2 libc.so free 3 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x6d9eb 4 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x65b27 5 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x65627 6 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x65b27 7 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x65b55 8 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x65b27 9 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x66dd3 10 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x1c380e 11 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x615cb 12 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x1c380e 13 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x6177b 14 libGLESv2_adreno200.so libGLESv2_adreno200.so@0x640cf 15 libEGL_adreno200.so libEGL_adreno200.so@0x1438e 16 libEGL_adreno200.so libEGL_adreno200.so@0xedce 17 libEGL_adreno200.so libEGL_adreno200.so@0x1438e 18 libEGL_adreno200.so libEGL_adreno200.so@0x1438e 19 libEGL_adreno200.so libEGL_adreno200.so@0xee56 20 libEGL_adreno200.so libEGL_adreno200.so@0x1438e 21 libEGL_adreno200.so libEGL_adreno200.so@0xef8a 22 libgsl.so os_mutex_unlock 23 libEGL_adreno200.so libEGL_adreno200.so@0xe85e 24 libEGL_adreno200.so libEGL_adreno200.so@0x7746 25 libEGL_adreno200.so libEGL_adreno200.so@0x1438e 26 libEGL_adreno200.so libEGL_adreno200.so@0x10faa 27 libEGL.so eglGetProcAddress 28 libEGL_adreno200.so libEGL_adreno200.so@0x10fc6 29 libEGL.so eglMakeCurrent 30 libxul.so mozilla::gl::GLContextEGL::MakeCurrentImpl gfx/gl/GLContextProviderEGL.cpp:444 31 libxul.so mozilla::gl::GLContext::MakeCurrent GLContext.h:620 32 libxul.so mozilla::WebGLContext::DestroyResourcesAndContext content/canvas/src/WebGLContext.cpp:190 33 libxul.so mozilla::WebGLContext::~WebGLContext content/canvas/src/WebGLContext.cpp:178 34 libxul.so mozilla::WebGLContext::~WebGLContext content/canvas/src/WebGLContext.cpp:182 35 libxul.so mozilla::WebGLContext::Release content/canvas/src/WebGLContext.cpp:1146 36 libxul.so XPCJSRuntime::GCCallback js/xpconnect/src/XPCJSRuntime.cpp:622 37 libxul.so js::GCSlice js/src/jsgc.cpp:3721 38 libxul.so js::IncrementalGC js/src/jsfriendapi.cpp:158 39 libxul.so nsXPConnect::Collect js/xpconnect/src/nsXPConnect.cpp:422 40 libxul.so nsXPConnect::GarbageCollect js/xpconnect/src/nsXPConnect.cpp:432 41 libxul.so nsJSContext::GarbageCollectNow dom/base/nsJSEnvironment.cpp:3059 42 libxul.so GCTimerFired dom/base/nsJSEnvironment.cpp:3219 43 libxul.so nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:508 44 libxul.so nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:591 45 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:657 46 libxul.so NS_ProcessNextEvent_P obj-firefox/xpcom/build/nsThreadUtils.cpp:245 47 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:110 48 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:208 49 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:201 50 libxul.so nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:189
Summary: crash in __libc_android_abort → crash in __libc_android_abort when using WebGL on adreno
tracking-fennec: --- → ?
blocking-fennec1.0: --- → ?
libGLESv2_adreno200.so@0x6d9eb is delete_texture.
Ok, I get this crash too while zooming in on the cube at http://get.webgl.org/ .
(In reply to Ali Juma [:ajuma] from comment #1) > libGLESv2_adreno200.so@0x6d9eb is delete_texture. Suggestion: check (using APItrace or a hacked MOZ_GL_DEBUG_VERBOSE) if we're deleting the same texture twice. OpenGL specifies that this should silently do nothing, but it's easy to guess that some drivers will crash in that case, doing a double free.
tracking-fennec: ? → +
blocking-fennec1.0: ? → beta+
Whiteboard: [gfx]
Benoit, we're either going to have to block WebGL on Adreno devices, or do some sort of workaround in which we change scripts' use of textures to only use 2^N and only use glTexImage (never glTexSubImage). I leave it up to you to decide which is more palatable!
Assignee: nobody → bjacob
The workarounds you describe sound awful; in particular they would require us to cache all texture images on our side, unless perhaps glGetTexImage is somehow available (and reliable) as a OpenGL ES extension on this driver. So, if the choice really boils down to either do such workarounds or block, then I would rather block. However: 1. please share the analysis leading you to think it's the only alternative 2. If it comes to this, how about instead allowing WebGL but disabling glTexSubImage2D and rejecting NPOT textures? That would be non-conformant of course, but might be considered better than no WebGL at all. Perhaps if we know a list of WebGL demos/games we want to support, we could check if they would still run with this limited WebGL support.
(In reply to Benoit Jacob [:bjacob] (On vacation until March 18) from comment #5) > 2. If it comes to this, how about instead allowing WebGL but disabling > glTexSubImage2D and rejecting NPOT textures? That would be non-conformant of > course, but might be considered better than no WebGL at all. Perhaps if we > know a list of WebGL demos/games we want to support, we could check if they > would still run with this limited WebGL support. Ignore that part, sorry, that was stupid: it would break web pages that have a WebGL and a non-WebGL versions: these would rather fall back to the non-WebGL version, than have a non-correctly-rendering WebGL version.
Keywords: reproducible
Whiteboard: [gfx] → [gfx][native-crash]
Hardware: All → ARM
I was able to reproduce this bug on the latest Nightly build by going to http://webglsamples.googlecode.com/hg/aquarium/aquarium.html Report: https://crash-stats.mozilla.com/report/index/bp-c578c99a-3e4c-4620-919e-2fa112120316 -- Firefox 14.0a1 (2012-03-16) Device: HTC Desire Z OS: Android 2.3.3
Crash Signature: [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] → [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ]
Summary: crash in __libc_android_abort when using WebGL on adreno → crash in delete_texture when using WebGL on Adreno GPU
Joe, are you able to respond to #6?
We know that glTexSubImage2D and (non-POT textures in general) are broken on older Adreno drivers. Therefore, we can't let scripts use them, because they'll either cause crashes or expose random texture memory. As you say, we shouldn't just disable the broken parts, because sites already have to have fallbacks for not supporting WebGL. Therefore, we should block WebGL on Adreno GPUs. Does that seem reasonable? Am I missing something else?
That's reasonable, yes. What I was missing is that you already know for a fact that glTexSubImage2D is broken. That wasn't obvious from the title and first comments in this bug, which pointed more to texture deletion.
I'm not sure if glTexSubImage2D is broken in its own right, or if it's just SubImage uploads with NPOT textures.
(In reply to Joe Drew (:JOEDREW!) from comment #9) > We know that glTexSubImage2D and (non-POT textures in general) are broken on > older Adreno drivers. > Therefore, we should block WebGL on Adreno GPUs. Can't you use a WebGL blocklist in order that recent drivers don't be blocklisted, if you known the cutoff version/date?
(In reply to Scoobidiver from comment #12) > Can't you use a WebGL blocklist in order that recent drivers don't be > blocklisted, if you known the cutoff version/date? We don't know the cutoff version and we can't get that information
(In reply to George Wright (:gw280) from comment #13) > we can't get that information We can collect that if the driver version is written in crash reports by monitoring crash stats during the Aurora and Beta phase as we did for Direct2D with the desktop version during 4.0pre.
(In reply to Scoobidiver from comment #14) > (In reply to George Wright (:gw280) from comment #13) > > we can't get that information > We can collect that if the driver version is written in crash reports by > monitoring crash stats during the Aurora and Beta phase as we did for > Direct2D with the desktop version during 4.0pre. We can whitelist specific versions, but we've been told there's no way for us to tell from a given version number if it's good or not (ie - there isn't a simple version number which we can check against to see if it's greater than for NPOT texture fixes).
Can we whitelist recent Adreno GPUs where we are sure they work?
We believe that the renderer string changes from (for example) "Adreno 205" to "Adreno (TM) 205" in newer versions of the driver, so AIUI we are implicitly whitelisting newer drivers.
blocking-fennec1.0: beta+ → +
Crash Signature: [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ] → [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x6d9eb] [@ __libc_android_abort | dlfree | free | libGLESv2_adreno200.so@0x91843 ] [@ libEGL_adreno200.so@0xbbf4] [@ libEGL_adreno200.so@0xc240 ] [@ libEGL_adreno200.so@0xbc7c ] [@ li…
Interesting. I believe I ran into this crash just by launching, then hitting the back button and going back into fennc/nightly.
So, it looks like the conclusion was to block older Adreno devices. Do we move this to AMO for that?
No, we should do this in code.
Looping in product for awareness.
Keywords: relnote
Whiteboard: [gfx][native-crash] → [gfx][native-crash] sumo
I would like to do this, but, looking at https://crash-stats.mozilla.com/report/index/bp-c578c99a-3e4c-4620-919e-2fa112120316 I don't see "Adreno" in the GfxInfo strings. Am I missing something? How are we supposed to check for these Adreno GPUs?
it just returns "Android" on my phone and in the crash reports here.
Attachment #618824 - Flags: review?(jmuizelaar)
Rather ugly ad-hoc blacklisting in WebGL initialization code, as we need the actual GL strings to do that blacklisting and we can't easily have them in GfxInfo.
Attachment #618825 - Flags: review?(joe)
Attachment #618825 - Flags: review?(joe) → review+
Whiteboard: [gfx][native-crash] sumo → [gfx][native-crash][mobile-crash][sumo]
Comment on attachment 618824 [details] [diff] [review] kill GetVendor() and EGLUtils.h Review of attachment 618824 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/android/GfxInfo.cpp @@ +90,5 @@ > > nsresult > GfxInfo::Init() > { > + mAdapterDescription.AssignLiteral(""); Add a comment about why this is here.
Attachment #618824 - Flags: review?(jmuizelaar) → review+
Comment on attachment 618824 [details] [diff] [review] kill GetVendor() and EGLUtils.h [Approval Request Comment] Regression caused by (bug #): User impact if declined: just to keep aurora consistent with central so that the subsequent patches that I want to land on it, apply cleanly. Testing completed (on m-c, etc.): inbound Risk to taking this patch (and alternatives if risky): no risk String changes made by this patch: none
Attachment #618824 - Flags: approval-mozilla-aurora?
Comment on attachment 618825 [details] [diff] [review] blacklist Adreno renderers for WebGL [Approval Request Comment] Regression caused by (bug #): User impact if declined: crashes for users of adreno gpus Testing completed (on m-c, etc.): inbound Risk to taking this patch (and alternatives if risky): could get unhappy users if somehow the blacklisting is not needed for some driver version. Only affects WebGL. String changes made by this patch: none
Attachment #618825 - Flags: approval-mozilla-aurora?
Target Milestone: Firefox 14 → Firefox 15
Whiteboard: [gfx][native-crash][mobile-crash][sumo] → [landed on inbound][gfx][native-crash][mobile-crash][sumo]
Attachment #618824 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #618825 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #31) > I'm still crashing with an Aurora build that supposedly has this fix in it: > ftp://ftp.mozilla.org/pub/mobile/tinderbox-builds/mozilla-aurora-android/ > 1335995418/ > Changeset here: > https://hg.mozilla.org/releases/mozilla-aurora/rev/8a68244c078b > > Crash id: > https://crash-stats.mozilla.com/report/index/bp-c71dd6c9-8cb6-4734-8755- > 940812120502 > This is on the HTC Desire HD, trying to load http://www.ro.me/ Strange, i ran the same device and same build as this, and i get no crashes. instead, i get an expected WebGL not supported message in content. http://i.imgur.com/Gplc6.png
In about:support on that Aurora build, it say WebGl Renderer false. On regular trunk, I get Qualcomm -- Adreno 205 -- OpenGL ES 2.0 1403843
Anyone who is still crashing with these patches: please go to about:support and paste here the 'WebGL Renderer' line. As Joe explained in Comment 17, newer driver versions report "Adreno (TM) 205" instead of "Adreno 205". We currently only check for the old strings, without the (TM). If you crash with a 'WebGL Renderer' string that has the (TM), that means that we have to blacklist these too. That will be a simple patch to gfx/gl/GLContext.cpp.
(In reply to Benoit Jacob [:bjacob] from comment #35) > Anyone who is still crashing with these patches: please go to about:support > and paste here the 'WebGL Renderer' line. My HTC Desire HD shows "WebGL Renderer = false" But i have not experienced the crashing like comment 31 and comment 34.
(In reply to Tony Chung [:tchung] from comment #36) > My HTC Desire HD shows "WebGL Renderer = false" But i have not experienced > the crashing like comment 31 and comment 34. I guess that it might be related to Bug 746330
Crash report from today's trunk build (which also contains the fix now): https://crash-stats.mozilla.com/report/index/bp-deb191ac-bfd7-4fca-a5cf-5f71e2120503 0 libc.so __libc_android_abort 1 libc.so dlfree 2 libc.so free 3 libstdc++.so _ZdlPv 4 libEGL.so _ZN7android33egl_get_image_for_current_contextEPv 5 libEGL.so eglDestroyContext 6 libEGL.so eglMakeCurrent 7 libxul.so mozilla::gl::GLContextEGL::MakeCurrentImpl gfx/gl/GLLibraryEGL.h:148 8 libxul.so mozilla::gl::GLContext::MakeCurrent GLContext.h:603 9 libxul.so mozilla::gl::GLContextEGL::Init gfx/gl/GLContextProviderEGL.cpp:332 10 libxul.so mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContext gfx/gl/GLContextProviderEGL.cpp:1735 11 libxul.so mozilla::gl::GLContextProviderEGL::CreateOffscreen gfx/gl/GLContextProviderEGL.cpp:1912 12 libxul.so mozilla::WebGLContext::SetDimensions content/canvas/src/WebGLContext.cpp:527 13 libxul.so nsHTMLCanvasElement::UpdateContext content/html/content/src/nsHTMLCanvasElement.cpp:652 14 libxul.so nsHTMLCanvasElement::GetContext content/html/content/src/nsHTMLCanvasElement.cpp:573 15 libxul.so nsIDOMHTMLCanvasElement_GetContext obj-firefox/js/xpconnect/src/dom_quickstubs.cpp:18012 etc...
Comment on attachment 618824 [details] [diff] [review] kill GetVendor() and EGLUtils.h Review of attachment 618824 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/android/GfxInfo.cpp @@ +90,5 @@ > > nsresult > GfxInfo::Init() > { > + mAdapterDescription.AssignLiteral(""); And make it Truncate()
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The only way I see to make progress is for someone who knows how to use GDB to get a hold of a Adreno device. Do we have one in the Toronto office?
This is just a stopgap fix. It works, except that it relies on having a global GLContext, which we currently do have, but BenWa tells me that we may soon stop having. Just uploading it in the unlikely case we would fail to find a better solution for 14.
So, this seems to be working on the Adreno phone I'm using ("HTC Desire", a deliciously cynical name for a rectangular piece of plastic and other polluting materials). The key is that the already landed blacklist seems (in my experience) to always allow us to survice the _first_ WebGL context creation. We crash during the second WebGL context creation, specifically in the first MakeCurrent on the newly created OpenGL context. So this patch takes advantage of this, during the first WebGL context creation, when it blacklists Adreno, it now also sets a global flag disallowing further WebGL context creation. Since it's a global variable, the only assumption I'm making here is that we run all content in the same process, which should be safe enough for now. I can't crash my "Desire" anymore with that patch. I don't know why the crashes only occur with WebGL's OpenGL contexts, not with the layermanager's. Long term, this bug shows that we really REALLY need a way to get renderer info without creating a OpenGL context. If Android provides no API for this, maybe we should investigate doing something like we currently do on GLX with the glxtest process.
Attachment #621836 - Attachment is obsolete: true
Attachment #621844 - Flags: review?(joe)
Comment on attachment 621844 [details] [diff] [review] hopefully better patch: keep webgl adreno blacklist and globally disallow further webgl context creation once it's been triggered Still crashing on http://spidergl.org/example.php?id=1
Attachment #621844 - Attachment is obsolete: true
Attachment #621844 - Flags: review?(joe)
Alright, this is a polished version of the earlier 'stopgap fix'. It really works, but it's not future-proof as it relies on the global GL context that's going away soon. But it probably won't go away in Aurora 14, so this should at least allow us to ship Fennec 14. Longer term, we need to do some message passing so we can get the LayerManager's OpenGL context's renderer id, from the OMTC thread.
Attachment #622078 - Flags: review?(bgirard)
Attachment #622078 - Flags: review?(bgirard) → review+
Comment on attachment 622078 [details] [diff] [review] short-term fix: blacklist adreno using the global GLContext to get renderer info [Approval Request Comment] Regression caused by (bug #): Not a regression, just a driver crashing on us User impact if declined: WebGL consistently crashes in Fennec on Android phones with Adreno GPUs Testing completed (on m-c, etc.): m-i Risk to taking this patch (and alternatives if risky): not risky String changes made by this patch: none
Attachment #622078 - Flags: approval-mozilla-aurora?
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Whiteboard: [landed on inbound][gfx][native-crash][mobile-crash][sumo] → [gfx][native-crash][mobile-crash][sumo]
BenWa tells me that after all, the global GLContext is here to stay, therefore, the present fix may actually be our medium-term fix. Long-term, we still want to have better blacklisting tools on Android. For example, the present approach wouldn't work in situations where the crash occurs during the creation of the global GLContext.
We are leaving all non-beta+ bugs nominated for Aurora approval in the queue until FN14 Beta 1 is signed off on by QA.
This issue is still reproducible on Beta: Firefox Native 14.0b1 build2 (2012-05-09) Device: HTC Desire Z OS: Android 2.3.3
Yup, the real fix hasn't received aurora approval yet.
Comment on attachment 622078 [details] [diff] [review] short-term fix: blacklist adreno using the global GLContext to get renderer info we've got sign off on beta 1 so please go ahead and land this now.
Attachment #622078 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Is bp-5059dece-cb72-4c95-897b-5ff062120511 that happened in 15.0a1/20120511 a different issue?
(In reply to Scoobidiver from comment #55) > Is bp-5059dece-cb72-4c95-897b-5ff062120511 that happened in 15.0a1/20120511 > a different issue? This one doesn't seem involve WebGL: there is no trace of WebGL in the call stack. That also explains why my patch here, implementing the blocking in the WebGL implementation, does not avoid it. This is worth a separate follow-up bug.
Verified on 2012-05-17 Nightly, Aurora and 14 beta 2. HTC Desire Z/G2, Sony Xperia HD (both Adreno 205) and HTC Desire (Adreno 200).
Status: RESOLVED → VERIFIED
This issue is still reproducible on Beta: Firefox Native 15.0b6 build1 (2012-08-22) Device: HTC Desire Z (Adreno 205) OS: Android 2.3.3 STR: 1. Load http://get.webgl.org/
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Please file a new bug.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
I filed Bug 784672 regarding this issue.
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: