Closed Bug 827170 Opened 12 years ago Closed 12 years ago

Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"

Categories

(Core :: Graphics: Layers, defect)

20 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox19 + verified
firefox20 + verified
fennec 20+ ---

People

(Reporter: scoobidiver, Assigned: bjacob)

References

()

Details

(4 keywords, Whiteboard: [native-crash])

Crash Data

Attachments

(2 files, 2 obsolete files)

It's #3 top crasher in 20.0a1 and first showed up in 20.0a1/20130103. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a812ef63de87&tochange=6955309291ee
It might be caused by bug 825692.

Signature: mozalloc_abort(char const*) | NS_DebugBreak_P | mozilla::layers::LayerManagerOGL::CreateFBOWithTexture(nsIntRect const&, mozilla::layers::LayerManagerOGL::InitMode, unsigned int, unsigned int*, unsigned int*)
UUID: c716a678-3641-4deb-85ec-af6f52130106
Date Processed: 2013-01-06 19:26:19
Uptime: 19
Last Crash: 21.7 hours before submission
Install Age: 19 seconds since version was first installed.
Install Time: 2013-01-06 19:25:50
Product: FennecAndroid
Version: 20.0a1
Build ID: 20130106030902
Release Channel: nightly
OS: Android
OS Version: 0.0.0 Linux 3.0.8-02784-g4dbe869 #1 SMP PREEMPT Wed Dec 5 01:54:41 UTC 2012 armv7l Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Build Architecture: arm
Build Architecture Info:
Crash Reason: SIGSEGV
Crash Address: 0x0
App Notes: AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: KFTT, Product: Kindle Fire, Manufacturer: Amazon, Hardware: bowser' EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ xpcom_runtime_abort(###!!! ABORT: Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126: file ../../../gfx/layers/opengl/LayerManagerOGL.cpp, line 1534) Amazon KFTT Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Processor Notes: /data/socorro/stackwalk/bin/exploitable: ERROR: unable to analyze dump
EMCheckCompatibility: True
Adapter Vendor ID: Imagination Technologies
Adapter Device ID: PowerVR SGX 540
Device: Amazon KFTT
Android API Version: 15 (REL)
Android CPU ABI: armeabi-v7a

Frame Module Signature Source
0 libmozalloc.so mozalloc_abort mozalloc_abort.cpp:30
1 libxul.so NS_DebugBreak_P nsDebugImpl.cpp:422
2 libxul.so mozilla::layers::LayerManagerOGL::CreateFBOWithTexture LayerManagerOGL.cpp:1534
3 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:225
4 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
5 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
6 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
7 libxul.so mozilla::layers::LayerManagerOGL::Render LayerManagerOGL.cpp:1120
8 libxul.so mozilla::layers::LayerManagerOGL::EndTransaction LayerManagerOGL.cpp:788
9 libxul.so mozilla::layers::LayerManagerOGL::EndEmptyTransaction LayerManagerOGL.cpp:729
10 libxul.so mozilla::layers::CompositorParent::Composite CompositorParent.cpp:620
11 libxul.so RunnableMethod<IPC::ChannelProxy::Context, void tuple.h:383
12 libxul.so MessageLoop::RunTask message_loop.cc:333
13 libxul.so MessageLoop::DeferOrRunPendingTask message_loop.cc:341
14 libxul.so MessageLoop::DoWork message_loop.cc:441
15 libxul.so base::MessagePumpDefault::Run message_pump_default.cc:23
16 libxul.so MessageLoop::RunInternal message_loop.cc:215
17 libxul.so MessageLoop::Run message_loop.cc:208
18 libxul.so base::Thread::ThreadMain thread.cc:156
19 libxul.so ThreadFunc platform_thread_posix.cc:39
20 libc.so libc.so@0x12cce
21 libc.so libc.so@0x12822
22 libEGL.so libEGL.so@0x23e82

More reports at: https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+NS_DebugBreak_P+|+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3ACreateFBOWithTexture%28nsIntRect+const%26%2C+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3AInitMode%2C+unsigned+int%2C+unsigned+int*%2C+unsigned+int*%29
The 'aRect.height 4126' suggests that this might be an issue about us trying to use a texture of size > 4096 on a device that doesn't support that.
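For readers decoding the abort message: in the OpenGL ES 2.0 headers, 0x8cd6 is GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT and 0xde1 is GL_TEXTURE_2D. The sketch below is only an illustration of that mapping. The constant names and the helper DescribeFramebufferStatus are hypothetical and are not part of LayerManagerOGL.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Numeric values as defined in the OpenGL ES 2.0 header (gl2.h).
// The k-prefixed names are illustrative, not Gecko identifiers.
constexpr std::uint32_t kFramebufferComplete = 0x8CD5;
constexpr std::uint32_t kFramebufferIncompleteAttachment = 0x8CD6;
constexpr std::uint32_t kFramebufferIncompleteMissingAttachment = 0x8CD7;
constexpr std::uint32_t kFramebufferIncompleteDimensions = 0x8CD9;
constexpr std::uint32_t kFramebufferUnsupported = 0x8CDD;
constexpr std::uint32_t kTexture2D = 0x0DE1;

// Hypothetical helper: turns a framebuffer status code into its GL enum name.
std::string DescribeFramebufferStatus(std::uint32_t status) {
  switch (status) {
    case kFramebufferComplete:
      return "GL_FRAMEBUFFER_COMPLETE";
    case kFramebufferIncompleteAttachment:
      return "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT";
    case kFramebufferIncompleteMissingAttachment:
      return "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT";
    case kFramebufferIncompleteDimensions:
      return "GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS";
    case kFramebufferUnsupported:
      return "GL_FRAMEBUFFER_UNSUPPORTED";
    default:
      return "unknown framebuffer status";
  }
}
```

An incomplete-attachment status on a GL_TEXTURE_2D target is consistent with the texture behind the framebuffer not having been allocated at the requested size.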
(In reply to Benoit Jacob [:bjacob] (On vacation, back on Jan. 7th) from comment #1) > The 'aRect.height 4126' There are other abort messages with lower values.
Summary: Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126" → Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"
I'm crashing with this stack on http://m.vd.nl/heren/jeans.html when I tap on the select drop-down box "Prijs" and choose one of the available options. This is on the Galaxy Nexus, using trunk.
In fact, the spike is not so obvious. It's the real ranking of bug 705641, i.e. #1, as stated in bug 705641 comment 44. Only the signature has morphed into a correct one. Indeed, here are crashes with this abort message per builddate:
* 20130107: 12
* 20130106: 20
* 20130105: 19
* 20130104: 14
* 20130103: 11 <- first appearance of this signature
* 20130102: 6
* 20130104: 15
* 20121231: 4
* 20121230: 10
* 20121229: 5
I think it should be a duplicate of bug 705641.
tracking-fennec: ? → 20+
CCing Jgilbert. Jeff, can you please help with some investigation here? This could be a dup of bug 705641 as per comment 4. Also CCing Jeff Muizelaar to see if he can take a look here, as the description suggests bug 825692 may have regressed this. Thanks!
Let's see if we can deal with this while 20 is in Aurora, to avoid chasing it while in Beta.
Assignee: nobody → bjacob
OK, let's look a bit into these failures. Taking the 2013-01-04 crash data. First, let's check which GPUs are hitting this problem:

bjacob:~/crash-stats$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*\(AdapterDescription[^|]*\)|.*$/\1/g' | sort | uniq -c | sort -rn | head -n10
892 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasi, Manufacturer: asus, Hardware: grouper'
142 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu'
108 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: US_epad, Manufacturer: asus, Hardware: cardhu'
67 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasig, Manufacturer: asus, Hardware: grouper'
65 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5110, Product: espresso10wifixx, Manufacturer: samsung, Hardware: espresso10'
58 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5100, Product: espresso10rfxx, Manufacturer: samsung, Hardware: espresso10'
50 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: A700, Product: a700_emea_de, Manufacturer: Acer, Hardware: picasso_mf'
30 AdapterDescription: 'ARM -- Mali-T604 -- OpenGL ES 2.0 -- Model: Nexus 10, Product: mantaray, Manufacturer: samsung, Hardware: manta'
28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: HTC One X, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru'
28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF300T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu'

So we have mostly Tegra 3's, but also enough PowerVR that we can't just shrug this off as just a Tegra 3 bug. OpenGL framebuffer completeness is very finicky, so I am very uncomfortable in the first place that we have an assertion on it. So let's look in more detail at why these drivers think that these framebuffers are incomplete. Let's look at the sizes. The following command outputs this in <count> <width> <height> form, sorted by decreasing count:

$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*aRect.width\ \([0-9]\+\)\,\ aRect.height\ \([0-9]\+\).*$/\1 \2/g' | sort | uniq -c | sort -rn | head -n 30
32 548 2057
30 632 2391
29 609 2057
23 800 2190
17 800 2304
16 900 2057
16 755 2058
16 720 2304
16 659 2057
16 609 2066
15 652 2057
15 548 2066
13 653 2057
13 651 2057
12 900 2051
12 658 2057
12 647 2057
11 654 2057
10 655 2057
10 1792 2304
9 900 2066
9 649 2057
9 647 2066
9 646 2057
9 641 2057
8 656 2057
8 653 2066
8 648 2057
8 645 2057
7 768 2215

Something jumps out here: in all of these cases, one of the dimensions (here the height; in some other cases the width) is greater than 2048. So these drivers mostly only give us trouble when we try to use sizes greater than 2048. To be fair, the above command only shows the 30 most common cases; the un-truncated list does show a few crashes with smaller sizes, but they are so few that if we had only those crashes we wouldn't care or even notice. Can't we just limit the max texture size we use in mobile Layers to 2048 and call it a day? That should be large enough for us.
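The size extraction done with sed above can also be sketched in C++, which may be handy when checking a single abort message against a candidate limit. This is a minimal sketch, not code from the tree: ParsedSize, ParseAbortMessage and ExceedsLimit are hypothetical names, and the pattern assumes the message format quoted in this bug.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type: `valid` is false when no size was found.
struct ParsedSize {
  bool valid = false;
  int width = 0;
  int height = 0;
};

// Extracts "aRect.width <n>, aRect.height <m>" from an abort message.
ParsedSize ParseAbortMessage(const std::string& message) {
  // At most 9 digits per number so that std::stoi cannot overflow an int.
  static const std::regex pattern(
      R"(aRect\.width (\d{1,9}), aRect\.height (\d{1,9}))");
  std::smatch match;
  ParsedSize result;
  if (std::regex_search(message, match, pattern)) {
    result.valid = true;
    result.width = std::stoi(match[1].str());
    result.height = std::stoi(match[2].str());
  }
  return result;
}

// True when either dimension of a successfully parsed size is above `limit`.
bool ExceedsLimit(const ParsedSize& size, int limit) {
  assert(limit > 0);
  return size.valid && (size.width > limit || size.height > limit);
}
```

Applied to the message from comment 0 (width 660, height 4126) with a limit of 2048, ExceedsLimit reports that the height is out of range.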
Folks! As the previous comment shows, there are a lot of mobile GPUs out there that have trouble with texture sizes > 2048 (as render targets) even though they are supposed to support them. Can we simply limit texture size to 2048 on mobile altogether (since we'd have to do it at least on Tegra3 and PowerVR 540) or would that break things? I sort of assume that we should be able to, as a lot of mobile GPUs don't support texture sizes greater than 2048 anyways.
In fact, both the Tegra 3 in a Nexus 7 ("grouper", the #1 device in the above list) and the PowerVRs have a 2048 max texture size. So the problem is that we don't even check that the texture sizes we request are in range. BenWa has made a testcase that reproduces the crash: http://people.mozilla.com/~bgirard/large_framebuffer.html The next question is: what do we do? Ideally we'd tile everything, but we need a short-term fix; we can either not render at all or render in a broken way. Looking at what the competition does, both Chrome and the stock browser seem to opt for incorrect rendering.
(In reply to Benoit Jacob [:bjacob] from comment #8)
> Folks! As the previous comment shows, there are a lot of mobile GPUs out
> there that have trouble with texture sizes > 2048 (as render targets) even
> though they are supposed to support them. Can we simply limit texture size
> to 2048 on mobile altogether (since we'd have to do it at least on Tegra3
> and PowerVR 540) or would that break things? I sort of assume that we should
> be able to, as a lot of mobile GPUs don't support texture sizes greater than
> 2048 anyways.
What's the downside? If a 3k x 3k image shows up on the page, will we still display it? What will stop working or slow down on platforms that do support > 2048? If we do this, how do we track that we've done it and revisit it every so often, to see if our assumptions and reasons are still valid?
Alright, the problem is well understood: when using hardware acceleration we sometimes fall back to an intermediate surface (framebuffer) to provide the correct rendering, but as bjacob mentions, the size of this surface is controlled by the web page, so it can be very large, and then we just crash.

Testcase: The test case has animation and group opacity, and uses animated CSS transforms, which hint to the browser engine that the content should be put into layers. This produces a container layer that uses an intermediate surface with two child Thebes layers. This intermediate surface grows past 2,000 pixels as the divs rotate. The correct behavior is that the blue square is drawn over the red square and does NOT blend with the red. The two squares fade in and out.

Behavior using a 2k max PowerVR GPU:
1) Firefox Mobile: We crash, without any fallback, if the page causes the layer to be larger than the GPU allows.
2) Stock: They don't honor group opacity and thus wouldn't need an intermediate surface (framebuffer).
3) Chrome: They appear to fall back to a very slow rendering path. This is glitchy since the fallback isn't fast enough to keep up with the animation.

I think if the intermediate surface is small enough we should honor group opacity, and if it's not we should behave like stock and ignore it.
BenWa, do you plan to write the patch or do you want to teach me to?
Keywords: testcase
Here's the outline of the patch: In ContainerRender() we decide to render through a framebuffer if UseIntermediateSurface() is true. We should additionally check that the size of the framebuffer we want to allocate, 'framebufferRect', is supported by the GL driver. If it is not, we ignore the need for an intermediate surface and simply get bad rendering.
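The decision outlined above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the actual patch: IntRect here is a stand-in struct rather than nsIntRect, the function names are hypothetical, and maxTextureSize is assumed to hold the value the driver reports for GL_MAX_TEXTURE_SIZE (for example via glGetIntegerv).

```cpp
#include <cassert>

// Stand-in for nsIntRect; only the fields needed for the size check.
struct IntRect {
  int x;
  int y;
  int width;
  int height;
};

// True when a texture of this size can be allocated, given the driver limit.
bool FitsInTexture(const IntRect& rect, int maxTextureSize) {
  assert(maxTextureSize > 0);
  return rect.width <= maxTextureSize && rect.height <= maxTextureSize;
}

// Hypothetical decision helper: use the intermediate surface only when the
// layer asks for one AND the framebuffer it needs is within the GL limit.
// Otherwise the caller would draw children directly, accepting that effects
// such as group opacity render incorrectly.
bool ShouldRenderToIntermediateSurface(bool wantsIntermediateSurface,
                                       const IntRect& framebufferRect,
                                       int maxTextureSize) {
  return wantsIntermediateSurface &&
         FitsInTexture(framebufferRect, maxTextureSize);
}
```

With a 2048 limit, the 660 x 4126 rectangle from comment 0 would skip the intermediate surface instead of reaching the framebuffer-completeness abort.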
Can't we just allocate a smaller sized framebuffer than we actually need, and have GL scale down? We'd lose quality, but surely that's better than abandoning correctness.
Comment on attachment 703671 [details] [diff] [review] tentative patch, but OOM's for now. This patch implements comment 13. Unexpectedly, it replaces the crash by... an out-of-memory crash. Need to investigate why.
Attachment #703671 - Attachment description: this patch implements comment 13. Unexpectedly, it replaces the crash by... and out of memory crash. Need to investigate why. → tentative patch, but OOM's for now
(In reply to Matt Woodrow (:mattwoodrow) from comment #14) > Can't we just allocate a smaller sized framebuffer than we actually need, > and have GL scale down? > > We'd lose quality, but surely that's better than abandoning correctness. YES! I forgot about that. It's a great idea.
(In reply to Matt Woodrow (:mattwoodrow) from comment #14) > Can't we just allocate a smaller sized framebuffer than we actually need, > and have GL scale down? > > We'd lose quality, but surely that's better than abandoning correctness. Sounds good, but that also gave me an OOM :-/
Attached patch limit framebuffer size (obsolete) — Splinter Review
This implements Matt's idea; it doesn't actually OOM as I erroneously said. I think I just needed to reboot my phone. Good thing I went to a concert yesterday night and turned it off! It actually runs without crashing, and about:memory looks sane. There are some occasional rendering glitches that seem to pertain to tiling, but it's better than a crash.
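The clamping approach described here can be sketched as below. It is only an illustration of the idea (allocate at most the maximum texture size and let GL scale the content down), not the code from the attachment: SurfaceSize, ClampedSurface and ClampToMaxTextureSize are hypothetical names, and the scale factors are an assumption about how a caller might map layer coordinates into the smaller surface.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain size type for the sketch.
struct SurfaceSize {
  int width;
  int height;
};

// Size actually allocated, plus the factors by which content is scaled down.
struct ClampedSurface {
  SurfaceSize size;
  float scaleX;
  float scaleY;
};

// Clamps each dimension to maxTextureSize. A dimension that already fits
// keeps scale 1.0; a clamped dimension gets a scale below 1.0, which means
// the intermediate surface is rendered at lower resolution on that axis.
ClampedSurface ClampToMaxTextureSize(SurfaceSize requested,
                                     int maxTextureSize) {
  assert(maxTextureSize > 0);
  assert(requested.width > 0 && requested.height > 0);
  ClampedSurface result;
  result.size.width = std::min(requested.width, maxTextureSize);
  result.size.height = std::min(requested.height, maxTextureSize);
  result.scaleX = static_cast<float>(result.size.width) /
                  static_cast<float>(requested.width);
  result.scaleY = static_cast<float>(result.size.height) /
                  static_cast<float>(requested.height);
  return result;
}
```

For the 660 x 4126 request from comment 0 and a 2048 limit, this yields a 660 x 2048 surface with the vertical axis rendered at roughly half resolution, trading quality for not crashing.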
Attachment #703671 - Attachment is obsolete: true
Attachment #704041 - Flags: review?(bgirard)
remove unwanted change
Attachment #704041 - Attachment is obsolete: true
Attachment #704041 - Flags: review?(bgirard)
Attachment #704042 - Flags: review?(bgirard)
Comment on attachment 704042 [details] [diff] [review] limit framebuffer size The patch looks good but you need a big fat comment explaining what is going on here and a good patch summary.
Attachment #704042 - Flags: review?(bgirard) → review-
Attachment #704071 - Flags: review?(bgirard)
Probably, as that looks like the same bug (same place according to the stack, size > 2048; Intel GPU, so it is plausible that 2048 would be the max texture size).
OS: Android → All
Hardware: ARM → All
Comment on attachment 704071 [details] [diff] [review] limit framebuffer size v2

Review of attachment 704071 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/ContainerLayerOGL.cpp
@@ +197,5 @@
> + // we're about to create a framebuffer backed by textures to use as an intermediate
> + // surface. What to do if its size (as given by framebufferRect) would exceed the
> + // maximum texture size supported by the GL? The present code chooses the compromise
> + // of just clamping the framebuffer's size to the max supported size.
> + // See bug 827170 for a discussion.

Can you add the following to be clear that we're not truncating the surface but rather 'resizing' it: This gives us a lower resolution rendering of the intermediate surface (children layers).
Attachment #704071 - Flags: review?(bgirard) → review+
Comment on attachment 704071 [details] [diff] [review] limit framebuffer size v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed in bug 825692. But the underlying issue being fixed here has been around longer.
User impact if declined: crashes
Testing completed (on m-c, etc.): m-i
Risk to taking this patch (and alternatives if risky): not risky, tiny simple patch
String or UUID changes made by this patch: none
Attachment #704071 - Flags: approval-mozilla-aurora?
(In reply to Benoit Jacob [:bjacob] from comment #28) > Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed > in bug 825692. But the underlying issue being fixed here has been around > longer. I said in comment 4 it's likely a dupe of bug 705641 (#1 on Android and #4-5 on Mac in 18.0 and 19.0 Beta) as the spike is not obvious. The only certainty is that 20.0 and above have a right stack trace and a single crash signature making it more visible in top crashers. Is it safe to uplift it also in 19.0 Beta 4?
This feels like a very safe patch, so, let's uplift it to wherever it's useful.
Comment on attachment 704071 [details] [diff] [review] limit framebuffer size v2 [Approval Request Comment] See preceding comments.
Attachment #704071 - Flags: approval-mozilla-beta?
Backed out because we suspect that this is what might have caused the Android R4 oranges, SVG Reftest failures. https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=31399fd0cb5b That would indeed happen if SVG reftests caused layers larger than 2048 pixels in one dimension. In which case, the easiest path forward would be to specifically disable these tests. https://hg.mozilla.org/integration/mozilla-inbound/rev/a6ab6b0770ce
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 704071 [details] [diff] [review] limit framebuffer size v2 Considering the patch is very safe and helps fix a top-crasher, approving on aurora/beta. Please make sure to land on beta before EOD today/tomorrow morning to get this into 19.0b4. Thanks!
Attachment #704071 - Flags: approval-mozilla-beta?
Attachment #704071 - Flags: approval-mozilla-beta+
Attachment #704071 - Flags: approval-mozilla-aurora?
Attachment #704071 - Flags: approval-mozilla-aurora+
Oh, and so far it looks like this helped on trunk, no crashes after the build from the 27th so far.
Is this also related to FF desktop ?
(In reply to Paul Silaghi [QA] from comment #39) > Is this also related to FF desktop ? Yes (see comment 24) but the remaining crashes are tracked in bug 705641 where 20 and above seem unaffected while 19.0 Beta is still affected.
Thanks Scoobidiver. Verified fixed based on comment 40.
Blocks: 746730