Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"

RESOLVED FIXED in Firefox 19

Status

--
critical
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: scoobidiver, Assigned: bjacob)

Tracking

(4 keywords)

20 Branch
mozilla21
crash, regression, testcase, topcrash

Firefox Tracking Flags

(firefox19+ verified, firefox20+ verified, fennec20+)

Details

(Whiteboard: [native-crash], crash signature, URL)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
It's #3 top crasher in 20.0a1 and first showed up in 20.0a1/20130103. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a812ef63de87&tochange=6955309291ee
It might be caused by bug 825692.

Signature 	mozalloc_abort(char const*) | NS_DebugBreak_P | mozilla::layers::LayerManagerOGL::CreateFBOWithTexture(nsIntRect const&, mozilla::layers::LayerManagerOGL::InitMode, unsigned int, unsigned int*, unsigned int*)
UUID	c716a678-3641-4deb-85ec-af6f52130106
Date Processed	2013-01-06 19:26:19
Uptime	19
Last Crash	21.7 hours before submission
Install Age	19 seconds since version was first installed.
Install Time	2013-01-06 19:25:50
Product	FennecAndroid
Version	20.0a1
Build ID	20130106030902
Release Channel	nightly
OS	Android
OS Version	0.0.0 Linux 3.0.8-02784-g4dbe869 #1 SMP PREEMPT Wed Dec 5 01:54:41 UTC 2012 armv7l Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x0
App Notes 	
AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: KFTT, Product: Kindle Fire, Manufacturer: Amazon, Hardware: bowser'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ xpcom_runtime_abort(###!!! ABORT: Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126: file ../../../gfx/layers/opengl/LayerManagerOGL.cpp, line 1534)
Amazon KFTT
Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Processor Notes 	/data/socorro/stackwalk/bin/exploitable: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	Imagination Technologies
Adapter Device ID	PowerVR SGX 540
Device	Amazon KFTT
Android API Version	15 (REL)
Android CPU ABI	armeabi-v7a

Frame 	Module 	Signature 	Source
0 	libmozalloc.so 	mozalloc_abort 	mozalloc_abort.cpp:30
1 	libxul.so 	NS_DebugBreak_P 	nsDebugImpl.cpp:422
2 	libxul.so 	mozilla::layers::LayerManagerOGL::CreateFBOWithTexture 	LayerManagerOGL.cpp:1534
3 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> 	ContainerLayerOGL.cpp:225
4 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> 	ContainerLayerOGL.cpp:263
5 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> 	ContainerLayerOGL.cpp:263
6 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> 	ContainerLayerOGL.cpp:263
7 	libxul.so 	mozilla::layers::LayerManagerOGL::Render 	LayerManagerOGL.cpp:1120
8 	libxul.so 	mozilla::layers::LayerManagerOGL::EndTransaction 	LayerManagerOGL.cpp:788
9 	libxul.so 	mozilla::layers::LayerManagerOGL::EndEmptyTransaction 	LayerManagerOGL.cpp:729
10 	libxul.so 	mozilla::layers::CompositorParent::Composite 	CompositorParent.cpp:620
11 	libxul.so 	RunnableMethod<IPC::ChannelProxy::Context, void 	tuple.h:383
12 	libxul.so 	MessageLoop::RunTask 	message_loop.cc:333
13 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	message_loop.cc:341
14 	libxul.so 	MessageLoop::DoWork 	message_loop.cc:441
15 	libxul.so 	base::MessagePumpDefault::Run 	message_pump_default.cc:23
16 	libxul.so 	MessageLoop::RunInternal 	message_loop.cc:215
17 	libxul.so 	MessageLoop::Run 	message_loop.cc:208
18 	libxul.so 	base::Thread::ThreadMain 	thread.cc:156
19 	libxul.so 	ThreadFunc 	platform_thread_posix.cc:39
20 	libc.so 	libc.so@0x12cce 	
21 	libc.so 	libc.so@0x12822 	
22 	libEGL.so 	libEGL.so@0x23e82

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+NS_DebugBreak_P+|+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3ACreateFBOWithTexture%28nsIntRect+const%26%2C+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3AInitMode%2C+unsigned+int%2C+unsigned+int*%2C+unsigned+int*%29
(Assignee)

Comment 1

6 years ago
The 'aRect.height 4126' suggests that this might be an issue about us trying to use a texture of size > 4096 on a device that doesn't support that.
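For reference, the two hex values in the abort message can be decoded with a small lookup like the one below. The numeric values are the ones defined in the OpenGL ES 2.0 headers; the function names are made up for this illustration and are not part of the Firefox code.

```cpp
#include <cstdint>
#include <string>

// Map a glCheckFramebufferStatus() result to its symbolic name.
// Hypothetical helper; only the numeric values come from the GL headers.
std::string FramebufferStatusName(std::uint32_t status) {
  switch (status) {
    case 0x8CD5: return "GL_FRAMEBUFFER_COMPLETE";
    case 0x8CD6: return "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT";
    case 0x8CD7: return "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT";
    case 0x8CD9: return "GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS";
    case 0x8CDD: return "GL_FRAMEBUFFER_UNSUPPORTED";
    default:     return "unknown";
  }
}

// Map a texture target enum to its symbolic name (hypothetical helper).
std::string TextureTargetName(std::uint32_t target) {
  switch (target) {
    case 0x0DE1: return "GL_TEXTURE_2D";
    case 0x84F5: return "GL_TEXTURE_RECTANGLE";
    default:     return "unknown";
  }
}
```

So "error 0x8cd6, mFBOTextureTarget 0xde1" reads as an incomplete attachment on a GL_TEXTURE_2D target, which fits a texture attachment the driver refuses to accept.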
(Reporter)

Comment 2

6 years ago
(In reply to Benoit Jacob [:bjacob] (On vacation, back on Jan. 7th) from comment #1)
> The 'aRect.height 4126'
There are other abort messages with lower values.
Summary: Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126" → Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"

Updated

6 years ago
tracking-firefox20: ? → +
I'm crashing with this stack on http://m.vd.nl/heren/jeans.html
when I tap on the select drop-down box "Prijs" and choose one of the available options.
This is on the Galaxy Nexus, using trunk.
(Reporter)

Comment 4

6 years ago
In fact, the spike is not so obvious. It's the real ranking of bug 705641, i.e. #1, as stated in bug 705641 comment 44. Only the signature has morphed into a correct one.
Indeed, here are crashes with this abort message per builddate:
* 20130107: 12
* 20130106: 20
* 20130105: 19
* 20130104: 14
* 20130103: 11  <- first appearance of this signature
* 20130102: 6
* 20130101: 15
* 20121231: 4
* 20121230: 10
* 20121229: 5

I think it should be marked as a duplicate of bug 705641.
tracking-fennec: ? → 20+
CCing Jgilbert. Jeff, can you please help with some investigation here? This could be a dup of bug 705641 as per comment 4.

Also CCing Jeff Muizelaar to see if he can take a look here, as the description suggests bug 825692 may have regressed this. Thanks!
Let's see if we can deal with this while 20 is in Aurora, to avoid chasing it while in Beta.
Assignee: nobody → bjacob
(Assignee)

Comment 7

6 years ago
OK, let's look a bit into these failures. Taking the 2013-01-04 crash data.

First, let's check which GPUs are hitting this problem:

bjacob:~/crash-stats$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*\(AdapterDescription[^|]*\)|.*$/\1/g' | sort | uniq -c | sort -rn | head -n10
    892 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasi, Manufacturer: asus, Hardware: grouper' 
    142 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu' 
    108 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: US_epad, Manufacturer: asus, Hardware: cardhu' 
     67 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasig, Manufacturer: asus, Hardware: grouper' 
     65 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5110, Product: espresso10wifixx, Manufacturer: samsung, Hardware: espresso10' 
     58 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5100, Product: espresso10rfxx, Manufacturer: samsung, Hardware: espresso10' 
     50 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: A700, Product: a700_emea_de, Manufacturer: Acer, Hardware: picasso_mf' 
     30 AdapterDescription: 'ARM -- Mali-T604 -- OpenGL ES 2.0 -- Model: Nexus 10, Product: mantaray, Manufacturer: samsung, Hardware: manta' 
     28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: HTC One X, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru' 
     28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF300T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu' 


So we have mostly Tegra 3's, but also enough PowerVR that we can't just shrug this off as a Tegra 3 bug. OpenGL framebuffer completeness is very finicky, so I am very uncomfortable in the first place that we have an assertion on it.

So let's look in more detail at why these drivers think that these framebuffers are incomplete. Let's look at the sizes. The following command outputs this in <count> <width> <height> form, sorted by decreasing count:

$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*aRect.width\ \([0-9]\+\)\,\ aRect.height\ \([0-9]\+\).*$/\1 \2/g' | sort | uniq -c | sort -rn | head -n 30
     32 548 2057
     30 632 2391
     29 609 2057
     23 800 2190
     17 800 2304
     16 900 2057
     16 755 2058
     16 720 2304
     16 659 2057
     16 609 2066
     15 652 2057
     15 548 2066
     13 653 2057
     13 651 2057
     12 900 2051
     12 658 2057
     12 647 2057
     11 654 2057
     10 655 2057
     10 1792 2304
      9 900 2066
      9 649 2057
      9 647 2066
      9 646 2057
      9 641 2057
      8 656 2057
      8 653 2066
      8 648 2057
      8 645 2057
      7 768 2215

Something jumps out here: in all of these cases, one of the dimensions (here the height; in some other cases the width) is greater than 2048. So these drivers mostly only give us trouble when we try to use sizes greater than 2048. To be fair, the above command only shows the 30 most common cases; the un-truncated list does show a very few crashes with smaller sizes, but so few that if we had only those crashes we wouldn't care or even notice.
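The same check can be sketched in code: pull the width and height out of an abort message and test them against a size limit. This is only an illustration of the analysis above, not something from the crash-stats tooling; the names RectSize, ParseRectSize and ExceedsLimit are hypothetical.

```cpp
#include <cstdio>
#include <string>

struct RectSize {
  int width;
  int height;
};

// Extract "aRect.width <n>, aRect.height <m>" from an abort message.
// Returns false if the message does not contain both numbers.
bool ParseRectSize(const std::string& message, RectSize* out) {
  const std::string key = "aRect.width ";
  const std::size_t pos = message.find(key);
  if (pos == std::string::npos) {
    return false;
  }
  int w = 0;
  int h = 0;
  if (std::sscanf(message.c_str() + pos,
                  "aRect.width %d, aRect.height %d", &w, &h) != 2) {
    return false;
  }
  out->width = w;
  out->height = h;
  return true;
}

// True if either dimension is larger than the given limit.
bool ExceedsLimit(const RectSize& size, int limit) {
  return size.width > limit || size.height > limit;
}
```

With a limit of 2048, every row in the list above (for example 548 x 2057) is flagged because of its height.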

Can't we just limit the max texture size we use in mobile Layers to 2048 and call it a day? That should be large enough for us.
(Assignee)

Comment 8

6 years ago
Folks! As the previous comment shows, there are a lot of mobile GPUs out there that have trouble with texture sizes > 2048 (as render targets) even though they are supposed to support them. Can we simply limit texture size to 2048 on mobile altogether (since we'd have to do it at least on Tegra3 and PowerVR 540) or would that break things? I sort of assume that we should be able to, as a lot of mobile GPUs don't support texture sizes greater than 2048 anyways.
(Assignee)

Comment 9

6 years ago
In fact, both the Tegra 3 in a Nexus 7 ("grouper", the #1 device in the above list) and PowerVR's have a 2048 max texture size. So the problem is that we don't even check that the texture sizes we request are in range.

BenWa has made a testcase:

  http://people.mozilla.com/~bgirard/large_framebuffer.html

this reproduces the crash.

The next question is what do we do? Ideally we'd tile everything, but we need a short-term fix; we can either not render at all or render in a broken way; looking at what the competition does, both Chrome and the stock browser seem to opt for incorrect rendering.
(In reply to Benoit Jacob [:bjacob] from comment #8)
> Folks! As the previous comment shows, there are a lot of mobile GPUs out
> there that have trouble with texture sizes > 2048 (as render targets) even
> though they are supposed to support them. Can we simply limit texture size
> to 2048 on mobile altogether (since we'd have to do it at least on Tegra3
> and PowerVR 540) or would that break things? I sort of assume that we should
> be able to, as a lot of mobile GPUs don't support texture sizes greater than
> 2048 anyways.

What's the downside? If a 3k x 3k image shows up on the page, will we still display it? What will stop working or slow down on platforms that do support > 2048? If we do this, how do we track that we've done it and revisit it every so often, to see if our assumptions and reasons are still valid?
Alright the problem is well understood:

When using hardware acceleration, we sometimes fall back to an intermediate surface (framebuffer) to get correct rendering. But as bjacob mentions, the size of this surface is controlled by the web page and can be very large, in which case we just crash.

Testcase:
The test case has animation and group opacity, and uses animated CSS transforms, which hint to the browser engine that the content should be put into layers. This produces a container layer that uses an intermediate surface with two child Thebes layers. This intermediate surface grows past 2,000 pixels as the divs rotate.

The correct behavior is the blue square will be over the red square and NOT blend with the red. The two squares will fade in and out. 

Behavior using a 2k max PowerVR gpu:
1) Firefox Mobile: We crash, without any fallback, if the page causes the layer to be larger than the GPU allows.
2) Stock: They don't honor group opacity and thus don't need an intermediate surface (framebuffer).
3) Chrome: They appear to fall back to a very slow rendering path. This is glitchy since the fallback isn't fast enough to keep up with the animation.

I think if the intermediate surface is small enough we should honor group opacity and if it's not we should behave like stock and ignore it.
BenWa, do you plan to write the patch or do you want to teach me to?
(Reporter)

Updated

6 years ago
Keywords: testcase
Here's the outline of the patch:
In ContainerRender() we decide to render into a framebuffer if UseIntermediateSurface() is true. We should additionally check that the size of the framebuffer we want to allocate ('framebufferRect') is supported by the GL driver. If it is not, we ignore the fact that we need an intermediate surface and simply get bad rendering.
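A rough sketch of that decision, assuming the maximum texture size has already been queried from the driver. The struct and function names here are hypothetical and do not match the actual ContainerLayerOGL.cpp code.

```cpp
// Size of the framebuffer we would like to allocate (hypothetical type).
struct SurfaceSize {
  int width;
  int height;
};

// True if a texture of this size can be created within the driver limit.
bool FitsInTexture(const SurfaceSize& size, int maxTextureSize) {
  return size.width > 0 && size.height > 0 &&
         size.width <= maxTextureSize && size.height <= maxTextureSize;
}

// Only use an intermediate surface when the layer asks for one AND the
// framebuffer fits; otherwise render directly and accept bad rendering.
bool ShouldUseIntermediateSurface(bool layerWantsSurface,
                                  const SurfaceSize& framebufferSize,
                                  int maxTextureSize) {
  return layerWantsSurface && FitsInTexture(framebufferSize, maxTextureSize);
}
```

On a device with a 2048 limit, a 660 x 4126 request would skip the intermediate surface instead of hitting the abort.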
Can't we just allocate a smaller sized framebuffer than we actually need, and have GL scale down?

We'd lose quality, but surely that's better than abandoning correctness.
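A minimal sketch of this idea, assuming the limit comes from the driver's maximum texture size. ClampedSurface and ClampToMaxTextureSize are hypothetical names for illustration; this is not the patch attached to this bug.

```cpp
#include <algorithm>
#include <cassert>

// Allocated framebuffer size plus the scale applied on each axis
// (hypothetical type for this sketch).
struct ClampedSurface {
  int width;
  int height;
  float scaleX;  // allocated width / requested width
  float scaleY;  // allocated height / requested height
};

// Clamp each dimension to the driver limit. Content rendered into the
// smaller framebuffer is scaled down by scaleX/scaleY, so the result is
// a lower-resolution rendering rather than a truncated one.
ClampedSurface ClampToMaxTextureSize(int width, int height,
                                     int maxTextureSize) {
  assert(width > 0 && height > 0 && maxTextureSize > 0);
  ClampedSurface result;
  result.width = std::min(width, maxTextureSize);
  result.height = std::min(height, maxTextureSize);
  result.scaleX = static_cast<float>(result.width) / width;
  result.scaleY = static_cast<float>(result.height) / height;
  return result;
}
```

For the 660 x 4126 case from comment 0 with a 2048 limit, this allocates 660 x 2048 and renders the children at a bit under half their vertical resolution.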
Created attachment 703671 [details] [diff] [review]
tentative patch, but OOM's for now
Comment on attachment 703671 [details] [diff] [review]
tentative patch, but OOM's for now

This patch implements comment 13. Unexpectedly, it replaces the crash with... an out-of-memory crash. Need to investigate why.
Attachment #703671 - Attachment description: this patch implements comment 13. Unexpectedly, it replaces the crash with... an out-of-memory crash. Need to investigate why. → tentative patch, but OOM's for now
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Can't we just allocate a smaller sized framebuffer than we actually need,
> and have GL scale down?
> 
> We'd lose quality, but surely that's better than abandoning correctness.

YES! I forgot about that. It's a great idea.
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Can't we just allocate a smaller sized framebuffer than we actually need,
> and have GL scale down?
> 
> We'd lose quality, but surely that's better than abandoning correctness.

Sounds good, but that also gave me an OOM :-/
Created attachment 704041 [details] [diff] [review]
limit framebuffer size

This implements Matt's idea; it doesn't actually OOM as I erroneously said. I think I just needed to reboot my phone. Good thing I went to a concert yesterday night and turned it off!

It actually runs without crashing, and about:memory looks sane. There are some occasional rendering glitches that seem to pertain to tiling, but it's better than a crash.
Attachment #703671 - Attachment is obsolete: true
Attachment #704041 - Flags: review?(bgirard)
Created attachment 704042 [details] [diff] [review]
limit framebuffer size

remove unwanted change
Attachment #704041 - Attachment is obsolete: true
Attachment #704041 - Flags: review?(bgirard)
Attachment #704042 - Flags: review?(bgirard)
Comment on attachment 704042 [details] [diff] [review]
limit framebuffer size

The patch looks good but you need a big fat comment explaining what is going on here and a good patch summary.
Attachment #704042 - Flags: review?(bgirard) → review-
Created attachment 704071 [details] [diff] [review]
limit framebuffer size v2
Attachment #704071 - Flags: review?(bgirard)
Probably, as that looks like the same bug (same place according to the stack, size > 2048; Intel GPU, so it is plausible that 2048 would be the max texture size).
(Reporter)

Updated

6 years ago
OS: Android → All
Hardware: ARM → All
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2

Review of attachment 704071 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/ContainerLayerOGL.cpp
@@ +197,5 @@
> +    // we're about to create a framebuffer backed by textures to use as an intermediate
> +    // surface. What to do if its size (as given by framebufferRect) would exceed the
> +    // maximum texture size supported by the GL? The present code chooses the compromise
> +    // of just clamping the framebuffer's size to the max supported size.
> +    // See bug 827170 for a discussion.

Can you add the following to be clear that we're not truncating the surface but rather 'resizing' it:
This gives us a lower resolution rendering of the intermediate surface (children layers).
Attachment #704071 - Flags: review?(bgirard) → review+
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed in bug 825692. But the underlying issue being fixed here has been around longer.
User impact if declined: crashes
Testing completed (on m-c, etc.): m-i
Risk to taking this patch (and alternatives if risky): not risky, tiny simple patch
String or UUID changes made by this patch: none
Attachment #704071 - Flags: approval-mozilla-aurora?
(Reporter)

Comment 29

6 years ago
(In reply to Benoit Jacob [:bjacob] from comment #28)
> Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed
> in bug 825692. But the underlying issue being fixed here has been around
> longer.
I said in comment 4 it's likely a dupe of bug 705641 (#1 on Android and #4-5 on Mac in 18.0 and 19.0 Beta) as the spike is not obvious. The only certainty is that 20.0 and above have a correct stack trace and a single crash signature, making it more visible in top crashers.
Is it safe to uplift it to 19.0 Beta 4 as well?
This feels like a very safe patch, so, let's uplift it to wherever it's useful.
(Assignee)

Updated

6 years ago
status-firefox20: affected → fixed
(Assignee)

Updated

6 years ago
status-firefox20: fixed → affected
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2

[Approval Request Comment]
See preceding comments.
Attachment #704071 - Flags: approval-mozilla-beta?
Backed out because we suspect that this is what might have caused the Android R4 oranges, SVG Reftest failures.

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=31399fd0cb5b

That would indeed happen if SVG reftests caused layers larger than 2048 pixels in one dimension.

In which case, the easiest path forward would be to specifically disable these tests.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a6ab6b0770ce

Updated

6 years ago
status-firefox19: --- → affected
tracking-firefox19: --- → +
https://hg.mozilla.org/mozilla-central/rev/20ca3148d336
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2

Considering the patch is very safe and helps fix a top-crasher, approving on aurora/beta.

Please make sure to land on beta before EOD today/tomorrow morning to get this into 19.0b4. Thanks!
Attachment #704071 - Flags: approval-mozilla-beta?
Attachment #704071 - Flags: approval-mozilla-beta+
Attachment #704071 - Flags: approval-mozilla-aurora?
Attachment #704071 - Flags: approval-mozilla-aurora+

Comment 38

6 years ago
Oh, and so far it looks like this helped on trunk, no crashes after the build from the 27th so far.
Is this also related to FF desktop?
(Reporter)

Comment 40

6 years ago
(In reply to Paul Silaghi [QA] from comment #39)
> Is this also related to FF desktop ?
Yes (see comment 24) but the remaining crashes are tracked in bug 705641 where 20 and above seem unaffected while 19.0 Beta is still affected.
Thanks Scoobidiver.
Verified fixed based on comment 40.
status-firefox19: fixed → verified
status-firefox20: fixed → verified
Blocks: 746730