Bug 827170 (RESOLVED FIXED, target milestone mozilla21)
Opened 12 years ago, closed 12 years ago
Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"
Categories: Core :: Graphics: Layers, defect
People: Reporter: scoobidiver, Assigned: bjacob
Details: 4 keywords, Whiteboard: [native-crash]
Attachments (2 files, 2 obsolete files):
1.49 KB, patch | BenWa: review-
2.27 KB, patch | BenWa: review+ | bajaj: approval-mozilla-aurora+ | bajaj: approval-mozilla-beta+
It's the #3 top crasher in 20.0a1 and first showed up in 20.0a1/20130103. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a812ef63de87&tochange=6955309291ee
It might be caused by bug 825692.
Signature mozalloc_abort(char const*) | NS_DebugBreak_P | mozilla::layers::LayerManagerOGL::CreateFBOWithTexture(nsIntRect const&, mozilla::layers::LayerManagerOGL::InitMode, unsigned int, unsigned int*, unsigned int*)
UUID c716a678-3641-4deb-85ec-af6f52130106
Date Processed 2013-01-06 19:26:19
Uptime 19
Last Crash 21.7 hours before submission
Install Age 19 seconds since version was first installed.
Install Time 2013-01-06 19:25:50
Product FennecAndroid
Version 20.0a1
Build ID 20130106030902
Release Channel nightly
OS Android
OS Version 0.0.0 Linux 3.0.8-02784-g4dbe869 #1 SMP PREEMPT Wed Dec 5 01:54:41 UTC 2012 armv7l Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Build Architecture arm
Build Architecture Info
Crash Reason SIGSEGV
Crash Address 0x0
App Notes
AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: KFTT, Product: Kindle Fire, Manufacturer: Amazon, Hardware: bowser'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ xpcom_runtime_abort(###!!! ABORT: Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126: file ../../../gfx/layers/opengl/LayerManagerOGL.cpp, line 1534)
Amazon KFTT
Android/tate/tate:4.0.3/IML74K/7.2.3_user_2330720:user/release-keys
Processor Notes /data/socorro/stackwalk/bin/exploitable: ERROR: unable to analyze dump
EMCheckCompatibility True
Adapter Vendor ID Imagination Technologies
Adapter Device ID PowerVR SGX 540
Device Amazon KFTT
Android API Version 15 (REL)
Android CPU ABI armeabi-v7a
Frame Module Signature Source
0 libmozalloc.so mozalloc_abort mozalloc_abort.cpp:30
1 libxul.so NS_DebugBreak_P nsDebugImpl.cpp:422
2 libxul.so mozilla::layers::LayerManagerOGL::CreateFBOWithTexture LayerManagerOGL.cpp:1534
3 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:225
4 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
5 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
6 libxul.so mozilla::layers::ContainerRender<mozilla::layers::ShadowContainerLayerOGL> ContainerLayerOGL.cpp:263
7 libxul.so mozilla::layers::LayerManagerOGL::Render LayerManagerOGL.cpp:1120
8 libxul.so mozilla::layers::LayerManagerOGL::EndTransaction LayerManagerOGL.cpp:788
9 libxul.so mozilla::layers::LayerManagerOGL::EndEmptyTransaction LayerManagerOGL.cpp:729
10 libxul.so mozilla::layers::CompositorParent::Composite CompositorParent.cpp:620
11 libxul.so RunnableMethod<IPC::ChannelProxy::Context, void tuple.h:383
12 libxul.so MessageLoop::RunTask message_loop.cc:333
13 libxul.so MessageLoop::DeferOrRunPendingTask message_loop.cc:341
14 libxul.so MessageLoop::DoWork message_loop.cc:441
15 libxul.so base::MessagePumpDefault::Run message_pump_default.cc:23
16 libxul.so MessageLoop::RunInternal message_loop.cc:215
17 libxul.so MessageLoop::Run message_loop.cc:208
18 libxul.so base::Thread::ThreadMain thread.cc:156
19 libxul.so ThreadFunc platform_thread_posix.cc:39
20 libc.so libc.so@0x12cce
21 libc.so libc.so@0x12822
22 libEGL.so libEGL.so@0x23e82
More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*%29+|+NS_DebugBreak_P+|+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3ACreateFBOWithTexture%28nsIntRect+const%26%2C+mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3AInitMode%2C+unsigned+int%2C+unsigned+int*%2C+unsigned+int*%29
Assignee
Comment 1•12 years ago
The 'aRect.height 4126' suggests that this might be an issue about us trying to use a texture of size > 4096 on a device that doesn't support that.
Reporter
Comment 2•12 years ago
(In reply to Benoit Jacob [:bjacob] (On vacation, back on Jan. 7th) from comment #1)
> The 'aRect.height 4126'
There are other abort messages with lower values.
Summary: Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width 660, aRect.height 4126" → Firefox 20 spike in crash at mozilla::layers::LayerManagerOGL::CreateFBOWithTexture with abort message: "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width <n>, aRect.height <m>"
Comment 3•12 years ago
I'm crashing with this stack on http://m.vd.nl/heren/jeans.html when I tap on the select drop-down box "Prijs" and choose one of the available options.
This is on the Galaxy Nexus, using trunk.
Reporter
Comment 4•12 years ago
In fact, the spike is not so obvious. It's the real ranking of bug 705641, i.e. #1, as stated in bug 705641 comment 44. Only the signature has morphed into a correct one.
Indeed, here are crashes with this abort message per builddate:
* 20130107: 12
* 20130106: 20
* 20130105: 19
* 20130104: 14
* 20130103: 11 <- first appearance of this signature
* 20130102: 6
* 20130101: 15
* 20121231: 4
* 20121230: 10
* 20121229: 5
I think it should be marked as a duplicate of bug 705641.
Updated•12 years ago
tracking-fennec: ? → 20+
Comment 5•12 years ago
CCing Jgilbert. Jeff, can you please help with some investigation here? Could this be a dup of bug 705641, as per comment 4?
Also CCing Jeff Muizelaar to see if he can take a look here, as the description suggests bug 825692 may have regressed this. Thanks!
Comment 6•12 years ago
Let's see if we can deal with this while 20 is in Aurora, to avoid chasing it while in Beta.
Assignee: nobody → bjacob
Assignee
Comment 7•12 years ago
OK, let's look a bit into these failures. Taking the 2013-01-04 crash data.
First, let's check which GPUs are hitting this problem:
bjacob:~/crash-stats$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*\(AdapterDescription[^|]*\)|.*$/\1/g' | sort | uniq -c | sort -rn | head -n10
892 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasi, Manufacturer: asus, Hardware: grouper'
142 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu'
108 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF700T, Product: US_epad, Manufacturer: asus, Hardware: cardhu'
67 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: Nexus 7, Product: nakasig, Manufacturer: asus, Hardware: grouper'
65 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5110, Product: espresso10wifixx, Manufacturer: samsung, Hardware: espresso10'
58 AdapterDescription: 'Imagination Technologies -- PowerVR SGX 540 -- OpenGL ES 2.0 build 1.8@785978 -- Model: GT-P5100, Product: espresso10rfxx, Manufacturer: samsung, Hardware: espresso10'
50 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: A700, Product: a700_emea_de, Manufacturer: Acer, Hardware: picasso_mf'
30 AdapterDescription: 'ARM -- Mali-T604 -- OpenGL ES 2.0 -- Model: Nexus 10, Product: mantaray, Manufacturer: samsung, Hardware: manta'
28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: HTC One X, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru'
28 AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra 3 -- OpenGL ES 2.0 14.01002 -- Model: ASUS Transformer Pad TF300T, Product: WW_epad, Manufacturer: asus, Hardware: cardhu'
So we have mostly Tegra 3's, but also enough PowerVR that we can't just shrug this off as a Tegra 3 bug. OpenGL framebuffer completeness is very finicky, so I am very uncomfortable in the first place that we have an assertion on it.
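As background for the abort message: the two hexadecimal values in it are standard OpenGL enumerants. 0x8cd6 is GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT and 0xde1 is GL_TEXTURE_2D. The following is only a minimal sketch of a decoder. The function names are hypothetical (they are not Gecko functions), and only a handful of the framebuffer statuses defined by OpenGL ES 2.0 are listed.

```cpp
#include <string>

// Maps a framebuffer status code to its OpenGL ES 2.0 enumerant name.
// Hypothetical helper for illustration; not part of the Gecko code base.
std::string FramebufferStatusName(unsigned int status) {
  switch (status) {
    case 0x8CD5: return "GL_FRAMEBUFFER_COMPLETE";
    case 0x8CD6: return "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT";
    case 0x8CD7: return "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT";
    case 0x8CD9: return "GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS";
    case 0x8CDD: return "GL_FRAMEBUFFER_UNSUPPORTED";
    default:     return "unknown";
  }
}

// Maps a texture target to its name; only GL_TEXTURE_2D is recognised here.
std::string TextureTargetName(unsigned int target) {
  return target == 0x0DE1 ? "GL_TEXTURE_2D" : "unknown";
}
```

With these, the message reads as "the 2D texture attached to the framebuffer is not acceptable to the driver", which is consistent with a size problem but does not prove one on its own.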
So let's look in more detail at why these drivers think that these framebuffers are incomplete. Let's look at the sizes. The following command outputs this in <count> <width> <height> form, sorted by decreasing count:
$ zcat 20130104-pub-crashdata.csv.gz | grep "Framebuffer not complete -- error 0x8cd6, mFBOTextureTarget 0xde1, aRect.width"| sed 's/^.*aRect.width\ \([0-9]\+\)\,\ aRect.height\ \([0-9]\+\).*$/\1 \2/g' | sort | uniq -c | sort -rn | head -n 30
32 548 2057
30 632 2391
29 609 2057
23 800 2190
17 800 2304
16 900 2057
16 755 2058
16 720 2304
16 659 2057
16 609 2066
15 652 2057
15 548 2066
13 653 2057
13 651 2057
12 900 2051
12 658 2057
12 647 2057
11 654 2057
10 655 2057
10 1792 2304
9 900 2066
9 649 2057
9 647 2066
9 646 2057
9 641 2057
8 656 2057
8 653 2066
8 648 2057
8 645 2057
7 768 2215
Something jumps out here: in all of these cases, one of the dimensions (here the height; in some other cases the width) is greater than 2048. So these drivers mostly only give us trouble when we try to use sizes greater than 2048. To be fair, the above command only shows the 30 most common cases; the un-truncated list does show a few crashes with smaller sizes, but so few that if we had only those crashes we wouldn't care or even notice.
Can't we just limit the max texture size we use in mobile Layers to 2048 and call it a day? That should be large enough for us.
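The same check can be applied to a single abort message. This is a minimal sketch, assuming the message has exactly the "aRect.width <n>, aRect.height <m>" shape quoted in this bug; RectSize, ParseRectSize and ExceedsLimit are hypothetical names introduced for illustration.

```cpp
#include <cstdio>
#include <cstring>

struct RectSize {
  int width;
  int height;
};

// Extracts the two sizes from an abort message of the form quoted in this bug.
// Returns false if the message does not contain both fields.
bool ParseRectSize(const char* message, RectSize* out) {
  const char* start = std::strstr(message, "aRect.width ");
  if (!start) {
    return false;
  }
  return std::sscanf(start, "aRect.width %d, aRect.height %d",
                     &out->width, &out->height) == 2;
}

// True if either dimension is larger than the given limit.
bool ExceedsLimit(const RectSize& size, int limit) {
  return size.width > limit || size.height > limit;
}
```

For the 660 x 4126 report in the description and the 548 x 2057 row at the top of the list, ExceedsLimit(size, 2048) is true in both cases, because the height is above 2048.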
Assignee
Comment 8•12 years ago
Folks! As the previous comment shows, there are a lot of mobile GPUs out there that have trouble with texture sizes > 2048 (as render targets) even though they are supposed to support them. Can we simply limit texture size to 2048 on mobile altogether (since we'd have to do it at least on Tegra3 and PowerVR 540) or would that break things? I sort of assume that we should be able to, as a lot of mobile GPUs don't support texture sizes greater than 2048 anyways.
Assignee
Comment 9•12 years ago
In fact, both the Tegra 3 in a Nexus 7 ("grouper", the #1 device in the above list) and PowerVR's have a 2048 max texture size. So the problem is that we don't even check that the texture sizes we request are in range.
BenWa has made a testcase:
http://people.mozilla.com/~bgirard/large_framebuffer.html
this reproduces the crash.
The next question is what do we do? Ideally we'd tile everything, but we need a short-term fix; we can either not render at all or render in a broken way; looking at what the competition does, both Chrome and the stock browser seem to opt for incorrect rendering.
Comment 10•12 years ago
(In reply to Benoit Jacob [:bjacob] from comment #8)
> Folks! As the previous comment shows, there are a lot of mobile GPUs out
> there that have trouble with texture sizes > 2048 (as render targets) even
> though they are supposed to support them. Can we simply limit texture size
> to 2048 on mobile altogether (since we'd have to do it at least on Tegra3
> and PowerVR 540) or would that break things? I sort of assume that we should
> be able to, as a lot of mobile GPUs don't support texture sizes greater than
> 2048 anyways.
What's the downside? If a 3k x 3k image shows up on the page, will we still display it? What will stop working or slow down on platforms that do support > 2048? If we do this, how do we track that we've done it and revisit it every so often, to see if our assumptions and reasons are still valid?
Comment 11•12 years ago
Alright, the problem is well understood:
When using hardware acceleration, we sometimes fall back to an intermediate surface (framebuffer) to get correct rendering. As bjacob mentions, the size of this surface is controlled by the web page, so it can become very large, and then we just crash.
Testcase:
The test case has animation and group opacity and uses animated CSS transforms, which hint to the browser engine that the content should be put into layers. This produces a container layer that uses an intermediate surface with two child Thebes layers. This intermediate surface grows past 2,000 pixels as the divs rotate.
The correct behavior is that the blue square is drawn over the red square and does NOT blend with the red. The two squares fade in and out.
Behavior using a PowerVR GPU with a 2k max texture size:
1) Firefox Mobile: We crash, without any fallback, if the page causes the layer to be larger than the GPU allows.
2) Stock: They don't honor group opacity and thus don't need an intermediate surface (framebuffer).
3) Chrome: They appear to fall back to a very slow rendering path. This is glitchy since the fallback isn't fast enough to keep up with the animation.
I think that if the intermediate surface is small enough we should honor group opacity, and if it's not we should behave like stock and ignore it.
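To make the difference between honoring and ignoring group opacity concrete, here is a small worked sketch for a single colour channel. It assumes plain source-over blending and assumes that "ignoring group opacity" means applying the opacity to each child separately; the function names are hypothetical and nothing here is taken from the Gecko compositor.

```cpp
// Source-over blend of one colour channel: src drawn over dst with an opacity.
float Over(float src, float dst, float opacity) {
  return src * opacity + dst * (1.0f - opacity);
}

// Group opacity: composite the children at full opacity into an intermediate
// surface, then blend that surface onto the background once.
float WithGroupOpacity(float background, float bottom, float top,
                       float opacity) {
  float surface = Over(top, bottom, 1.0f);  // top child fully hides bottom
  return Over(surface, background, opacity);
}

// No intermediate surface (assumed fallback): each child is blended with the
// opacity in turn, so the bottom child shows through the top one.
float WithPerChildOpacity(float background, float bottom, float top,
                          float opacity) {
  float afterBottom = Over(bottom, background, opacity);
  return Over(top, afterBottom, opacity);
}
```

Take the red channel where the two squares overlap, on a black background, at opacity 0.5: the red square contributes 1.0 and the blue square 0.0. With group opacity the result is 0.0, so no red is visible through the blue. With per-child opacity the result is 0.25, so the red square bleeds through, which is the incorrect blending described above.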
Assignee
Comment 12•12 years ago
BenWa, do you plan to write the patch or do you want to teach me to?
Comment 13•12 years ago
Here's the outline of the patch:
In ContainerRender() we decide whether to render into a framebuffer based on whether UseIntermediateSurface() is true. We should additionally check that the size of the framebuffer we want to allocate ('framebufferRect') is supported by the GL driver. If it is not, we ignore the need for an intermediate surface and simply get bad rendering.
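The outline above amounts to one extra condition. The following is a minimal sketch, not the actual patch: IntRect and ShouldRenderToIntermediateSurface are hypothetical stand-ins, and maxTextureSize is assumed to be the value the driver reports for GL_MAX_TEXTURE_SIZE.

```cpp
// Hypothetical stand-in for the rectangle type used by the layers code.
struct IntRect {
  int x;
  int y;
  int width;
  int height;
};

// Render through an intermediate surface only when one is wanted AND the
// driver can allocate a texture of that size. Otherwise the caller draws the
// children directly, accepting incorrect rendering instead of a crash.
bool ShouldRenderToIntermediateSurface(bool useIntermediateSurface,
                                       const IntRect& framebufferRect,
                                       int maxTextureSize) {
  if (!useIntermediateSurface) {
    return false;
  }
  return framebufferRect.width <= maxTextureSize &&
         framebufferRect.height <= maxTextureSize;
}
```

With a 2048 limit, the 660 x 4126 rectangle from the crash report would take the direct path, while a 660 x 1000 rectangle would still get its intermediate surface.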
Comment 14•12 years ago
Can't we just allocate a smaller sized framebuffer than we actually need, and have GL scale down?
We'd lose quality, but surely that's better than abandoning correctness.
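A sketch of this alternative, under the same caveats: IntSize, Scale and both functions are hypothetical names, and maxTextureSize is assumed to come from GL_MAX_TEXTURE_SIZE. How the compositor would then map the children into the smaller surface is not shown here.

```cpp
#include <algorithm>
#include <cassert>

struct IntSize {
  int width;
  int height;
};

struct Scale {
  float x;
  float y;
};

// Clamps each dimension of the requested size to the driver's limit.
IntSize ClampToMaxTextureSize(IntSize requested, int maxTextureSize) {
  assert(maxTextureSize > 0);
  return IntSize{std::min(requested.width, maxTextureSize),
                 std::min(requested.height, maxTextureSize)};
}

// Ratio of allocated to requested size in each axis, i.e. how much resolution
// is kept when the content is squeezed into the smaller surface.
Scale DownscaleFactor(IntSize requested, IntSize allocated) {
  assert(requested.width > 0 && requested.height > 0);
  return Scale{static_cast<float>(allocated.width) / requested.width,
               static_cast<float>(allocated.height) / requested.height};
}
```

For a 660 x 4126 request and a 2048 limit, the allocated surface is 660 x 2048: the width is untouched and the height keeps a little under half of its resolution.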
Assignee
Comment 15•12 years ago
Assignee
Comment 16•12 years ago
Comment on attachment 703671 [details] [diff] [review]
tentative patch, but OOM's for now
This patch implements comment 13. Unexpectedly, it replaces the crash by... an out of memory crash. Need to investigate why.
Attachment #703671 -
Attachment description: this patch implements comment 13. Unexpectedly, it replaces the crash by... and out of memory crash. Need to investigate why. → tentative patch, but OOM's for now
Comment 17•12 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Can't we just allocate a smaller sized framebuffer than we actually need,
> and have GL scale down?
>
> We'd lose quality, but surely that's better than abandoning correctness.
YES! I forgot about that. It's a great idea.
Assignee
Comment 18•12 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Can't we just allocate a smaller sized framebuffer than we actually need,
> and have GL scale down?
>
> We'd lose quality, but surely that's better than abandoning correctness.
Sounds good, but that also gave me an OOM :-/
Assignee
Comment 19•12 years ago
This implements Matt's idea; it doesn't actually OOM as I erroneously said. I think I just needed to reboot my phone. Good thing I went to a concert yesterday night and turned it off!
It actually runs without crashing, and about:memory looks sane. There are some occasional rendering glitches that seem to pertain to tiling, but it's better than a crash.
Attachment #703671 -
Attachment is obsolete: true
Attachment #704041 -
Flags: review?(bgirard)
Assignee
Comment 20•12 years ago
remove unwanted change
Attachment #704041 -
Attachment is obsolete: true
Attachment #704041 -
Flags: review?(bgirard)
Attachment #704042 -
Flags: review?(bgirard)
Comment 21•12 years ago
Comment on attachment 704042 [details] [diff] [review]
limit framebuffer size
The patch looks good but you need a big fat comment explaining what is going on here and a good patch summary.
Attachment #704042 -
Flags: review?(bgirard) → review-
Assignee
Comment 22•12 years ago
Attachment #704071 -
Flags: review?(bgirard)
Reporter
Comment 23•12 years ago
Will the patch fix the Mac OS X crash when logging in to iCloud? See https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char%20const*%29%20|%20NS_DebugBreak_P%20|%20mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3ACreateFBOWithTexture%28nsIntRect%20const%26%2C%20mozilla%3A%3Alayers%3A%3ALayerManagerOGL%3A%3AInitMode%2C%20unsigned%20int%2C%20unsigned%20int*%2C%20unsigned%20int*%29
Assignee
Comment 24•12 years ago
Probably, as that looks like the same bug (same place according to the stack, size > 2048; Intel GPU, so it is plausible that 2048 would be the max texture size).
Reporter
Updated•12 years ago
OS: Android → All
Hardware: ARM → All
Comment 25•12 years ago
Comment 26•12 years ago
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2
Review of attachment 704071 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/opengl/ContainerLayerOGL.cpp
@@ +197,5 @@
> + // we're about to create a framebuffer backed by textures to use as an intermediate
> + // surface. What to do if its size (as given by framebufferRect) would exceed the
> + // maximum texture size supported by the GL? The present code chooses the compromise
> + // of just clamping the framebuffer's size to the max supported size.
> + // See bug 827170 for a discussion.
Can you add the following to be clear that we're not truncating the surface but rather 'resizing' it:
This gives us a lower resolution rendering of the intermediate surface (children layers).
Attachment #704071 -
Flags: review?(bgirard) → review+
Assignee
Comment 27•12 years ago
Target Milestone: --- → mozilla21
Assignee
Comment 28•12 years ago
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2
[Approval Request Comment]
Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed in bug 825692. But the underlying issue being fixed here has been around longer.
User impact if declined: crashes
Testing completed (on m-c, etc.): m-i
Risk to taking this patch (and alternatives if risky): not risky, tiny simple patch
String or UUID changes made by this patch: none
Attachment #704071 -
Flags: approval-mozilla-aurora?
Reporter
Comment 29•12 years ago
(In reply to Benoit Jacob [:bjacob] from comment #28)
> Bug caused by (feature/regressing bug #): comment 0 suspects it's regressed
> in bug 825692. But the underlying issue being fixed here has been around
> longer.
I said in comment 4 it's likely a dupe of bug 705641 (#1 on Android and #4-5 on Mac in 18.0 and 19.0 Beta) as the spike is not obvious. The only certainty is that 20.0 and above have a correct stack trace and a single crash signature, making it more visible in top crashers.
Is it also safe to uplift it to 19.0 Beta 4?
Assignee
Comment 30•12 years ago
This feels like a very safe patch, so, let's uplift it to wherever it's useful.
Assignee
Comment 31•12 years ago
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2
[Approval Request Comment]
See preceding comments.
Attachment #704071 -
Flags: approval-mozilla-beta?
Assignee
Comment 32•12 years ago
Backed out because we suspect that this is what might have caused the Android R4 oranges, SVG Reftest failures.
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=31399fd0cb5b
That would indeed happen if SVG reftests caused layers larger than 2048 pixels in one dimension.
In which case, the easiest path forward would be to specifically disable these tests.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6ab6b0770ce
Updated•12 years ago
status-firefox19: --- → affected
tracking-firefox19: --- → +
Assignee
Comment 33•12 years ago
Comment 34•12 years ago
Comment 35•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 36•12 years ago
Comment on attachment 704071 [details] [diff] [review]
limit framebuffer size v2
Considering the patch is very safe and helps fix a top-crasher, approving on aurora/beta.
Please make sure to land on beta before EOD today/tomorrow morning to get this into 19.0b4. Thanks!
Attachment #704071 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #704071 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee
Comment 37•12 years ago
Comment 38•12 years ago
Oh, and so far it looks like this helped on trunk: no crashes since the build from the 27th.
Comment 39•12 years ago
Is this also related to FF desktop?
Reporter
Comment 40•12 years ago
(In reply to Paul Silaghi [QA] from comment #39)
> Is this also related to FF desktop ?
Yes (see comment 24) but the remaining crashes are tracked in bug 705641 where 20 and above seem unaffected while 19.0 Beta is still affected.
Comment 41•12 years ago
Thanks Scoobidiver.
Verified fixed based on comment 40.