Closed
Bug 881018
Opened 12 years ago
Closed 12 years ago
crash in mozilla::layers::floor_div
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: scoobidiver, Assigned: jchen)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [native-crash])
Crash Data
Attachments
(4 files, 1 obsolete file)
|
1.94 KB,
patch
|
kats
:
review+
kats
:
checkin-
|
Details | Diff | Splinter Review |
|
2.54 KB,
patch
|
kats
:
review+
cwiiis
:
review+
|
Details | Diff | Splinter Review |
|
1.54 KB,
patch
|
cwiiis
:
review+
|
Details | Diff | Splinter Review |
|
1.20 KB,
patch
|
cwiiis
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
It started spiking in 24.0a1/20130607. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=204de5b7e0a6&tochange=dc8e78ed8c44
It might be a regression from bug 862952.
Signature mozilla::layers::floor_div More Reports Search
UUID 726b142d-ac26-4993-9cbc-259ca2130608
Date Processed 2013-06-08 18:33:32
Uptime 99
Last Crash 1.4 weeks before submission
Install Age 1.6 minutes since version was first installed.
Install Time 2013-06-08 18:30:40
Product FennecAndroid
Version 24.0a1
Build ID 20130608031212
Release Channel nightly
OS Android
OS Version 0.0.0 Linux 2.6.39.4-g4e32b94 #1 SMP PREEMPT Mon Sep 10 14:01:08 PDT 2012 armv7l motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys
Build Architecture arm
Build Architecture Info ARMv0
Crash Reason SIGFPE
Crash Address 0x39b1
User Comments looking at non member section of Netflix Canada, i swear
App Notes
AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra -- OpenGL ES 2.0 -- Model: Xoom, Product: tervigon, Manufacturer: Motorola, Hardware: stingray'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+ Stagefright? Stagefright+
Motorola Xoom
motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys
Processor Notes sp-processor01_phx1_mozilla_com_17437:2012; exploitability tool: ERROR: unable to analyze dump
EMCheckCompatibility True
Adapter Vendor ID NVIDIA Corporation
Adapter Device ID NVIDIA Tegra
Device Motorola Xoom
Android API Version 16 (REL)
Android CPU ABI armeabi-v7a
Frame Module Signature Source
0 libc.so libc.so@0xdd00
1 libc.so libc.so@0x13937
2 libc.so libc.so@0x132e6
3 libxul.so mozilla::layers::floor_div gfx/layers/TiledLayerBuffer.h:209
4 libxul.so mozilla::layers::TiledContentHost::RenderLayerBuffer gfx/layers/TiledLayerBuffer.h:228
More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3Afloor_div
| Reporter | ||
Comment 1•12 years ago
|
||
It's #5 top crasher in 24.0a1.
Comment 2•12 years ago
|
||
It looks like if TiledLayerBuffer::SetResolution is called with > 256, we'll have a divide by zero in floor_div.
Flags: needinfo?(ncameron)
Comment 3•12 years ago
|
||
You probably want cwiiis for this, I don't know how the resolution should affect the tile buffer.
Flags: needinfo?(ncameron) → needinfo?(chrislord.net)
Updated•12 years ago
|
Comment 4•12 years ago
|
||
Yes, but the tiled buffer resolution is strictly controlled, so this should never happen. We only use it for low precision rendering at the moment, so if there's a case where it isn't <= 1, we have some other problem.
Flags: needinfo?(chrislord.net)
| Reporter | ||
Comment 5•12 years ago
|
||
A comment says: "pinching to zoom".
Updated•12 years ago
|
tracking-fennec: ? → 24+
| Reporter | ||
Comment 6•12 years ago
|
||
There are now non buggy stack traces. See bp-604ef497-0640-4731-82d1-7b2f72130621.
Comment 7•12 years ago
|
||
From that report:
0 libc.so libc.so@0x18188
1 libc.so libc.so@0xeee7
2 libc.so libc.so@0xe91e
3 libxul.so mozilla::layers::floor_div gfx/layers/TiledLayerBuffer.h:209
4 libxul.so mozilla::layers::TiledContentHost::RenderLayerBuffer gfx/layers/TiledLayerBuffer.h:228
5 libxul.so mozilla::layers::TiledContentHost::Composite gfx/layers/composite/TiledContentHost.cpp:171
6 libxul.so mozilla::layers::ThebesLayerComposite::RenderLayer gfx/layers/composite/ThebesLayerComposite.cpp:136
7 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>
8 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>
9 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>
10 libxul.so mozilla::layers::LayerManagerComposite::Render gfx/layers/composite/LayerManagerComposite.cpp:303
11 libxul.so mozilla::layers::LayerManagerComposite::EndTransaction gfx/layers/composite/LayerManagerComposite.cpp:194
12 libxul.so mozilla::layers::LayerManagerComposite::EndEmptyTransaction gfx/layers/composite/LayerManagerComposite.cpp:159
13 libxul.so mozilla::layers::CompositorParent::Composite gfx/layers/ipc/CompositorParent.cpp:478
14 libxul.so RunnableMethod<IPC::ChannelProxy::Context, void tuple.h:383
15 libxul.so MessageLoop::RunTask message_loop.cc:337
16 libxul.so MessageLoop::DeferOrRunPendingTask message_loop.cc:345
17 libxul.so MessageLoop::DoWork message_loop.cc:445
18 libxul.so base::MessagePumpDefault::Run message_pump_default.cc:23
19 libxul.so MessageLoop::RunInternal message_loop.cc:219
20 libxul.so MessageLoop::Run message_loop.cc:212
21 libxul.so base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:160
22 libxul.so ThreadFunc platform_thread_posix.cc:39
23 libc.so libc.so@0xe40a
24 libc.so libc.so@0xdafa
Comment 8•12 years ago
|
||
It appears that GetScaledTileLength() is 0, which shouldn't happen since its implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256):
uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE / mResolution; }
Though I suppose it would be possible to get this stack if mResolution were 0.
Comment 9•12 years ago
|
||
(In reply to Brad Lassey [:blassey] from comment #8)
> It appears that GetScaledTileLength() is 0, which shouldn't happen since its
> implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256):
>
> uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE
> / mResolution; }
>
> Though I suppose it would be possible to get this stack if mResolution were
> 0.
Did you mean mResolution > 256, so that GetScaledTileLength() returns 0? See Comment 2 and answer in Comment 4, though I still like that, if the stack is correct.
Comment 10•12 years ago
|
||
I agree with Milan's analysis. Maybe we should assert in the SetResolution call to ensure that the resolution being passed is <= 256. Then if it blows up we'll have a stack pointing a little closer to the bad code.
Comment 11•12 years ago
|
||
Brad, Milan can you please help find an assignee here ? From earlier comments it appears we know what the issue is and a probable way to fix it.
Flags: needinfo?(milan)
Flags: needinfo?(blassey.bugs)
Comment 12•12 years ago
|
||
I'm not convinced we know what it is. In particular, if we were getting a zero as a second argument to floor_div, we probably should hit an exception in line 208 (a % b), rather than line 209 (a / b). We could argue in circles here. Kats' suggestion from comment 10 would probably be useful, if it can get us more information. We don't have a reproducible case for this, right?
Flags: needinfo?(milan)
Updated•12 years ago
|
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(blassey.bugs)
Updated•12 years ago
|
Keywords: steps-wanted
Updated•12 years ago
|
Comment 13•12 years ago
|
||
Not sure if the URLs are very helpful, but here they are:
2 http://www.bigmike.it/ircontrol/
1 http://www.google.com/news?vanilla=1
1 https://support.google.com/googleplay/answer/2851615?hl=de
1 http://www.pof.com/basicsearch.aspx?iama=m&minage=38&maxage=48&z_code=ta94ra&seekinga=fðnicity=0&sorting=0&miles=25&country=92&imagesetting=0&page=2&count=600
1 https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1372100582&rver=6.1.6
1 http://m.chip.de/news/Xbox-One-Microsoft-streicht-Online-Zwang-DRM_62600735.html
1 http://archive.org/search.php?query=%28collection%3Agutenberg%20OR%20mediatype%3Agutenberg%29%20AND%20-mediatype%3Acollection&sort=-avg_rating%3B-num_reviews&page=5
1 http://www.telegraph.co.uk/news/worldnews/northamerica/canada/10136184/Canadian-floods-claim-at-least-three-lives-and-force-evacuation-of-downtown-Calgary.html
1 http://www.realmofgaming.com/news/Remember-Me-Launch-Trailer/1822.html
Keywords: needURLs
Comment 14•12 years ago
|
||
every url may crash.
Comment 15•12 years ago
|
||
Attachment #767555 -
Flags: review?(chrislord.net)
| Reporter | ||
Updated•12 years ago
|
Whiteboard: [native-crash] → [native-crash][leave open]
Comment 16•12 years ago
|
||
Comment on attachment 767555 [details] [diff] [review]
Patch to diagnose the problem
Review of attachment 767555 [details] [diff] [review]:
-----------------------------------------------------------------
Fine, see comment.
::: gfx/layers/TiledLayerBuffer.h
@@ +139,5 @@
> if (mResolution == aResolution) {
> return;
> }
>
> + if (aResolution > TILEDLAYERBUFFER_TILE_SIZE) {
Afaik we only use tile resolution for low precision rendering (correct me if I'm wrong) - in which case, we can change this to aResolution > 1 - I'd also crash if aResolution < 0.25, seeing as that's the value we use. Basically any time mResolution isn't either 1 or 0.25, something funny is happening.
Attachment #767555 -
Flags: review?(chrislord.net) → review+
Comment 17•12 years ago
|
||
Updated the patch to catch other bugs too. Cwiiis r+'d on IRC.
Attachment #767555 -
Attachment is obsolete: true
Attachment #767750 -
Flags: review+
Comment 18•12 years ago
|
||
Adding MOZ_CRASH calls is worth at least a little try push: https://tbpl.mozilla.org/?tree=Try&rev=805036fbb0b0
Comment 20•12 years ago
|
||
Keywords: checkin-needed
Comment 21•12 years ago
|
||
| Reporter | ||
Comment 22•12 years ago
|
||
There have been no crashes since 25.0a1/20130627 (included). The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cc80aa0c7c13&tochange=3b955f306226
I suspect it's fixed by the patch of bug 885030.
Comment 23•12 years ago
|
||
That would be very surprising to me; that code just moved some functions from one file to another and shouldn't have any functional effects. If that did fix it, then either I screwed up the patch or it was a compiler bug.
| Reporter | ||
Comment 24•12 years ago
|
||
I talked too fast. There's one crash in 25.0a1/20130629: bp-0d9726de-4604-4f29-a896-88ae02130629.
| Reporter | ||
Updated•12 years ago
|
status-firefox25:
--- → affected
Comment 25•12 years ago
|
||
Well it looks like my MOZ_CRASH patch didn't change anything. These crashes are still happening. Could it be related to GetScaledTileLength() returning a uint32_t and floor_div taking an int? I don't see how the compiler could mis-convert that though.
Comment 26•12 years ago
|
||
QA would like to not use qawanted in conjunction with other specific requests (ie steps-wanted or needURLs) I'm not sure how widely that has been announced, but removing that keyword.
Keywords: qawanted
Comment 27•12 years ago
|
||
Comment on attachment 767750 [details] [diff] [review]
Patch to diagnose the problem
I backed out this patch since it didn't help any.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1222548020a6
Attachment #767750 -
Flags: checkin-
Comment 28•12 years ago
|
||
Comment 29•12 years ago
|
||
Any device OS or correlation that can help QA here ?
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
Comment 30•12 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #29)
> Any device OS or correlation that can help QA here ?
err, spoke too fast, meant *device or OS*
| Assignee | ||
Comment 31•12 years ago
|
||
Talked to Kats and I will be looking at this bug more.
Assignee: bugmail.mozilla → nchen
| Assignee | ||
Comment 32•12 years ago
|
||
This patch will save the value of mResolution at different times to the stack. Looking at these values from the crash dumps will hopefully shed some light on the crash.
Attachment #777336 -
Flags: review?(bugmail.mozilla)
Comment 33•12 years ago
|
||
Comment on attachment 777336 [details] [diff] [review]
Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1)
Review of attachment 777336 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM but any substantive patches should probably be reviewed by Cwiiis.
Attachment #777336 -
Flags: review?(bugmail.mozilla) → review+
| Assignee | ||
Comment 34•12 years ago
|
||
Comment 35•12 years ago
|
||
Comment on attachment 777336 [details] [diff] [review]
Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1)
Review of attachment 777336 [details] [diff] [review]:
-----------------------------------------------------------------
Drive-by r+
::: gfx/layers/composite/TiledContentHost.cpp
@@ +227,5 @@
> if (!mCompositor) {
> NS_WARNING("Can't render tiled content host - no compositor");
> return;
> }
> + volatile float resolution = aLayerBuffer.GetResolution(); // bug 881018 investigation
Might've been nice to be a bit more descriptive with the comment here, given the variable is actually used regardless of the investigation (so something like 'marked as volatile to help investigate bug 881018), but yeah, no biggie, I guess that's what version control is for :)
Attachment #777336 -
Flags: review+
| Assignee | ||
Comment 36•12 years ago
|
||
Looking at recent crash reports, the value of mResolution always stays at 1.0, which makes this crash even crazier. I want to back out the last patch and try this patch. It will crash whenever GetScaledTileLength() returns 0 which leads to the subsequent divide-by-0 exception.
Attachment #779812 -
Flags: review?(chrislord.net)
Comment 37•12 years ago
|
||
Device data for the last week of Aurora:
mozilla::layers::floor_div 48
Asus Transformer TF101 12
TOSHIBA AT100 11
Sony Tablet S 6
LENOVO ThinkPad Tablet 6
Motorola Xoom 5
LENOVO ThinkPadTablet 3
Asus Transformer 3
Motorola MZ604 2
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
Comment 38•12 years ago
|
||
Comment on attachment 779812 [details] [diff] [review]
Crash when TiledLayerBuffer::GetScaledTileLength returns 0 (v1)
Review of attachment 779812 [details] [diff] [review]:
-----------------------------------------------------------------
Sure.
Attachment #779812 -
Flags: review?(chrislord.net) → review+
Comment 39•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #37)
> Device data for the last week of Aurora:
>
> mozilla::layers::floor_div 48
> Asus Transformer TF101 12
> TOSHIBA AT100 11
> Sony Tablet S 6
> LENOVO ThinkPad Tablet 6
> Motorola Xoom 5
> LENOVO ThinkPadTablet 3
> Asus Transformer 3
> Motorola MZ604 2
Semi-interesting to note that these are all devices that could be running Honeycomb... Coincidence?
Comment 40•12 years ago
|
||
(In reply to Chris Lord [:cwiiis] from comment #39)
> Semi-interesting to note that these are all devices that could be running
> Honeycomb... Coincidence?
They could, but aren't. I just looked at a few of the reports, and I see API versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.
What I find interesting though is that all of those are tablets.
Comment 41•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> (In reply to Chris Lord [:cwiiis] from comment #39)
> > Semi-interesting to note that these are all devices that could be running
> > Honeycomb... Coincidence?
>
> They could, but aren't. I just looked at a few of the reports, and I see API
> versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.
>
> What I find interesting though is that all of those are tablets.
Are they all Tegra 2 tablets? Looks like it, but again, could just be coincidence...
| Assignee | ||
Comment 42•12 years ago
|
||
(In reply to Chris Lord [:cwiiis] from comment #41)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> > (In reply to Chris Lord [:cwiiis] from comment #39)
> > > Semi-interesting to note that these are all devices that could be running
> > > Honeycomb... Coincidence?
> >
> > They could, but aren't. I just looked at a few of the reports, and I see API
> > versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.
> >
> > What I find interesting though is that all of those are tablets.
>
> Are they all Tegra 2 tablets? Looks like it, but again, could just be
> coincidence...
Seems like most are Tegras, but not all. Also, most are running on 2.6.* kernels, but again not all. The "not all" parts are what confuse me :)
| Assignee | ||
Comment 43•12 years ago
|
||
| Reporter | ||
Comment 46•12 years ago
|
||
There have been no crashes with this signature since the landing of the investigation patch! There are no new signatures appeared around July 25 either.
In case it's caused by another patch landed along the investigation one, here is the working range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2983ca6d4d1a&tochange=a4c1961bf723
| Reporter | ||
Comment 47•12 years ago
|
||
It seems a compiler bug. Can someone fix it in a proper way instead of the investigation patch?
| Assignee | ||
Comment 48•12 years ago
|
||
Assuming inlining causes GetScaledTileLength to return a wrong result, this patch should avoid that.
Attachment #783800 -
Flags: review?(chrislord.net)
Comment 49•12 years ago
|
||
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)
Review of attachment 783800 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM, let's hope this is it.
Attachment #783800 -
Flags: review?(chrislord.net) → review+
| Reporter | ||
Comment 50•12 years ago
|
||
You should backout the investigation patch to know whether it will fix it.
| Assignee | ||
Comment 51•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/31655889fdc6
https://hg.mozilla.org/integration/mozilla-inbound/rev/091fb11c0d27
Status: NEW → ASSIGNED
Whiteboard: [native-crash][leave open] → [native-crash]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
| Reporter | ||
Comment 53•12 years ago
|
||
Backout of the investigation patch:
https://hg.mozilla.org/mozilla-central/rev/31655889fdc6
| Reporter | ||
Comment 54•12 years ago
|
||
Uplift to Aurora 24 before the merge?
| Assignee | ||
Comment 55•12 years ago
|
||
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Random crash (#1 top crash in Aurora 24)
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): None; patch has no impact on functionality
String or IDL/UUID changes made by this patch: None
Attachment #783800 -
Flags: approval-mozilla-aurora?
Comment 56•12 years ago
|
||
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)
\
Attachment #783800 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 57•12 years ago
|
||
| Reporter | ||
Comment 58•12 years ago
|
||
Firefox 23 is affected (#7 top crasher in 23.0b10). See https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=libpvrANDROID_WSEGL_SGX540_120.so%400xb0c
Crash Signature: [@ mozilla::layers::floor_div] → [@ mozilla::layers::floor_div]
[@ libpvrANDROID_WSEGL_SGX540_120.so@0xb0c ]
Updated•12 years ago
|
Keywords: steps-wanted
You need to log in
before you can comment on or make changes to this bug.
Description
•