crash in mozilla::layers::floor_div

RESOLVED FIXED in Firefox 24

Status

()

Core
Graphics: Layers
--
critical
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: Scoobidiver (away), Assigned: jchen)

Tracking

({crash, regression, topcrash})

24 Branch
mozilla25
ARM
Android
crash, regression, topcrash
Points:
---

Firefox Tracking Flags

(firefox23 affected, firefox24+ fixed, firefox25 fixed, fennec24+)

Details

(Whiteboard: [native-crash], crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
It started spiking in 24.0a1/20130607. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=204de5b7e0a6&tochange=dc8e78ed8c44
It might be a regression from bug 862952.

Signature 	mozilla::layers::floor_div More Reports Search
UUID	726b142d-ac26-4993-9cbc-259ca2130608
Date Processed	2013-06-08 18:33:32
Uptime	99
Last Crash	1.4 weeks before submission
Install Age	1.6 minutes since version was first installed.
Install Time	2013-06-08 18:30:40
Product	FennecAndroid
Version	24.0a1
Build ID	20130608031212
Release Channel	nightly
OS	Android
OS Version	0.0.0 Linux 2.6.39.4-g4e32b94 #1 SMP PREEMPT Mon Sep 10 14:01:08 PDT 2012 armv7l motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys
Build Architecture	arm
Build Architecture Info	ARMv0
Crash Reason	SIGFPE
Crash Address	0x39b1
User Comments	looking at non member section of Netflix Canada, i swear
App Notes 	
AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra -- OpenGL ES 2.0 -- Model: Xoom, Product: tervigon, Manufacturer: Motorola, Hardware: stingray'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+ Stagefright? Stagefright+ 
Motorola Xoom
motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys
Processor Notes 	sp-processor01_phx1_mozilla_com_17437:2012; exploitability tool: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	NVIDIA Corporation
Adapter Device ID	NVIDIA Tegra
Device	Motorola Xoom
Android API Version	16 (REL)
Android CPU ABI	armeabi-v7a

Frame 	Module 	Signature 	Source
0 	libc.so 	libc.so@0xdd00 	
1 	libc.so 	libc.so@0x13937 	
2 	libc.so 	libc.so@0x132e6 	
3 	libxul.so 	mozilla::layers::floor_div 	gfx/layers/TiledLayerBuffer.h:209
4 	libxul.so 	mozilla::layers::TiledContentHost::RenderLayerBuffer 	gfx/layers/TiledLayerBuffer.h:228

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3Afloor_div
(Reporter)

Comment 1

5 years ago
It's #5 top crasher in 24.0a1.
tracking-fennec: --- → ?
tracking-firefox24: --- → ?
Keywords: topcrash
It looks like if TiledLayerBuffer::SetResolution is called with > 256, we'll have a divide by zero in floor_div.
Flags: needinfo?(ncameron)

Comment 3

5 years ago
You probably want cwiiis for this, I don't know how the resolution should affect the tile buffer.
Flags: needinfo?(ncameron) → needinfo?(chrislord.net)

Updated

5 years ago
tracking-firefox24: ? → +

Comment 4

5 years ago
Yes, but the tiled buffer resolution is strictly controlled, so this should never happen. We only use it for low precision rendering at the moment, so if there's a case where it isn't <= 1, we have some other problem.
Flags: needinfo?(chrislord.net)
(Reporter)

Comment 5

5 years ago
A comment says: "pinching to zoom".
tracking-fennec: ? → 24+
(Reporter)

Comment 6

5 years ago
There are now non buggy stack traces. See bp-604ef497-0640-4731-82d1-7b2f72130621.
From that report:
0 	libc.so 	libc.so@0x18188 	
1 	libc.so 	libc.so@0xeee7 	
2 	libc.so 	libc.so@0xe91e 	
3 	libxul.so 	mozilla::layers::floor_div 	gfx/layers/TiledLayerBuffer.h:209
4 	libxul.so 	mozilla::layers::TiledContentHost::RenderLayerBuffer 	gfx/layers/TiledLayerBuffer.h:228
5 	libxul.so 	mozilla::layers::TiledContentHost::Composite 	gfx/layers/composite/TiledContentHost.cpp:171
6 	libxul.so 	mozilla::layers::ThebesLayerComposite::RenderLayer 	gfx/layers/composite/ThebesLayerComposite.cpp:136
7 	libxul.so 	void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 	
8 	libxul.so 	void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 	
9 	libxul.so 	void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 	
10 	libxul.so 	mozilla::layers::LayerManagerComposite::Render 	gfx/layers/composite/LayerManagerComposite.cpp:303
11 	libxul.so 	mozilla::layers::LayerManagerComposite::EndTransaction 	gfx/layers/composite/LayerManagerComposite.cpp:194
12 	libxul.so 	mozilla::layers::LayerManagerComposite::EndEmptyTransaction 	gfx/layers/composite/LayerManagerComposite.cpp:159
13 	libxul.so 	mozilla::layers::CompositorParent::Composite 	gfx/layers/ipc/CompositorParent.cpp:478
14 	libxul.so 	RunnableMethod<IPC::ChannelProxy::Context, void 	tuple.h:383
15 	libxul.so 	MessageLoop::RunTask 	message_loop.cc:337
16 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	message_loop.cc:345
17 	libxul.so 	MessageLoop::DoWork 	message_loop.cc:445
18 	libxul.so 	base::MessagePumpDefault::Run 	message_pump_default.cc:23
19 	libxul.so 	MessageLoop::RunInternal 	message_loop.cc:219
20 	libxul.so 	MessageLoop::Run 	message_loop.cc:212
21 	libxul.so 	base::Thread::ThreadMain 	ipc/chromium/src/base/thread.cc:160
22 	libxul.so 	ThreadFunc 	platform_thread_posix.cc:39
23 	libc.so 	libc.so@0xe40a 	
24 	libc.so 	libc.so@0xdafa
It appears that GetScaledTileLength() is 0, which shouldn't happen since its implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256):

   uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE / mResolution; }

Though I suppose it would be possible to get this stack if mResolution were 0.
(In reply to Brad Lassey [:blassey] from comment #8)
> It appears that GetScaledTileLength() is 0, which shouldn't happen since its
> implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256):
> 
>    uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE
> / mResolution; }
> 
> Though I suppose it would be possible to get this stack if mResolution were
> 0.

Did you mean mResolution > 256, so that GetScaledTileLength() returns 0?  See Comment 2 and answer in Comment 4, though I still like that, if the stack is correct.
I agree with Milan's analysis. Maybe we should assert in the SetResolution call to ensure that the resolution being passed is <= 256. Then if it blows up we'll have a stack pointing a little closer to the bad code.
Brad, Milan can you please help find an assignee here ? From earlier comments it appears we know what the issue is and a probable way to fix it.
Flags: needinfo?(milan)
Flags: needinfo?(blassey.bugs)
I'm not convinced we know what it is.  In particular, if we were getting a zero as a second argument to floor_div, we probably should hit an exception in line 208 (a % b), rather than line 209 (a / b).  We could argue in circles here.  Kats' suggestion from comment 10 would probably be useful, if it can get us more information.  We don't have a reproducible case for this, right?
Flags: needinfo?(milan)
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(blassey.bugs)

Updated

5 years ago
Keywords: steps-wanted

Updated

5 years ago
Keywords: needURLs, qawanted

Comment 14

5 years ago
every url may crash.
Created attachment 767555 [details] [diff] [review]
Patch to diagnose the problem
Attachment #767555 - Flags: review?(chrislord.net)
(Reporter)

Updated

5 years ago
Whiteboard: [native-crash] → [native-crash][leave open]
Comment on attachment 767555 [details] [diff] [review]
Patch to diagnose the problem

Review of attachment 767555 [details] [diff] [review]:
-----------------------------------------------------------------

Fine, see comment.

::: gfx/layers/TiledLayerBuffer.h
@@ +139,5 @@
>      if (mResolution == aResolution) {
>        return;
>      }
>  
> +    if (aResolution > TILEDLAYERBUFFER_TILE_SIZE) {

Afaik we only use tile resolution for low precision rendering (correct me if I'm wrong) - in which case, we can change this to aResolution > 1 - I'd also crash if aResolution < 0.25, seeing as that's the value we use. Basically any time mResolution isn't either 1 or 0.25, something funny is happening.
Attachment #767555 - Flags: review?(chrislord.net) → review+
Created attachment 767750 [details] [diff] [review]
Patch to diagnose the problem

Updated the patch to catch other bugs too. Cwiiis r+'d on IRC.
Attachment #767555 - Attachment is obsolete: true
Attachment #767750 - Flags: review+
Adding MOZ_CRASH calls is worth at least a little try push: https://tbpl.mozilla.org/?tree=Try&rev=805036fbb0b0
Try looks fine, inbound is closed.
Keywords: checkin-needed
(Reporter)

Comment 22

5 years ago
There have been no crashes since 25.0a1/20130627 (included). The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cc80aa0c7c13&tochange=3b955f306226
I suspect it's fixed by the patch of bug 885030.
That would be very surprising to me; that code just moved some functions from one file to another and shouldn't have any functional effects. If that did fix it, then either I screwed up the patch or it was a compiler bug.
(Reporter)

Comment 24

5 years ago
I talked too fast. There's one crash in 25.0a1/20130629: bp-0d9726de-4604-4f29-a896-88ae02130629.
(Reporter)

Updated

5 years ago
status-firefox25: --- → affected
Well it looks like my MOZ_CRASH patch didn't change anything. These crashes are still happening. Could it be related to GetScaledTileLength() returning a uint32_t and floor_div taking an int? I don't see how the compiler could mis-convert that though.
QA would like to not use qawanted in conjunction with other specific requests (ie steps-wanted or needURLs) I'm not sure how widely that has been announced, but removing that keyword.
Keywords: qawanted
Comment on attachment 767750 [details] [diff] [review]
Patch to diagnose the problem

I backed out this patch since it didn't help any.

https://hg.mozilla.org/integration/mozilla-inbound/rev/1222548020a6
Attachment #767750 - Flags: checkin-
Any device OS  or correlation that can help QA here ?
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
(In reply to bhavana bajaj [:bajaj] from comment #29)
> Any device OS  or correlation that can help QA here ?

err, spoke too fast, meant *device or OS*
(Assignee)

Comment 31

5 years ago
Talked to Kats and I will be looking at this bug more.
Assignee: bugmail.mozilla → nchen
(Assignee)

Comment 32

5 years ago
Created attachment 777336 [details] [diff] [review]
Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1)

This patch will save the value of mResolution at different times to the stack. Looking at these values from the crash dumps will hopefully shed some light on the crash.
Attachment #777336 - Flags: review?(bugmail.mozilla)
Comment on attachment 777336 [details] [diff] [review]
Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1)

Review of attachment 777336 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM but any substantive patches should probably be reviewed by Cwiiis.
Attachment #777336 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 777336 [details] [diff] [review]
Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1)

Review of attachment 777336 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by r+

::: gfx/layers/composite/TiledContentHost.cpp
@@ +227,5 @@
>    if (!mCompositor) {
>      NS_WARNING("Can't render tiled content host - no compositor");
>      return;
>    }
> +  volatile float resolution = aLayerBuffer.GetResolution(); // bug 881018 investigation

Might've been nice to be a bit more descriptive with the comment here, given the variable is actually used regardless of the investigation (so something like 'marked as volatile to help investigate bug 881018), but yeah, no biggie, I guess that's what version control is for :)
Attachment #777336 - Flags: review+
(Assignee)

Comment 36

5 years ago
Created attachment 779812 [details] [diff] [review]
Crash when TiledLayerBuffer::GetScaledTileLength returns 0 (v1)

Looking at recent crash reports, the value of mResolution always stays at 1.0, which makes this crash even crazier. I want to back out the last patch and try this patch. It will crash whenever GetScaledTileLength() returns 0 which leads to the subsequent divide-by-0 exception.
Attachment #779812 - Flags: review?(chrislord.net)

Comment 37

5 years ago
Device data for the last week of Aurora:

mozilla::layers::floor_div 	48
Asus Transformer TF101 	12
TOSHIBA AT100 	11
Sony Tablet S 	6
LENOVO ThinkPad Tablet 	6
Motorola Xoom 	5
LENOVO ThinkPadTablet 	3
Asus Transformer 	3
Motorola MZ604 	2
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
Comment on attachment 779812 [details] [diff] [review]
Crash when TiledLayerBuffer::GetScaledTileLength returns 0 (v1)

Review of attachment 779812 [details] [diff] [review]:
-----------------------------------------------------------------

Sure.
Attachment #779812 - Flags: review?(chrislord.net) → review+
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #37)
> Device data for the last week of Aurora:
> 
> mozilla::layers::floor_div 	48
> Asus Transformer TF101 	12
> TOSHIBA AT100 	11
> Sony Tablet S 	6
> LENOVO ThinkPad Tablet 	6
> Motorola Xoom 	5
> LENOVO ThinkPadTablet 	3
> Asus Transformer 	3
> Motorola MZ604 	2

Semi-interesting to note that these are all devices that could be running Honeycomb... Coincidence?

Comment 40

5 years ago
(In reply to Chris Lord [:cwiiis] from comment #39)
> Semi-interesting to note that these are all devices that could be running
> Honeycomb... Coincidence?

They could, but aren't. I just looked at a few of the reports, and I see API versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.

What I find interesting though is that all of those are tablets.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> (In reply to Chris Lord [:cwiiis] from comment #39)
> > Semi-interesting to note that these are all devices that could be running
> > Honeycomb... Coincidence?
> 
> They could, but aren't. I just looked at a few of the reports, and I see API
> versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.
> 
> What I find interesting though is that all of those are tablets.

Are they all Tegra 2 tablets? Looks like it, but again, could just be coincidence...
(Assignee)

Comment 42

5 years ago
(In reply to Chris Lord [:cwiiis] from comment #41)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> > (In reply to Chris Lord [:cwiiis] from comment #39)
> > > Semi-interesting to note that these are all devices that could be running
> > > Honeycomb... Coincidence?
> > 
> > They could, but aren't. I just looked at a few of the reports, and I see API
> > versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though.
> > 
> > What I find interesting though is that all of those are tablets.
> 
> Are they all Tegra 2 tablets? Looks like it, but again, could just be
> coincidence...

Seems like most are Tegras, but not all. Also, most are running on 2.6.* kernels, but again not all. The "not all" parts are what confuse me :)
(Reporter)

Comment 46

5 years ago
There have been no crashes with this signature since the landing of the investigation patch! There are no new signatures appeared around July 25 either.
In case it's caused by another patch landed along the investigation one, here is the working range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2983ca6d4d1a&tochange=a4c1961bf723
(Reporter)

Comment 47

5 years ago
It seems a compiler bug. Can someone fix it in a proper way instead of the investigation patch?
(Assignee)

Comment 48

5 years ago
Created attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)

Assuming inlining causes GetScaledTileLength to return a wrong result, this patch should avoid that.
Attachment #783800 - Flags: review?(chrislord.net)
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)

Review of attachment 783800 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, let's hope this is it.
Attachment #783800 - Flags: review?(chrislord.net) → review+
(Reporter)

Comment 50

5 years ago
You should backout the investigation patch to know whether it will fix it.
https://hg.mozilla.org/mozilla-central/rev/091fb11c0d27
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
(Reporter)

Comment 53

5 years ago
Backout of the investigation patch:
https://hg.mozilla.org/mozilla-central/rev/31655889fdc6
status-firefox25: affected → fixed
(Reporter)

Comment 54

5 years ago
Uplift to Aurora 24 before the merge?
(Assignee)

Comment 55

5 years ago
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)

[Approval Request Comment]

Bug caused by (feature/regressing bug #): N/A

User impact if declined: Random crash (#1 top crash in Aurora 24)

Testing completed (on m-c, etc.): m-c

Risk to taking this patch (and alternatives if risky): None; patch has no impact on functionality

String or IDL/UUID changes made by this patch: None
Attachment #783800 - Flags: approval-mozilla-aurora?
Comment on attachment 783800 [details] [diff] [review]
Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1)

\
Attachment #783800 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Reporter)

Comment 58

5 years ago
Firefox 23 is affected (#7 top crasher in 23.0b10). See https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=libpvrANDROID_WSEGL_SGX540_120.so%400xb0c
Crash Signature: [@ mozilla::layers::floor_div] → [@ mozilla::layers::floor_div] [@ libpvrANDROID_WSEGL_SGX540_120.so@0xb0c ]
status-firefox23: unaffected → affected

Updated

4 years ago
Keywords: steps-wanted
You need to log in before you can comment on or make changes to this bug.