Closed Bug 881018 Opened 12 years ago Closed 12 years ago

crash in mozilla::layers::floor_div

Categories

(Core :: Graphics: Layers, defect)

24 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla25
Tracking Status
firefox23 --- affected
firefox24 + fixed
firefox25 --- fixed
fennec 24+ ---

People

(Reporter: scoobidiver, Assigned: jchen)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [native-crash])

Crash Data

Attachments

(4 files, 1 obsolete file)

It started spiking in 24.0a1/20130607. The regression range for the spike is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=204de5b7e0a6&tochange=dc8e78ed8c44 It might be a regression from bug 862952. Signature mozilla::layers::floor_div More Reports Search UUID 726b142d-ac26-4993-9cbc-259ca2130608 Date Processed 2013-06-08 18:33:32 Uptime 99 Last Crash 1.4 weeks before submission Install Age 1.6 minutes since version was first installed. Install Time 2013-06-08 18:30:40 Product FennecAndroid Version 24.0a1 Build ID 20130608031212 Release Channel nightly OS Android OS Version 0.0.0 Linux 2.6.39.4-g4e32b94 #1 SMP PREEMPT Mon Sep 10 14:01:08 PDT 2012 armv7l motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys Build Architecture arm Build Architecture Info ARMv0 Crash Reason SIGFPE Crash Address 0x39b1 User Comments looking at non member section of Netflix Canada, i swear App Notes AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra -- OpenGL ES 2.0 -- Model: Xoom, Product: tervigon, Manufacturer: Motorola, Hardware: stingray' GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+ Stagefright? Stagefright+ Motorola Xoom motorola/tervigon/wingray:4.1.2/JZO54K/485486:user/release-keys Processor Notes sp-processor01_phx1_mozilla_com_17437:2012; exploitability tool: ERROR: unable to analyze dump EMCheckCompatibility True Adapter Vendor ID NVIDIA Corporation Adapter Device ID NVIDIA Tegra Device Motorola Xoom Android API Version 16 (REL) Android CPU ABI armeabi-v7a Frame Module Signature Source 0 libc.so libc.so@0xdd00 1 libc.so libc.so@0x13937 2 libc.so libc.so@0x132e6 3 libxul.so mozilla::layers::floor_div gfx/layers/TiledLayerBuffer.h:209 4 libxul.so mozilla::layers::TiledContentHost::RenderLayerBuffer gfx/layers/TiledLayerBuffer.h:228 More reports at: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3Afloor_div
It's #5 top crasher in 24.0a1.
tracking-fennec: --- → ?
Keywords: topcrash
It looks like if TiledLayerBuffer::SetResolution is called with > 256, we'll have a divide by zero in floor_div.
Flags: needinfo?(ncameron)
You probably want cwiiis for this, I don't know how the resolution should affect the tile buffer.
Flags: needinfo?(ncameron) → needinfo?(chrislord.net)
Yes, but the tiled buffer resolution is strictly controlled, so this should never happen. We only use it for low precision rendering at the moment, so if there's a case where it isn't <= 1, we have some other problem.
Flags: needinfo?(chrislord.net)
A comment says: "pinching to zoom".
tracking-fennec: ? → 24+
There are now non buggy stack traces. See bp-604ef497-0640-4731-82d1-7b2f72130621.
From that report: 0 libc.so libc.so@0x18188 1 libc.so libc.so@0xeee7 2 libc.so libc.so@0xe91e 3 libxul.so mozilla::layers::floor_div gfx/layers/TiledLayerBuffer.h:209 4 libxul.so mozilla::layers::TiledContentHost::RenderLayerBuffer gfx/layers/TiledLayerBuffer.h:228 5 libxul.so mozilla::layers::TiledContentHost::Composite gfx/layers/composite/TiledContentHost.cpp:171 6 libxul.so mozilla::layers::ThebesLayerComposite::RenderLayer gfx/layers/composite/ThebesLayerComposite.cpp:136 7 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 8 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 9 libxul.so void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 10 libxul.so mozilla::layers::LayerManagerComposite::Render gfx/layers/composite/LayerManagerComposite.cpp:303 11 libxul.so mozilla::layers::LayerManagerComposite::EndTransaction gfx/layers/composite/LayerManagerComposite.cpp:194 12 libxul.so mozilla::layers::LayerManagerComposite::EndEmptyTransaction gfx/layers/composite/LayerManagerComposite.cpp:159 13 libxul.so mozilla::layers::CompositorParent::Composite gfx/layers/ipc/CompositorParent.cpp:478 14 libxul.so RunnableMethod<IPC::ChannelProxy::Context, void tuple.h:383 15 libxul.so MessageLoop::RunTask message_loop.cc:337 16 libxul.so MessageLoop::DeferOrRunPendingTask message_loop.cc:345 17 libxul.so MessageLoop::DoWork message_loop.cc:445 18 libxul.so base::MessagePumpDefault::Run message_pump_default.cc:23 19 libxul.so MessageLoop::RunInternal message_loop.cc:219 20 libxul.so MessageLoop::Run message_loop.cc:212 21 libxul.so base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:160 22 libxul.so ThreadFunc platform_thread_posix.cc:39 23 libc.so libc.so@0xe40a 24 libc.so libc.so@0xdafa
It appears that GetScaledTileLength() is 0, which shouldn't happen since its implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256): uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE / mResolution; } Though I suppose it would be possible to get this stack if mResolution were 0.
(In reply to Brad Lassey [:blassey] from comment #8) > It appears that GetScaledTileLength() is 0, which shouldn't happen since its > implementation is (TILEDLAYERBUFFER_TILE_SIZE is #define'd to 256): > > uint32_t GetScaledTileLength() const { return TILEDLAYERBUFFER_TILE_SIZE > / mResolution; } > > Though I suppose it would be possible to get this stack if mResolution were > 0. Did you mean mResolution > 256, so that GetScaledTileLength() returns 0? See Comment 2 and answer in Comment 4, though I still like that, if the stack is correct.
I agree with Milan's analysis. Maybe we should assert in the SetResolution call to ensure that the resolution being passed is <= 256. Then if it blows up we'll have a stack pointing a little closer to the bad code.
Brad, Milan can you please help find an assignee here ? From earlier comments it appears we know what the issue is and a probable way to fix it.
Flags: needinfo?(milan)
Flags: needinfo?(blassey.bugs)
I'm not convinced we know what it is. In particular, if we were getting a zero as a second argument to floor_div, we probably should hit an exception in line 208 (a % b), rather than line 209 (a / b). We could argue in circles here. Kats' suggestion from comment 10 would probably be useful, if it can get us more information. We don't have a reproducible case for this, right?
Flags: needinfo?(milan)
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(blassey.bugs)
Keywords: steps-wanted
Keywords: needURLs, qawanted
every url may crash.
Whiteboard: [native-crash] → [native-crash][leave open]
Comment on attachment 767555 [details] [diff] [review] Patch to diagnose the problem Review of attachment 767555 [details] [diff] [review]: ----------------------------------------------------------------- Fine, see comment. ::: gfx/layers/TiledLayerBuffer.h @@ +139,5 @@ > if (mResolution == aResolution) { > return; > } > > + if (aResolution > TILEDLAYERBUFFER_TILE_SIZE) { Afaik we only use tile resolution for low precision rendering (correct me if I'm wrong) - in which case, we can change this to aResolution > 1 - I'd also crash if aResolution < 0.25, seeing as that's the value we use. Basically any time mResolution isn't either 1 or 0.25, something funny is happening.
Attachment #767555 - Flags: review?(chrislord.net) → review+
Updated the patch to catch other bugs too. Cwiiis r+'d on IRC.
Attachment #767555 - Attachment is obsolete: true
Attachment #767750 - Flags: review+
Adding MOZ_CRASH calls is worth at least a little try push: https://tbpl.mozilla.org/?tree=Try&rev=805036fbb0b0
Try looks fine, inbound is closed.
Keywords: checkin-needed
There have been no crashes since 25.0a1/20130627 (included). The working range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cc80aa0c7c13&tochange=3b955f306226 I suspect it's fixed by the patch of bug 885030.
That would be very surprising to me; that code just moved some functions from one file to another and shouldn't have any functional effects. If that did fix it, then either I screwed up the patch or it was a compiler bug.
I talked too fast. There's one crash in 25.0a1/20130629: bp-0d9726de-4604-4f29-a896-88ae02130629.
Well it looks like my MOZ_CRASH patch didn't change anything. These crashes are still happening. Could it be related to GetScaledTileLength() returning a uint32_t and floor_div taking an int? I don't see how the compiler could mis-convert that though.
QA would like to not use qawanted in conjunction with other specific requests (ie steps-wanted or needURLs) I'm not sure how widely that has been announced, but removing that keyword.
Keywords: qawanted
Comment on attachment 767750 [details] [diff] [review] Patch to diagnose the problem I backed out this patch since it didn't help any. https://hg.mozilla.org/integration/mozilla-inbound/rev/1222548020a6
Attachment #767750 - Flags: checkin-
Any device OS or correlation that can help QA here ?
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
(In reply to bhavana bajaj [:bajaj] from comment #29) > Any device OS or correlation that can help QA here ? err, spoke too fast, meant *device or OS*
Talked to Kats and I will be looking at this bug more.
Assignee: bugmail.mozilla → nchen
This patch will save the value of mResolution at different times to the stack. Looking at these values from the crash dumps will hopefully shed some light on the crash.
Attachment #777336 - Flags: review?(bugmail.mozilla)
Comment on attachment 777336 [details] [diff] [review] Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1) Review of attachment 777336 [details] [diff] [review]: ----------------------------------------------------------------- LGTM but any substantive patches should probably be reviewed by Cwiiis.
Attachment #777336 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 777336 [details] [diff] [review] Save TiledLayerBuffer::mResolution to stack to help investigate crash (v1) Review of attachment 777336 [details] [diff] [review]: ----------------------------------------------------------------- Drive-by r+ ::: gfx/layers/composite/TiledContentHost.cpp @@ +227,5 @@ > if (!mCompositor) { > NS_WARNING("Can't render tiled content host - no compositor"); > return; > } > + volatile float resolution = aLayerBuffer.GetResolution(); // bug 881018 investigation Might've been nice to be a bit more descriptive with the comment here, given the variable is actually used regardless of the investigation (so something like 'marked as volatile to help investigate bug 881018), but yeah, no biggie, I guess that's what version control is for :)
Attachment #777336 - Flags: review+
Looking at recent crash reports, the value of mResolution always stays at 1.0, which makes this crash even crazier. I want to back out the last patch and try this patch. It will crash whenever GetScaledTileLength() returns 0 which leads to the subsequent divide-by-0 exception.
Attachment #779812 - Flags: review?(chrislord.net)
Device data for the last week of Aurora: mozilla::layers::floor_div 48 Asus Transformer TF101 12 TOSHIBA AT100 11 Sony Tablet S 6 LENOVO ThinkPad Tablet 6 Motorola Xoom 5 LENOVO ThinkPadTablet 3 Asus Transformer 3 Motorola MZ604 2
Flags: needinfo?(twalker)
Flags: needinfo?(kairo)
Comment on attachment 779812 [details] [diff] [review] Crash when TiledLayerBuffer::GetScaledTileLength returns 0 (v1) Review of attachment 779812 [details] [diff] [review]: ----------------------------------------------------------------- Sure.
Attachment #779812 - Flags: review?(chrislord.net) → review+
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #37) > Device data for the last week of Aurora: > > mozilla::layers::floor_div 48 > Asus Transformer TF101 12 > TOSHIBA AT100 11 > Sony Tablet S 6 > LENOVO ThinkPad Tablet 6 > Motorola Xoom 5 > LENOVO ThinkPadTablet 3 > Asus Transformer 3 > Motorola MZ604 2 Semi-interesting to note that these are all devices that could be running Honeycomb... Coincidence?
(In reply to Chris Lord [:cwiiis] from comment #39) > Semi-interesting to note that these are all devices that could be running > Honeycomb... Coincidence? They could, but aren't. I just looked at a few of the reports, and I see API versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though. What I find interesting though is that all of those are tablets.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40) > (In reply to Chris Lord [:cwiiis] from comment #39) > > Semi-interesting to note that these are all devices that could be running > > Honeycomb... Coincidence? > > They could, but aren't. I just looked at a few of the reports, and I see API > versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though. > > What I find interesting though is that all of those are tablets. Are they all Tegra 2 tablets? Looks like it, but again, could just be coincidence...
(In reply to Chris Lord [:cwiiis] from comment #41) > (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40) > > (In reply to Chris Lord [:cwiiis] from comment #39) > > > Semi-interesting to note that these are all devices that could be running > > > Honeycomb... Coincidence? > > > > They could, but aren't. I just looked at a few of the reports, and I see API > > versions 10 (GB), 15 (ICS), and 17 (JB), most have 15, though. > > > > What I find interesting though is that all of those are tablets. > > Are they all Tegra 2 tablets? Looks like it, but again, could just be > coincidence... Seems like most are Tegras, but not all. Also, most are running on 2.6.* kernels, but again not all. The "not all" parts are what confuse me :)
There have been no crashes with this signature since the landing of the investigation patch! There are no new signatures appeared around July 25 either. In case it's caused by another patch landed along the investigation one, here is the working range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2983ca6d4d1a&tochange=a4c1961bf723
It seems a compiler bug. Can someone fix it in a proper way instead of the investigation patch?
Assuming inlining causes GetScaledTileLength to return a wrong result, this patch should avoid that.
Attachment #783800 - Flags: review?(chrislord.net)
Comment on attachment 783800 [details] [diff] [review] Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1) Review of attachment 783800 [details] [diff] [review]: ----------------------------------------------------------------- LGTM, let's hope this is it.
Attachment #783800 - Flags: review?(chrislord.net) → review+
You should backout the investigation patch to know whether it will fix it.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Uplift to Aurora 24 before the merge?
Comment on attachment 783800 [details] [diff] [review] Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1) [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Random crash (#1 top crash in Aurora 24) Testing completed (on m-c, etc.): m-c Risk to taking this patch (and alternatives if risky): None; patch has no impact on functionality String or IDL/UUID changes made by this patch: None
Attachment #783800 - Flags: approval-mozilla-aurora?
Comment on attachment 783800 [details] [diff] [review] Never inline TiledLayerBuffer::GetScaledTileLength to avoid crash (v1) \
Attachment #783800 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Crash Signature: [@ mozilla::layers::floor_div] → [@ mozilla::layers::floor_div] [@ libpvrANDROID_WSEGL_SGX540_120.so@0xb0c ]
Keywords: steps-wanted
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: