Closed Bug 847002 Opened 7 years ago Closed 7 years ago

crash in AlignWithLayerPixels @ PresShell::Paint

Categories

(Core :: Widget: Android, defect, critical)

22 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox20 --- unaffected
firefox21 --- fixed
firefox22 + fixed

People

(Reporter: scoobidiver, Assigned: kats)

References

Details

(4 keywords, Whiteboard: [native-crash])

Crash Data

Attachments

(2 files, 1 obsolete file)

With the below stack trace, it first showed up in 22.0a1/20130301 and is now #2 top crasher in that build (currently hit by 6 users). The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b0e08db3bc2a&tochange=993d7aff3109
I suspect bug 844275.

Signature 	PresShell::Paint(nsView*, nsRegion const&, unsigned int) More Reports Search
UUID	4d56fe7c-3c63-43ec-bf76-9c1792130301
Date Processed	2013-03-01 23:08:42
Uptime	74
Last Crash	1.4 minutes before submission
Install Age	8.1 hours since version was first installed.
Install Time	2013-03-01 15:03:38
Product	FennecAndroid
Version	22.0a1
Build ID	20130301030909
Release Channel	nightly
OS	Android
OS Version	0.0.0 Linux 2.6.32.9-perf #1 PREEMPT Mon Apr 2 10:39:26 KST 2012 armv7l KDDI/PTI06/jmasai/8x50:2.2.1/FRG83/sirius_alpha_ver4_0402:user/release-keys
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x0
App Notes 	
AdapterDescription: 'Qualcomm -- Adreno 200 -- OpenGL ES 2.0 1309647 -- Model: IS06, Product: PTI06, Manufacturer: PANTECH, Hardware: qcom'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ 
PANTECH IS06
KDDI/PTI06/jmasai/8x50:2.2.1/FRG83/sirius_alpha_ver4_0402:user/release-keys
Processor Notes 	sp-processor01.phx1.mozilla.com_16743:2008; exploitablity tool: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	Qualcomm
Adapter Device ID	Adreno 200
Device	PANTECH IS06
Android API Version	8 (REL)
Android CPU ABI	armeabi-v7a

Frame 	Module 	Signature 	Source
0 	libxul.so 	PresShell::Paint 	obj-firefox/dist/include/Layers.h:262
1 	libxul.so 	nsView::GetNearestWidget const 	view/src/nsView.cpp:852
2 	libxul.so 	AlignWithLayerPixels 	layout/generic/nsGfxScrollFrame.cpp:1787

More reports at:
https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=PresShell%3A%3APaint%28nsView*%2C%20nsRegion%20const%26%2C%20unsigned%20int%29
The stack in these crashes looks messed up, frame 0 is the only reasonable thing. That being said, it does look like it could be caused by bug 844275, because if GetLayerManager returns null in PresShell::Paint, then the function could end up calling HasShadowManager() on a null object, which would explain the Layers.h:262 crash source. I will investigate.
Assignee: nobody → bugmail.mozilla
Blocks: 844275
Crash Signature: [@ PresShell::Paint(nsView*, nsRegion const&, unsigned int)] → [@ PresShell::Paint(nsView*, nsRegion const&, unsigned int) ]
I stuck a breakpoint at nsPresShell::Paint to see where it gets called from. The main codepaths I found are these:

#0  PresShell::Paint (this=0x5eef6540, aViewToPaint=0x621fe9c0, aDirtyRegion=..., aFlags=1) at /Users/kats/zspace/mozilla-git/layout/base/nsPresShell.cpp:5513
#1  0x6788d878 in nsViewManager::ProcessPendingUpdatesForView (this=0x62402a00, aView=0x621fe9c0, aFlushDirtyRegion=true) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:400
#2  0x6788f3ee in nsViewManager::ProcessPendingUpdates (this=0x62402a00) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:1120
#3  0x67250d6a in nsRefreshDriver::Tick (this=0x61299200, aNowEpoch=1362398490480590, aNowTime=...) at /Users/kats/zspace/mozilla-git/layout/base/nsRefreshDriver.cpp:960

#0  PresShell::Paint (this=0x5eef6540, aViewToPaint=0x621fe9c0, aDirtyRegion=..., aFlags=2) at /Users/kats/zspace/mozilla-git/layout/base/nsPresShell.cpp:5513
#1  0x6788d5a4 in nsViewManager::Refresh (this=0x62402a00, aView=0x621fe9c0, aRegion=...) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:335
#2  0x6788e1b4 in nsViewManager::PaintWindow (this=0x62402a00, aWidget=0x612b5140, aRegion=..., aFlags=0) at /Users/kats/zspace/mozilla-git/view/src/nsViewManager.cpp:650
#3  0x6788c6aa in nsView::PaintWindow (this=0x621fe9c0, aWidget=0x612b5140, aRegion=..., aFlags=0) at /Users/kats/zspace/mozilla-git/view/src/nsView.cpp:972
#4  0x681cf6cc in nsWindow::DrawTo (this=0x612b5140, targetSurface=0x64ebf220, invalidRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:981
#5  0x681cf92e in nsWindow::DrawTo (this=0x612b4c00, targetSurface=0x64ebf220, invalidRect=...) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1021
#6  0x681cfb9a in nsWindow::OnDraw (this=0x612b4c00, ae=0x61f67050) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:1072
#7  0x681cf1f4 in nsWindow::OnGlobalAndroidEvent (ae=0x61f67050) at /Users/kats/zspace/mozilla-git/widget/android/nsWindow.cpp:871

The first codepath is guarded by a call to widget->NeedsPaint [1] which should return false because sCompositorPaused will be true while we don't have a layer manager. The second codepath is guarded by a direct check of sCompositorPaused at [2] so that also shouldn't get hit when we don't have a layer manager. There must be some other way this codepath is getting, so I'd like to turn the NS_ASSERTION into something that actually aborts in order to get a stack from it.

[1] http://mxr.mozilla.org/mozilla-central/source/view/src/nsViewManager.cpp#375
[2] http://mxr.mozilla.org/mozilla-central/source/widget/android/nsWindow.cpp#1047
I'm hoping crashing when layer manager is null will get us a more reliable stack as to when this condition is getting hit. The NS_ASSERTION just above doesn't do anything in release builds.
Attachment #720674 - Flags: review?(matt.woodrow)
Whiteboard: [native-crash] → [native-crash][leave open]
Attachment #720674 - Flags: review?(matt.woodrow) → review+
Attachment #720674 - Flags: checkin+
Well. It's crashing at the MOZ_CRASH, but the stack is still bananas.

https://crash-stats.mozilla.com/report/index/fc6e630e-5fc9-4f36-9bba-8ca542130306
I'm able to reproduce this crash using the following steps:

1. Open Fennec
2. Install Adblock Plus
3. Press "Filter subscriptions" list button in Adblock settings

-build: Firefox for Android 22.0a1 (2013-03-12)
-device: Samsung Galaxy Nexus
-OS: Android 4.1.1
Keywords: reproducible
Blocks: 840703
No longer blocks: 840703
(In reply to Andreea Pod from comment #7)
> I'm able to reproduce this crash using the following steps:
> 
> 1. Open Fennec
> 2. Install Adblock Plus
> 3. Press "Filter subscriptions" list button in Adblock settings
> 

Awesome, thanks for the STR! I'm able to reproduce this crash as well now.

It looks like the code is trying to paint a eWindowType_popup window with mPopupType of ePopupTypeMenu. For this kind of window, ShouldUseOffMainThreadCompositing returns false, and so we don't use the global/static OMTC layer manager. However nsWindow::NeedsPaint() still returns true for this window because it is in the top window's window hierarchy and sCompositorPaused is false.

I think the correct fix here is to return false from nsWindow::NeedsPaint() if GetLayerManager() returns null. I had debated doing that in bug 844275 but decided not to since I didn't see how this condition could be hit. Well, now I know.
Slight cosmetic fixup
Attachment #724724 - Attachment is obsolete: true
Attachment #724724 - Flags: review?(chrislord.net)
Attachment #724725 - Flags: review?(chrislord.net)
Component: Layout → Widget: Android
Comment on attachment 724725 [details] [diff] [review]
Guard codepaths that assume GetLayerManager doesn't return null

Review of attachment 724725 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine. I guess we're none the wiser as to when this was happening?

::: widget/android/nsWindow.cpp
@@ +2329,5 @@
>  
>  bool
>  nsWindow::NeedsPaint()
>  {
> +  if (sCompositorPaused || FindTopLevel() != nsWindow::TopWindow() || !GetLayerManager(nullptr)) {

Can there be a null layer manager when the compositor isn't paused?
Attachment #724725 - Flags: review?(chrislord.net) → review+
(In reply to Chris Lord [:cwiiis] from comment #11)
> Looks fine. I guess we're none the wiser as to when this was happening?

Well it was trying to paint a nsWindow that wasn't the main window. On Android we don't really support that, and in this case the "undefined behaviour" resulting was a crash. But we shouldn't ever crash so this fixes the crash.

> 
> Can there be a null layer manager when the compositor isn't paused?

For the non-main window, yes. For the main window there will always be a non-null layer manager if the compositor isn't paused.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cf64837a7dd7
I also backed out the patch from attachment 720674 [details] [diff] [review] since it didn't really help with getting better stack traces and it clutters the code.

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d91d21471d0
Whiteboard: [native-crash][leave open] → [native-crash]
Attachment #720674 - Flags: checkin+ → checkin-
Comment on attachment 724725 [details] [diff] [review]
Guard codepaths that assume GetLayerManager doesn't return null

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 844275 and friends
User impact if declined: sometimes crashes when multiple XUL windows are present, e.g. in the STR in comment 7
Testing completed (on m-c, etc.): locally only. this should bake on m-c to verify it fixes the crashes before being uplifted.
Risk to taking this patch (and alternatives if risky): low risk, fennec only
String or UUID changes made by this patch: none
Attachment #724725 - Flags: approval-mozilla-aurora?
Comment 15 corresponds to the backout patch in comment 13. The m-c cset for the fix from comment 12 is:

https://hg.mozilla.org/mozilla-central/rev/cf64837a7dd7
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Attachment #724725 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Target Milestone: --- → mozilla22
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> It looks like the code is trying to paint a eWindowType_popup window with
> mPopupType of ePopupTypeMenu.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> Well it was trying to paint a nsWindow that wasn't the main window. On
> Android we don't really support that, and in this case the "undefined
> behaviour" resulting was a crash. But we shouldn't ever crash so this fixes
> the crash.

Why do we create these popup windows on android then? Can we just not create them in the first place?
That would be the correct solution, yes. But I don't know who's creating the popup windows. In the general case it might even be addons, which we can't fix, so having this guard is good.
You need to log in before you can comment on or make changes to this bug.