Closed Bug 785626 Opened 12 years ago Closed 12 years ago

crash in mozilla::FrameLayerBuilder::BuildContainerLayerFor

Categories

(Core :: Layout, defect)

17 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox17 + verified

People

(Reporter: scoobidiver, Assigned: cwiiis)

References

Details

(4 keywords)

Crash Data

It first appeared in 17.0a1/20120825. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1c0ac073dc65&tochange=f077de66e52d

Signature 	mozilla::`anonymous namespace''::ContainerState::Finish(unsigned int*) More Reports Search
UUID	113744a5-686b-45a4-9e45-128692120825
Date Processed	2012-08-25 14:10:34
Uptime	970
Last Crash	6.5 days before submission
Install Age	16.2 minutes since version was first installed.
Install Time	2012-08-25 14:28:10
Product	Firefox
Version	17.0a1
Build ID	20120825030541
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 18 model 1 stepping 0
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x14
User Comments	Looking at some medical forum...and CRASH
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x9640, AdapterSubsysID: 2acf103c, AdapterDriverVersion: 8.982.0.0
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x9640
Total Virtual Memory	4294836224
Available Virtual Memory	3812274176
System Memory Use Percentage	25
Available Page File	13676478464
Available Physical Memory	5957169152

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::`anonymous namespace'::ContainerState::Finish 	layout/base/FrameLayerBuilder.cpp:2101
1 	xul.dll 	mozilla::FrameLayerBuilder::BuildContainerLayerFor 	layout/base/FrameLayerBuilder.cpp:2456
2 	xul.dll 	PL_DHashTableOperate 	obj-firefox/xpcom/build/pldhash.cpp:586
3 	xul.dll 	mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems 	layout/base/FrameLayerBuilder.cpp:1806
4 	xul.dll 	mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems 	layout/base/FrameLayerBuilder.cpp:1751
5 	xul.dll 	mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems 	layout/base/FrameLayerBuilder.cpp:1751
6 	xul.dll 	mozilla::FrameLayerBuilder::BuildContainerLayerFor 	layout/base/FrameLayerBuilder.cpp:2451
7 	mozglue.dll 	je_malloc 	memory/mozjemalloc/jemalloc.c:6291
8 	xul.dll 	nsDisplayList::PaintRoot 	layout/base/nsDisplayList.cpp:943
9 	xul.dll 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1868
10 	xul.dll 	nsViewManager::FlushDirtyRegionToWidget 	view/src/nsViewManager.cpp:465
11 	xul.dll 	nsViewManager::ProcessPendingUpdates 	view/src/nsViewManager.cpp:1217
12 	xul.dll 	nsRefreshDriver::Notify 	layout/base/nsRefreshDriver.cpp:421
13 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:476
14 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:624
15 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:82
16 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
17 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
18 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
19 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
20 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:273
21 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3800
22 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3877
23 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3953
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3A%60anonymous+namespace%27%27%3A%3AContainerState%3A%3AFinish%28unsigned+int*%29
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3AFrameLayerBuilder%3A%3ABuildContainerLayerFor
Bug 785333 is the only thing that pops out within that range.
I can pretty easily reproduce this locally, and backing out bug 785333 does fix it. Unfortunately I'm having a hard time coming up with a testcase - it involves using a custom provider for the social feature in Firefox.
Blocks: 785333
I am getting this crash consistently in 17.0a1 when I go 'offline' on facebook chat and then scroll up/down my timeline. Going back 'online' with chat resolves the problem.

One of the several crash reports:

https://crash-stats.mozilla.com/report/index/bp-5ce86f2c-287c-492c-bce4-b9e572120826
There are about 800 crashes in less than one day.
Assignee: nobody → chrislord.net
Bank holiday, will see if this is a quick fix.
Status: NEW → ASSIGNED
I can't reproduce this in Linux - judging by the reports, it seems this may be Windows-specific?

I guess I've broken some kind of invariant to do with mNewChildLayers... Looking but will be away for the day so won't have a fix until the evening/night (GMT), assuming it's relatively simple.
For anyone that can easily reproduce this crash, it'd be super-useful if they could post if there are any assertions leading up to the crash, and what they are.
    // An invariant of this loop is that the layers in mNewChildLayers
    // with index < i are the first i child layers of mContainerLayer.

My current guess is that this would happen if a layer ended up being represented by an item in a different part of the display-list than before and caused layers to swap positions, thus breaking this invariant.

I don't think this is an unreasonable thing to happen though, so perhaps we should pick a better way of pruning old child layers... Couldn't we just ref and remove all child layers of a container layer in BuildContainerLayerFor, then unref them after ContainerState::Finish? I think this would let us simplify this Finish method quite a bit anyway.

Will try this tonight after I reproduce this in Windows.
> I can't reproduce this in Linux - judging by the reports, it seems this may
> be Windows-specific?

Happening on OS X for me.

I can easily reproduce the crash but not being well versed in this area, you'd need to tell me specifically what information you'd want me to provide.
I can reproduce it every time on Facebook, while scrolling to top. I think it has something to do with index less than 0. If anything specific is needed, I can help.
(I will need to build Firefox to get assertions, please help me to do that)
Same reproducibility here, but only if the chat side-pane is closed. If it's active, the problem does not re-occur.
maybe the chat panel being "fixed" position somehow prevents the assertion.
Bug 785816 has STR but I can't reproduce.
Can we back out the offending change and re-land when this is solved. This is blocking some pretty important testing and crashing a lot of our nightly users. Going multiple days with a major crash at Facebook is probably costing us a non-trivial number of nightly testers.
Using the STR in Bug 785816 Comment 0 + Build of http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-08-27-mozilla-central-debug/ (i.e. http://hg.mozilla.org/mozilla-central/rev/8af6a22827ec) I get the following Assertion when crashing:

###!!! ASSERTION: Layer already in list???: '!mNewChildLayers.Contains(ownLayer)', file e:/builds/moz2_slave/m-cen-w32-dbg/build/layout/base/FrameLayerBuilder.cpp, line 1859
(In reply to Asa Dotzler [:asa] from comment #16)
> Can we back out the offending change and re-land when this is solved. This
> is blocking some pretty important testing and crashing a lot of our nightly
> users. Going multiple days with a major crash at Facebook is probably
> costing us a non-trivial number of nightly testers.

I didn't get the time today to work on a fix and it isn't trivial, so I think this would be a good idea. I'll likely have a fix tomorrow, but it'd be better to back this out now and re-land than sacrifice users.
Fixed by the backout of bug 785333:
http://hg.mozilla.org/mozilla-central/rev/ee5484eb9f4a

Still need to back this out on Aurora also, once that merge is done.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Backed out on aurora post-merge too:
https://hg.mozilla.org/releases/mozilla-aurora/rev/b766865d2857
Group: mozilla-corporation-confidential
Group: mozilla-corporation-confidential
Is anyone that could easily reproduce this crash in a position that they could test out patches? I'll have a patch for this later today, but I wasn't easily able to reproduce the crash, so it'd be good to have someone that was help out.
Sorry for flags change, Bugzilla being unhelpful...
(In reply to XtC4UaLL [:xtc4uall] from comment #17)
> Using the STR in Bug 785816 Comment 0 + Build of
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-08-27-mozilla-
> central-debug/ (i.e. http://hg.mozilla.org/mozilla-central/rev/8af6a22827ec)
> I get the following Assertion when crashing:
> 
> ###!!! ASSERTION: Layer already in list???:
> '!mNewChildLayers.Contains(ownLayer)', file
> e:/builds/moz2_slave/m-cen-w32-dbg/build/layout/base/FrameLayerBuilder.cpp,
> line 1859

argh, I missed this and this is not good... I've got patches that change the layer iteration so that the crash doesn't get hit, but this shouldn't happen and is much more likely what is causing the problem :(

I'm able to reproduce this on the page listed in bug 785816, so looking at this now.
I think this assertion would only be hit if a frame appeared twice in the same list with the same item (including merged frames) - I don't think this should happen, as frame + item should always be unique... Still investigating.
ok, I think this is happening because a merged item is being merged into a different frame between draws, causing a layer to be picked up twice... Maybe storing data against merged frames is just not a valid thing to do...
Is a valid thing to do, but requires an extra flag so as not to pick up the same layer more than once - fixes incoming to bug #785333, hopefully.
Depends on: 787818
Target Milestone: --- → mozilla18
Keywords: verifyme
No longer depends on: 787818
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 beta 4

http://bit.ly/SNVobv

Only 6 crashes for 17beta for the last two weeks. 
Couldn't reproduce any issues with facebook mentioned in comments 3 and 11.
Setting this to verified.
You need to log in before you can comment on or make changes to this bug.