Closed Bug 1228716 Opened 4 years ago Closed 4 years ago

crash in mozilla::layers::LayerProperties::ClearInvalidations

Categories

(Core :: Graphics: Layers, defect, critical)

42 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla46
Tracking Status
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- fixed
firefox46 --- verified

People

(Reporter: bc, Assigned: sinker)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, reproducible)

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-836ee4e1-5902-4676-8a67-2bdd32151128.
=============================================================

1. http://syvogfirs.dk/
2. Crash OS X 10.8 and 10.9 on beta/43 and aurora/44. It appears to be older than this however. I can not reproduce with a saved version of the page.

bp-836ee4e1-5902-4676-8a67-2bdd32151128
mozilla::layers::LayerProperties::ClearInvalidations

Bughunter also sees crashes with _moz_pixman_region32_clear on beta and aurora.

Nightly has different signatures.

mozilla::layers::BasicContainerLayer::ComputeEffectiveTransforms or
mozilla::gfx::Matrix4x4::operator*

There was no option to submit the crash when using Nightly.

Researching crash-stats shows similar crashes with EXCEPTION_STACK_OVERFLOW on Windows.

mozregression points to around [2014-05-25, 2014-05-27] Firefox 32, but the crash changes behavior and I'm not entirely sure that is accurate or that it matters at this point.
Duplicate of this bug: 1230113
I was able to reproduce on both OSX 10.11 and Win10. Both gave me the same regression range.

Regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1204667a2935&tochange=81651ad5e43c

Too old to bisect on inbound, unfortunately. Markus, I'm wondering if this is a regression from bug 1000875?
Flags: needinfo?(mstange)
OS: Mac OS X → All
Hardware: Unspecified → All
Unlikely, but hard to say without a full stack.
Flags: needinfo?(mstange)
Bug 613659 looks like another possibility. It's an insta-crash, so I'd imagine debugging shouldn't be too difficult?
Got it in a debugger locally.

Unhandled exception at 0x00007FFC60D2F1B7 (xul.dll) in firefox.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000005723203FF0).

xul.dll!mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits,mozilla::gfx::UnknownUnits>::Determinant() Line 1122	C++
xul.dll!mozilla::layers::Layer::SnapTransformTranslation(const mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits,mozilla::gfx::UnknownUnits> & aTransform, mozilla::gfx::Matrix * aResidualTransform) Line 666	C++
xul.dll!mozilla::layers::BasicContainerLayer::ComputeEffectiveTransforms(const mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits,mozilla::gfx::UnknownUnits> & aTransformToSurface) Line 93	C++
xul.dll!mozilla::layers::BasicContainerLayer::ComputeEffectiveTransforms(const mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits,mozilla::gfx::UnknownUnits> & aTransformToSurface) Line 98	C++

And the last line is repeated ad infinitum.
Actually, the page is using "mix-blend-mode: overlay" and I can confirm that the page doesn't crash with "layout.css.mix-blend-mode.enabled" set to false. So it was bug 952643 that first "caused" the crash.

Forcing the pref on, I get the following regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f003c386c77a&tochange=9e571ad29946

Maybe bug 880031 or bug 841601?
Local builds confirm this as a regression from bug 841601.
Blocks: 841601
Flags: needinfo?(olaru)
Thinker, since you touched this transform-computing code recently, do you know why we can get the infinite recursion in comment 5?  Does it mean we've set up the layer tree with a cycle in it?
Flags: needinfo?(tlee)
Ryan, could you give me what exact changeset you built, and a dump of the layer tree?
It would be helpful, thank you!
Flags: needinfo?(tlee)
I was just testing on regular release and nightly builds, nothing fancy. You should be easily able to reproduce on m-c tip. I'm about to go on PTO for a week, so I may not be able to get you the layers dump for awhile otherwise.
Attached file syvogfirs.log.bz2
layers.dump true
Bob, please try this patch to see if it works.
Flags: needinfo?(bob)
Possible to get a test for this too?
Assignee: nobody → tlee
Flags: needinfo?(olaru) → in-testsuite?
Looks like my macbook is in need of disk repair so it may be sometime before I am able to do a build. :sinker, can you do a try build with this patch and I'll test that.
I can confirm that the attached patch fixes the crash and gives us behavior consistent with Chrome on Win10. Would still love if we could get a crashtest for this too :)
Flags: needinfo?(bob)
Thinker, can we please get this reviewed and landed? Would also be nice to get it uplifted if possible :)
Flags: needinfo?(tlee)
Comment on attachment 8702792 [details] [diff] [review]
Remove installation of nsDisplayBlendContainer

Hi robert, could you review this patch?

Since nsIFrame::BuildDisplayListForStackingContext() already handle this, we just ask it to build blendcontainer.
Flags: needinfo?(tlee)
Attachment #8702792 - Flags: review?(roc)
--
Attachment #8702792 [details] [diff] - Attachment is obsolete: true
Attachment #8702792 - Attachment is obsolete: true
Comment on attachment 8706123 [details] [diff] [review]
Give different frame keys for nsDisplayBlendContainer, v2

I change a way to fix the problem.  The major problem is both BuildDisplayListForStackingContext() and AppendBackgroundItemsToTop() create nsDisplayBlendContainer.  It is possible that one frame has two nsDisplayBlendContainer, and try to reuse the same layer for both items.  It make a loop in layer tree, and cause infinite recursive calls at compositor.

The solution here is to give different frame keys according what an item is created for.
Attachment #8706123 - Flags: review?(roc)
Keywords: checkin-needed
failed to apply:

renamed 1228716 -> rm-blend-container.diff
applying rm-blend-container.diff
patching file layout/base/nsDisplayList.h
Hunk #1 FAILED at 3424
1 out of 1 hunks FAILED -- saving rejects to file layout/base/nsDisplayList.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh rm-blend-container.diff
Flags: needinfo?(tlee)
I have tried to apply it on latest m-c (6020a4cb41a7), there is no fault.
Could you tell me what version you are trying to apply on?
Flags: needinfo?(tlee) → needinfo?(cbook)
(In reply to Thinker Li [:sinker] from comment #25)
> I have tried to apply it on latest m-c (6020a4cb41a7), there is no fault.
> Could you tell me what version you are trying to apply on?

Hi, i tied to apply this to mozilla-inbound, maybe there is another change that conflicts with yours ?
Flags: needinfo?(cbook) → needinfo?(tlee)
Attachment #8706123 - Attachment is obsolete: true
I had tried the latest patch against inbound (51e26ad49ed2).  It works.
Flags: needinfo?(tlee) → needinfo?(cbook)
Flags: needinfo?(cbook)
https://hg.mozilla.org/mozilla-central/rev/a9bcb4b18729
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Verified fixed on m-c tip.

My inclination is that this isn't worth backporting to Beta given where we are in the cycle and how long we've lived with this crash seemingly without complaints. However, I think it might make sense to backport this to Aurora since Gecko 45 will the basis for our next ESR release if the risk isn't high. What do you think, Thinker? If yes, please nominate :)
Status: RESOLVED → VERIFIED
Flags: needinfo?(tlee)
Ryan, I think the risk is low since it changes only frame keys used for tracking reused layers.  The frame key of an item can be anything as long as no conflict with others.  I would suggest to back port to 45 too to keep people from crash.
Flags: needinfo?(tlee)
Is it possible land a crashtest for this?
Flags: needinfo?(tlee)
Comment on attachment 8706286 [details] [diff] [review]
Give different frame keys for nsDisplayBlendContainer, v3

Approval Request Comment
[Feature/regressing bug #]: bug 841601
[User impact if declined]: crashes when loading website
[Describe test coverage new/current, TreeHerder]: Verified to fix crashing website. Could probably use a crashtest, though.
[Risks and why]: Low risk - see comment 32.
[String/UUID change made/needed]: None
Attachment #8706286 - Flags: approval-mozilla-aurora?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #33)
> Is it possible land a crashtest for this?

Annoyingly, I haven't been able to get the page to crash when saved locally either (same as c0).
Comment on attachment 8706286 [details] [diff] [review]
Give different frame keys for nsDisplayBlendContainer, v3

Fix a crash, taking it.
Attachment #8706286 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #33)
> Is it possible land a crashtest for this?

Maybe we should land a code to check layer tree integrity to detect any loop for debug build.
Flags: needinfo?(tlee) → needinfo?(roc)
We should actually check this in all builds when the compositor receives layer tree updates. Corrupted child processes should not be able to crash the compositor by constructing an invalid layer tree!
Flags: needinfo?(roc)
You need to log in before you can comment on or make changes to this bug.