crash in SelectorMatches due to excessive recursion in frame construction

RESOLVED FIXED in Firefox 23

Status

()

--
critical
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: bomfog, Assigned: mats)

Tracking

({crash, regression, reproducible})

22 Branch
mozilla23
crash, regression, reproducible
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox21 unaffected, firefox22 affected, firefox23 fixed)

Details

(crash signature, URL)

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

6 years ago
This bug was filed from the Socorro interface and is 
report bp-db0ee04b-2150-4520-9218-131d42130328 .
============================================================= 
PGO'd ionmonkey builds since changeset 6cac55064722 (merge from m-c noon-ish 2013-03-27)
crash on the descendant selector test in Robohornet, with EXCEPTION_STACK_OVERFLOW as the reason.

Non-pgo builds don't crash.

Clean profile, but didn't think to try safe mode yet.

bp-08446e55-779b-4009-8fbb-0b5ca2130328
bp-41e09c43-81f0-4875-9ee9-08bfd2130328
bp-17bfd3e6-84ff-4182-94d3-435d22130328
bp-6217ba12-8c9c-4dfa-aa5c-8dab82130328
(Assignee)

Comment 1

6 years ago
From the crash data: "AdapterVendorID: 0x1002, AdapterDeviceID: 0x94c1"
which google tells me is a "ATI Radeon HD 2400 Pro".

Given the PGO/non-PGO difference it might be a variation of bug 772330?
Component: General → Style System (CSS)
Product: Firefox → Core
(Reporter)

Comment 2

6 years ago
(In reply to Mats Palmgren [:mats] from comment #1)
> From the crash data: "AdapterVendorID: 0x1002, AdapterDeviceID: 0x94c1"
> which google tells me is a "ATI Radeon HD 2400 Pro".
> 
> Given the PGO/non-PGO difference it might be a variation of bug 772330?

Also happens on a laptop with the following crash report/about:support info:

bp-d2c858c1-e7a8-419b-a03f-2d79c2130329

  Graphics

        Adapter Description
        Mobile Intel(R) 4 Series Express Chipset Family

        Adapter Drivers
        igdumd64 igd10umd64 igdumdx32 igd10umd32

...
        Device ID
        0x2a42

        DirectWrite Enabled
        false (6.2.9200.16492)

        Driver Date
        2-11-2011

        Driver Version
        8.15.10.2302

        GPU #2 Active
        false

        GPU Accelerated Windows
        0/2 Basic

        Vendor ID
        0x8086

        WebGL Renderer
        Google Inc. -- ANGLE (Mobile Intel(R) 4 Series Express Chipset Family)

        AzureCanvasBackend
        skia

        AzureContentBackend
        none

        AzureFallbackCanvasBackend
        cairo
(Reporter)

Comment 3

6 years ago
Ignore "non-pgo" comment in bp-bd4bf9b9-9c61-4fb1-a061-d59d72130329

Still pgo only.
(Reporter)

Comment 4

6 years ago
PGO'd hourly Inbound build now also crashes on the descendant selector test.

Built from http://hg.mozilla.org/integration/mozilla-inbound/rev/753f285620fe

bp-bef29686-bef6-4b60-bbf2-0aa6b2130329

Non-PGO inbound still doesn't crash.
(Reporter)

Comment 5

6 years ago
And m-c nightly, also.

bp-89d5144c-9ccc-4244-99ea-d0c9a2130329
http://hg.mozilla.org/mozilla-central/rev/8693d1d4c86d

I think I'm done now.
(Assignee)

Comment 6

6 years ago
(In reply to Mats Palmgren [:mats] from comment #1)
> Given the PGO/non-PGO difference it might be a variation of bug 772330?

Ignore that.  This crash is a "just" a stack overflow from to much recursion
constructing frames for the very deep content tree in that test.
The cycle that repeats is:

#19 nsCSSFrameConstructor::ConstructBlock layout/base/nsCSSFrameConstructor.cpp:10945
#20 nsCSSFrameConstructor::ConstructNonScrollableBlock layout/base/nsCSSFrameConstructor.cpp:4407
#21 nsCSSFrameConstructor::ConstructFrameFromItemInternal layout/base/nsCSSFrameConstructor.cpp:3530
#22 nsCSSFrameConstructor::ConstructFramesFromItem layout/base/nsCSSFrameConstructor.cpp:5480
#23 nsCSSFrameConstructor::ConstructFramesFromItemList layout/base/nsCSSFrameConstructor.cpp:9811
#24 nsCSSFrameConstructor::ProcessChildren layout/base/nsCSSFrameConstructor.cpp:9953
...

"ulimit -s 2400" crashes for me on Linux64.

Maybe frame construction should have a frame depth limit as we have in reflow?
(it's no much point creating them if we're not going to render them ;-) )
Component: Style System (CSS) → Layout
OS: Windows 7 → All
Hardware: x86 → All
(Assignee)

Comment 7

6 years ago
Created attachment 731419 [details] [diff] [review]
wip

Would something as simple as this work to prevent at least some crashes?
It seems to prevent the crash for this test.

Fwiw, this makes the test run in ~400ms compared to ~3700ms without it,
in a Linux64 debug build.
Attachment #731419 - Flags: feedback?(bzbarsky)
It would prevent some crashes, yeah.  Obviously if you keep doing appendChild you can work around this check...

We may also want a somewhat bigger multiplier.  Not sure.

Past that, I'm not sure what we want to do here.  :(
Flags: needinfo?(dbaron)
Comment on attachment 731419 [details] [diff] [review]
wip

Feedback+, I guess, though if we do this I'd prefer an autorestore.
Attachment #731419 - Flags: feedback?(bzbarsky) → feedback+
(Reporter)

Comment 10

6 years ago
As of at least IM nightly built from:

http://hg.mozilla.org/projects/ionmonkey/rev/61f7ebb9f3d9

this seems to have become WFM.

Also WFM on a PGO'd Inbound hourly of recent vintage.
(Reporter)

Comment 11

6 years ago
Scratch the WFM. Crashes still/again.

Current Ion nightly, dirty profile:

bp-64a0d0cb-bae5-4422-a997-a65a42130402

Same nightly, clean profile:

bp-b328eb01-a74a-4944-9e4d-2b3782130402

in which the stack is different:

Frame 	Module 	Signature 	Source
0 	xul.dll 	SelectorMatches 	layout/style/nsCSSRuleProcessor.cpp:2108
1 	xul.dll 	SelectorMatches 	layout/style/nsCSSRuleProcessor.cpp:2198
2 	xul.dll 	AtomSelector_InitEntry 	layout/style/nsCSSRuleProcessor.cpp:867

but was back to the long version when I reran with another clean profile (clean profile = "Use temporary profile" option in the Profile Manager app thingy)

bp-65b5881d-bdf0-4a17-b363-8bb0f2130402

Haven't tried inbound or central yet.
(Reporter)

Comment 12

6 years ago
Current central nightly crashes

bp-c74b124e-7aad-41a3-a2b6-4ec882130402

Non-pgo central hourly, same changeset, doesn't.

Done again....
(Assignee)

Comment 13

6 years ago
Thanks bomfog, we understand the problem, there's no need for more crash data.
(Assignee)

Comment 14

6 years ago
Created attachment 732514 [details] [diff] [review]
fix

https://tbpl.mozilla.org/?tree=Try&rev=c51fc974a816
Assignee: nobody → matspal
Attachment #732514 - Flags: review?(bzbarsky)
(Assignee)

Updated

6 years ago
Attachment #731419 - Attachment is obsolete: true
Comment on attachment 732514 [details] [diff] [review]
fix

So looking at this more carefully, we can't do it this way.

And the reason is that this will also fail to construct anonymous kids for the frame, which is just bad.

What you probably want to do instead is if the depth is exceeded skip the block that's skipped right now in the IsLeaf() case, but do everything else in here as it's done right now.
Attachment #732514 - Flags: review?(bzbarsky) → review-
(Assignee)

Comment 16

5 years ago
Created attachment 732574 [details] [diff] [review]
fix, v2

Like so?
Attachment #732514 - Attachment is obsolete: true
Attachment #732574 - Flags: review?(bzbarsky)
Comment on attachment 732574 [details] [diff] [review]
fix, v2

No, you need to clear the lazy bits if you're not going to construct frames for those kids; why wouldn't you clear those?
(Assignee)

Comment 18

5 years ago
Created attachment 732629 [details] [diff] [review]
fix, v3
Attachment #732574 - Attachment is obsolete: true
Attachment #732574 - Flags: review?(bzbarsky)
Attachment #732629 - Flags: review?(bzbarsky)
Comment on attachment 732629 [details] [diff] [review]
fix, v3

I guess it's better to construct the ::before and ::after bits too, yeah...

>+        ClearLazyBits(child, nullptr);

This should be ClearLazyBits(child, child->GetNextSibling()).

r=me with that fixed, and the warning moved so it only happens once, not once per child, perhaps?
Attachment #732629 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 20

5 years ago
Created attachment 732644 [details] [diff] [review]
fix, v4

> This should be ClearLazyBits(child, child->GetNextSibling()).

fixed
Attachment #732629 - Attachment is obsolete: true
Attachment #732644 - Flags: review+

Comment 21

5 years ago
It crashes also on http://ie.microsoft.com/testdrive/Performance/RoboHornetPro
status-firefox21: --- → unaffected
status-firefox22: --- → affected
status-firefox23: --- → affected
Keywords: regression, reproducible
Version: Trunk → 22 Branch
(Reporter)

Comment 23

5 years ago
WFM.
https://hg.mozilla.org/mozilla-central/rev/a97e447e504f
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Sorry for not responding sooner.

Having a frame construction recursion limit seems fine to me.  (Maybe we should get rid of the reflow recursion limitation while we're at it?  There doesn't seem much point to constructing frames if we're not going to reflow them, so it makes sense to use a single limit and have it in frame construction.)  Though we should probably be careful that the limit is about depth of the frame tree so we don't get weird appearing/disappearing effects for dynamic manipulation, like we used to with the reflow depth limit before we fixed it.  Maybe file a followup bug on that?
Flags: needinfo?(dbaron)
Summary: crash in SelectorMatches → crash in SelectorMatches due to excessive recursion in frame construction

Updated

5 years ago
status-firefox23: affected → fixed

Updated

5 years ago
Duplicate of this bug: 889194
I wasn't able to reproduce this issue on an Ubuntu 12.10 32bit machine, while connecting to neither http://www.robohornet.org/tests/descendantselector.html nor http://ie.microsoft.com/testdrive/Performance/RoboHornetPro/, with Nightly builds from 27 and 28 march 2013 and 2013-04-02.

There seem to be some crash reports in Socorro, within last month, regarding Firefox 23 beta 1, 2, 3, 4 and 5, and 23.0a2:

https://crash-stats.mozilla.com/report/list?signature=SelectorMatches&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2013-07-16+13%3A00%3A00&range_value=4

Does anyone have any thoughts/suggestions?
Flags: needinfo?
You need to log in before you can comment on or make changes to this bug.