Closed Bug 1520682 Opened 9 months ago Closed 9 months ago

Long css filter crash the whole firefox process group

Categories

(Core :: Graphics: WebRender, defect, P3, critical)

66 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- disabled
firefox66 --- fixed

People

(Reporter: mmis1000, Assigned: nical)

References

(Blocks 2 open bugs, )

Details

(4 keywords, Whiteboard: 66b-qatriage-p2)

Crash Data

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

https://codepen.io/mmis1000/pen/WLPzpm

  1. WebRender enabled

  2. make a very long filter consists of many many blur pass

.test {
  /*repeat over 1000x*/
  filter: blur(1px) blur(1px) blur(1px)....... blur(1px);
}
  1. apply it on some element
<div class="test">123</div>

Actual results:

The whole firefox crashed (include the main process and the ui, literally everything)

Expected results:

If firefox really can't handle such malformed filter, it at least should limit the crash to content process instead of the whole process group.
Or if it should be able to handle that, it will render them without problem.

Summary: css filter crash → Long css filter crash the whole firefox process group
Blocks: wr-stability
Status: UNCONFIRMED → NEW
Crash Signature: [@ webrender::picture::PicturePrimitive::take_context ]
Component: Untriaged → Graphics: WebRender
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → All

Command used

mozregression --pref gfx.webrender.all:true --bad=2019-01-17 --good=2017-11-01

Thank you for finding the regression range!
Confirmed on Win10 (GTX1060) and Debian Testing (Macbook Pro; with layers.gpu-process.enabled:true).

Win10: bp-b767ef29-b312-4c27-b7aa-95d280190117

Last good: Nightly just gets slow, but shows blur. The graphics card is bleeping.
mozregression --repo autoland --launch 5a1beccacebf371da00b6622586396df368bb245 --pref gfx.webrender.all:true -a https://codepen.io/mmis1000/pen/WLPzpm

First bad: Crash.
mozregression --repo autoland --launch 68aa8f295bd5bd25ad05b003972cedb977e5a570 --pref gfx.webrender.all:true -a https://codepen.io/mmis1000/pen/WLPzpm

https://github.com/servo/webrender/pull/3249
https://github.com/servo/webrender/pull/3252
https://github.com/servo/webrender/pull/3244 Most likely this one?

Crash Signature: [@ webrender::picture::PicturePrimitive::take_context ] → [@ webrender::picture::PicturePrimitive::take_context ] [@ webrender::prim_store::SpaceMapper<T>::unmap<T> ]
Flags: needinfo?(gwatson)
Keywords: regression
Priority: -- → P3
Severity: normal → critical

MacOS: bp-a6b8ff03-3bad-4875-8a83-e00fe0190117 "OS: Unknown" [@ EMPTY: no crashing thread identified; ERROR_NO_THREAD_LIST ]

Crash Signature: [@ webrender::picture::PicturePrimitive::take_context ] [@ webrender::prim_store::SpaceMapper<T>::unmap<T> ] → [@ webrender::picture::PicturePrimitive::take_context ] [@ webrender::prim_store::SpaceMapper<T>::unmap<T> ] [@ EMPTY: no crashing thread identified; ERROR_NO_THREAD_LIST ]
Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: testcase

None of those referenced patches seem likely to be the specific cause of this - it might just be chance that it starts to fall over around then for unrelated reasons.

Matt, Bobby, do you know if Gecko starts to drop filters if there are a huge number on an element (> 1000 in this case)? Is there a reasonable constant we could pick, and start to drop filters beyond that?

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(gwatson)
Flags: needinfo?(bobbyholley)

I'm not aware of any Gecko code that drops parts of long filter chains. But, looking at that page in non-WebRender Firefox, it looks like non-WebRender Firefox hangs the content process completely, and sometimes even crashes it. So the big difference is that WebRender moves the bad behavior into the parent process. This bug was reported on macOS where we don't have a GPU process.

Chrome doesn't seem to handle this page very well either, but it does stop hanging at some point and it doesn't crash on my machine.

Maybe the expected behavior here should be "crash and restart the GPU process without taking down the browser", or in other words, ship a GPU process on macOS.

I'm not familiar with filter management on the Gecko side. I'll defer to Markus and Matt.

Flags: needinfo?(bobbyholley)
Win10, GTX1060 Behavior
Chrome Dev 73 Clicking on "Destroy my firefox" instantly shows the blur, it may flicker once or twice in red, coil whine. Resizing preview area or window is slow and causes coil whine. Scrolling the "CSS" or "JS" text fields, switching tabs, etc. is smooth.
Direct3D 11 (Advanced Layers) Tab crash: bp-d5842d26-720f-48c8-a4a4-22e710190118
WebRender, "Last good" Memory usage explosion: Whole browser + Win10 desktop are slowed down, coil whine, but blur is shown.
WebRender, today GPU process crash: bp-e06ee825-1ccb-4120-a737-2753e0190118, bp-9ef7a1bd-aba1-46af-b2a9-be22b0190118, Fallback to Basic compositing

Windows users only get a fallback to Basic, but Linux and Mac users a browser crash.
non-WR does not fall back to Basic. But the current behavior might be better as Firefox keeps usable, so this could be even P4 or stage-wr-next if falling back to Basic is acceptable?

Since bug 1415020 is fixed, you could enable layers.gpu-process.enabled for MOZ_X11. (Wayland looked bad 5 months ago: bug 1481405 comment 10)
A GPU process on Mac would be nice.

Crash Signature: [@ webrender::picture::PicturePrimitive::take_context ] [@ webrender::prim_store::SpaceMapper<T>::unmap<T> ] [@ EMPTY: no crashing thread identified; ERROR_NO_THREAD_LIST ] → [@ webrender::picture::PicturePrimitive::take_context ] [@ webrender::prim_store::SpaceMapper<T>::unmap<T> ] [@ EMPTY: no crashing thread identified; ERROR_NO_THREAD_LIST ] [@ webrender::picture::create_raster_mappers ]
Blocks: stage-wr-next
No longer blocks: stage-wr-trains
Flags: needinfo?(jmuizelaar)
Blocks: stage-wr-trains
No longer blocks: stage-wr-next
Flags: needinfo?(jmuizelaar)

I'm tempted to add a limit to WR which limits the depth of the render task tree, dropping any elements after that.

A new tree level is only introduced when there is an off-screen surface involved (e.g. mix-blend-mode, CSS filters, perspective rasterization roots).

I think it's probably reasonable, even if not technically spec compliant, to drop any elements with a render task tree of > 64 depth. I can't imagine any real world web content needing more than that, and if it does, it will render very slowly and / or exhaust memory anyway.

If we do add a limit for this, we could potentially revisit this later (esp. once we support full SVG filters), but it seems like a reasonable workaround for now.

Thoughts?

Flags: needinfo?(mstange)
Flags: needinfo?(jmuizelaar)

Adding a limit sounds good to me. Not sure what the number should be but making it configurable through a pref sounds good (to be able to quickly check whether the limit is causing a page to render incorrectly). If we hit this limit in meaningful ways we'll probably have to rethink how we render long filter chains anyway.

Actually there was this test case with a heap of drop shadows to create that "long shadow" effect (I don't have the bug number handy) which would break with a "small" limit on the number of filters, so we may want to pick a more conservative number.
If web authors really do chain filters such as blurs in a way similar to this test case, we could also look into simplifying filter chains by collapsing filters when possible.

Luckily that test case uses text-shadow and not filter: drop-shadow. So we don't need to take it into consideration here.

(In reply to Glenn Watson [:gw] from comment #10)

I think it's probably reasonable, even if not technically spec compliant, to drop any elements with a render task tree of > 64 depth. I can't imagine any real world web content needing more than that, and if it does, it will render very slowly and / or exhaust memory anyway.

I agree, such a limit sound reasonable. And as Nical said, let's have it be controlled by a pref so that we can identify visual issues caused by it more easily.

Flags: needinfo?(mstange)

Yes. That's sounds reasonable.

Flags: needinfo?(jmuizelaar)
Assignee: nobody → nical.bugzilla

(In reply to Markus Stange [:mstange] from comment #7)

I'm not aware of any Gecko code that drops parts of long filter chains. But, looking at that page in non-WebRender Firefox, it looks like non-WebRender Firefox hangs the content process completely, and sometimes even crashes it. So the big difference is that WebRender moves the bad behavior into the parent process. This bug was reported on macOS where we don't have a GPU process.

Chrome doesn't seem to handle this page very well either, but it does stop hanging at some point and it doesn't crash on my machine.

Maybe the expected behavior here should be "crash and restart the GPU process without taking down the browser", or in other words, ship a GPU process on macOS.

Having a GPU process is definitely something we should aim for on all platforms for reasons like this.

That said, if we restart the GPU process, then we'll try to render the filter again, and crash again. If the eventual fallback to software (regardless of which process that happens is) then causes a hang, then we're not much better off.

Adding the pref controlled limit sounds like a great idea :)

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #16)

That said, if we restart the GPU process, then we'll try to render the filter again, and crash again.

Hmm, unless we also crash the content process. That was the original idea in bug 1510900 - not sure if that's actually what landed?

(In reply to Markus Stange [:mstange] from comment #17)

(In reply to Matt Woodrow (:mattwoodrow) from comment #16)

That said, if we restart the GPU process, then we'll try to render the filter again, and crash again.

Hmm, unless we also crash the content process. That was the original idea in bug 1510900 - not sure if that's actually what landed?

It isn't, we just report URLs from the GPU process directly.

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/768f85969239
Limit the amount of filter ops per stacking context in WebRender. r=jrmuizel
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Attachment #9037994 - Attachment is obsolete: true
See Also: → 1565039
You need to log in before you can comment on or make changes to this bug.