Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | SkChunkAlloc::newBlock

RESOLVED FIXED in Firefox 52

Status

()

defect
P5
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: mccr8, Assigned: lsalzman)

Tracking

({crash})

Trunk
mozilla54
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox51 wontfix, firefox52 fixed, firefox-esr52 fixed, firefox53 fixed, firefox54 fixed)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(4 attachments, 4 obsolete attachments)

2.72 KB, patch
mchang
: review+
Details | Diff | Splinter Review
3.54 KB, patch
mchang
: review+
Details | Diff | Splinter Review
11.25 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
11.28 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-9880d167-301e-4a02-96af-f16552160915.
=============================================================

This isn't a super common crash, but it looks like it is trying to allocate more than 400MB. Ideally this would be a fallible crash.
Sorry, I meant "a fallible allocation" not "a fallible crash".
Whiteboard: [gfx-noted]
Right now, when Skia asks for a "throw on fail" allocation, we call moz_xmalloc, which stops everything on OOMs.  It seems that we'd have to change that.

This is coming from DrawTargetSkia::StrokeRect - the question is, would DrawTargetCairo::StrokeRect also crash in the similar OOM situation, or is this a regression because of a Skia backend.  A quick search doesn't suggest we deal with OOM situations in the Cairo case either (e.g., https://crash-stats.mozilla.com/report/index/d0deec33-ce76-4750-817d-217e12160928).
Priority: -- → P5
crashes with the signature on the beta channel are significantly increasing after the switch to 52.0b.
OS: Windows XP → All
Hardware: x86 → All
Duplicate of this bug: 1334456
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | SkChunkAlloc::newBlock] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | SkChunkAlloc::newBlock] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::LookAndFeel::GetInt ]
Lee, can we explain the increase in the volume, outside of "more people are now using Skia"?
Flags: needinfo?(lsalzman)
when i compare 51beta with 52beta, there weren't any other crashes with a large memory allocation size trending downward at the same rate, as we see these skia-related crashes showing up now:
https://mozilla.github.io/stab-crashes/scomp.html?common=product%3DFirefox%26date%3D%3E2016-10-01%26submitted_from_infobar%3D%21__true__%26oom_allocation_size%3D%3E%253D10000000&p1=version%3D51.0b&p2=version%3D52.0b
(not sure if this is a valid query/comparison though)
(In reply to Milan Sreckovic [:milan] from comment #5)
> Lee, can we explain the increase in the volume, outside of "more people are
> now using Skia"?

The fact that we enabled Skia content for Windows in 52 would probably be sufficient to explain everything, hypothetically. See bug 1314090
Flags: needinfo?(lsalzman)
See Also: → 1314090
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | SkChunkAlloc::newBlock] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::LookAndFeel::GetInt ] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | SkChunkAlloc::newBlock] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::LookAndFeel::GetInt ] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | mo…
Is there anything we can do to help mitigate these crashes? They're about 3% of all content process crashes on 52 at the moment.
Flags: needinfo?(lsalzman)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #8)
> Is there anything we can do to help mitigate these crashes? They're about 3%
> of all content process crashes on 52 at the moment.

The problem here is this species of crash looks to be an OOM that just happens to hit us when we're in the path rendering code. This particular Skia code was not architected with the concern of being fallible, so it would require quite some invasive changes down the stack to fail gracefully. And even then, we'd end up with paths just failing to render, ultimately before we just OOM somewhere else or just render a bunch of black when images run out of memory too.

Basically, I don't know at this moment what a good solution for this would be, or if it wouldn't just push the OOMs to trigger elsewhere.
Flags: needinfo?(lsalzman)
SkData by default uses infallible allocation, which is bothersome. We can work around this by using sk_malloc_flags(size, 0), which is fallible.

This won't fix the SkChunkAlloc crashes with path allocation, but it will at least fix the ones coming from SourceSurfaceSkia.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8838740 - Flags: review?(mchang)
Comment on attachment 8838740 [details] [diff] [review]
make SourceSurfaceSkia allocation fallible by using sk_malloc

Review of attachment 8838740 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/SourceSurfaceSkia.cpp
@@ +52,5 @@
> +        memcpy(mem, aData, size.value());
> +      }
> +      return SkData::MakeFromMalloc(mem, size.value());
> +    }
> +  }

nit: Maybe an NSWarning about that this failed?
Attachment #8838740 - Flags: review?(mchang) → review+
    These are the OOM crashes during our A-B testing for Skia vs. Cairo:
    https://crash-stats.mozilla.com/search/?ActiveExperiment=~skia&signature=~OOM&date=%3E%3D2016-09-18T21%3A03%3A00.000Z&date=%3C2017-02-17T21%3A03%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=ActiveExperimentBranch#facet-signature

    This crash shows up as #10 (3/358), but #2 and #3 are fixed and #1 is "small OOMs", leaving us with a 3/73, which is close enough to the 3% we're seeing now (though I appreciate we're in very small numbers territory to do statistical analysis.)
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/da9836d57e73
make SourceSurfaceSkia allocation fallible by using sk_malloc. r=mchang
Keywords: leave-open
(In reply to Pulsebot from comment #13)
> Pushed by lsalzman@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/da9836d57e73
> make SourceSurfaceSkia allocation fallible by using sk_malloc. r=mchang

Clarification: this is meant to address this crash signature: https://crash-stats.mozilla.com/signature/?signature=OOM%20%7C%20large%20%7C%20mozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xmalloc%20%7C%20sk_malloc_flags

The ones coming from SourceSurfaceSkia usage like inside Canvas2D should be addressed by this. Other ones coming deep from in SkiaGL stuff unfortunately won't.
Comment on attachment 8838740 [details] [diff] [review]
make SourceSurfaceSkia allocation fallible by using sk_malloc

Approval Request Comment
[Feature/Bug causing the regression]: bug 1299435, 52+
[User impact if declined]: Crashes on OOM situations such as putting image data to canvas or reading from a SkiaGL canvas.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: 53, 52, 52-esr
[Is the change risky?]: no
[Why is the change risky/not risky?]: Just allocates data for Skia source surfaces using a fallible allocator, so that the code that uses it can now actually properly handle the failure rather than crash before it ever gets there.
[String changes made/needed]: None
Attachment #8838740 - Flags: approval-mozilla-esr52?
Attachment #8838740 - Flags: approval-mozilla-beta?
Attachment #8838740 - Flags: approval-mozilla-aurora?
Comment on attachment 8838740 [details] [diff] [review]
make SourceSurfaceSkia allocation fallible by using sk_malloc

make some large skia allocations faillible to avoid OOM crashes, aurora53+, beta52+

Clearing esr52 flag as that will be synced from beta.
Attachment #8838740 - Flags: approval-mozilla-esr52?
Attachment #8838740 - Flags: approval-mozilla-beta?
Attachment #8838740 - Flags: approval-mozilla-beta+
Attachment #8838740 - Flags: approval-mozilla-aurora?
Attachment #8838740 - Flags: approval-mozilla-aurora+
I noticed SourceSurfaceSkia::DrawTargetWillChange was also still using MakeRasterCopy, which will crash on OOM, evidenced by some crash reports with the sk_malloc_flags signature (i.e. https://crash-stats.mozilla.com/report/index/91ad2dab-12c4-46d2-893e-f4e052170222).

... Thus, sadly, this also needs to be modified to use MakeSkData like the other places in SourceSurfaceSkia.

Cleaned up some mess while I was at it.
Attachment #8840296 - Flags: review?(mchang)
Milan and I noticed that regarding the ChunkAlloc signature, a good portion of these crashes seem to be triggered by drawing dotted borders in nsCSSBorderRenderer via a call to DrawTarget::StrokeRect. If this gets broken up into 4 StrokeLine calls instead, one per side, this will limit the memory allocation to be on the order of 1/4th in each call to StrokeLine what it was in StrokeRect.

Note that due to the complexity of calculating where exactly to resume the drawing of dots along each side as you start and stop the side, it made a lot more sense to just use the existing code that already exists in nsCSSBorderRenderer to do just that. All I had to do was not make the "fast path" trigger here if the estimated number of dots would be above a threshold. Below this threshold, memory allocation should be reasonable and we just go through the normal StrokeRect fast path.

While this will not "fix" the problem of the allocation being infallible, it should make it much less likely to hit an OOM crash in that rendering path.
Attachment #8840304 - Flags: review?(milan)
Comment on attachment 8840304 [details] [diff] [review]
limit the amount of dots that nsCSSBorderRenderer will draw with DrawTarget::StrokeRect

Review of attachment 8840304 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/painting/nsCSSRenderingBorders.cpp
@@ +3222,5 @@
> +      // Ensure that dotting the border will not cause excessively large memory
> +      // allocations in the content backend. Otherwise, use the fallback path
> +      // below that will break up the drawing into individual StrokeLine calls
> +      // that should allocate about 1/4th as much memory at once.
> +      mOuterRect.width + mOuterRect.height <= mBorderWidths[0] * 65536)

I would feel better if we had an extra set of parentheses even with the precedence rules making this code correct :)
Attachment #8840304 - Flags: review?(milan) → review+
Attachment #8840296 - Flags: review?(mchang) → review+
Given that Skia may generally handle rendering of dotted borders better via StrokeLine, there is not that much reason to try to do it with StrokeRect in the first place. So let's just remove the StrokeRect path and get rid of the headache of having to maintain one OOMy and possibly slow "fast path" rather than just dispatching to the general patch for dots.
Attachment #8840555 - Flags: review?(milan)
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c69c880f8862
avoid OOM crashes in SourceSurfaceSkia::DrawTargetWillChange. r=mchang
Blocks: 588271
Fixed parentheses and amended dot limit downward based on discussion with Milan and further investigation of how Skia is behaving in this case. Just keeping this around as a backup in case removing the StrokeRect dotted border path does not work out.
Attachment #8840304 - Attachment is obsolete: true
Attachment #8840563 - Flags: review+
Setting qe-verify- based on Lee's assessment on manual testing needs (see Comment 16) and the fact that this fix has automated coverage.
Flags: qe-verify-
Comment on attachment 8840555 [details] [diff] [review]
remove the StrokeRect path for drawing dotted borders

Review of attachment 8840555 [details] [diff] [review]:
-----------------------------------------------------------------

If the non-very-simple path works, this looks reasonable.
Attachment #8840555 - Flags: review?(milan) → review+
It turns out the fallback path for rendering stroked dots was somewhat broken. The problem is GetStraightBorderPoint, which is used to calculate the start/end points of a StrokeLine, calculates the center of these dots. However, StrokeLine interprets its parameters as dashes, which work in terms of boundaries. So we need to adjust the results to actually be in terms of boundaries for dots to render at their proper position.

Complicating this further is that GetStraightBorderPoint still needs to sometimes return dot centers for doing larger filled dots. So, I needed to add a parameter for how much to offset dots to work around this.

The other problem was that apparently we weren't keeping track of the proper dash offset for these stroked dots either, so when we would go to render a side, we'd either put two dots next to each other or two spaces. That needed to be fixed as well.

Next problem is one of the reftests for dotting baked the broken results into an image as a ref, but due to the fact this just seems like a really bad idea in general, even if the image ref were updated, I surgically removed those from the test without disabling the entire test (so at least filled dots are still tested) since otherwise this will just cause us continual pain.

It is also worth noting that whether or not we just entirely removed the fast path or selectively disabled it with a threshold check, these underlying problems with StrokeLine-ing dots would have still existed, so they need to be addressed regardless.
Attachment #8840555 - Attachment is obsolete: true
Attachment #8841037 - Flags: review?(milan)
Attachment #8841037 - Flags: review?(milan) → review?(mstange)
Comment on attachment 8841037 [details] [diff] [review]
remove the StrokeRect path for drawing dotted borders

Review of attachment 8841037 [details] [diff] [review]:
-----------------------------------------------------------------

This wasn't easy to review, but I think I understand most of it now, and it makes sense.
Attachment #8841037 - Flags: review?(mstange) → review+
Beefed up the comments based on review discussion with Markus.

Fixed one small bug regarding the starting point of merged sides.

Rough plan is now to get some testing on this for a few days in central, and if it sticks, then we can maybe get this uplifted to beta before it rolls into release so that we can address this 400MB allocation silliness.
Attachment #8841037 - Attachment is obsolete: true
Attachment #8841087 - Flags: review+
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/87cfe603d44d
remove the StrokeRect path for drawing dotted borders. r=mstange
See Also: → 1342571
Attachment #8840563 - Attachment is obsolete: true
Comment on attachment 8841087 [details] [diff] [review]
remove the StrokeRect path for drawing dotted borders. r=mstange

Approval Request Comment
[Feature/Bug causing the regression]: This was a problem that has largely always existed with Skia, but was aggravated by our switch to Skia content rendering for Mac/Android/Linux in 51 and Windows in 52.
[User impact if declined]: HTML that use CSS dotted border style can cause OOM crashes due to allocating up to 400MB at a time. This will impact all users who do not use our D2D content path - which means basically about half of our users. 
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: This patch is only upliftable to 53.
[Is the change risky?]: no
[Why is the change risky/not risky?]: Just makes us use pre-existing fallback code for rendering dotted borders while also fixing some small visual bugs that said fallback code contained. At the same time, we will use much less memory drawing these dotted borders, which will substantially reduce the occurrence of these OOMs.
[String changes made/needed]: None
Attachment #8841087 - Flags: approval-mozilla-aurora?
This makes the patch compatible with beta 52. Some file locations and type names changed as of 53, so I just had to fix those up, but otherwise nothing else.

Approval Request Comment
[Feature/Bug causing the regression]: This was a problem that has largely always existed with Skia, but was aggravated by our switch to Skia content rendering for Mac/Android/Linux in 51 and Windows in 52.
[User impact if declined]: HTML that use CSS dotted border style can cause OOM crashes due to allocating up to 400MB at a time. This will impact all users who do not use our D2D content path - which means basically about half of our users. 
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: This patch is only upliftable to 52.
[Is the change risky?]: no
[Why is the change risky/not risky?]: Just makes us use pre-existing fallback code for rendering dotted borders while also fixing some small visual bugs that said fallback code contained. At the same time, we will use much less memory drawing these dotted borders, which will substantially reduce the occurrence of these OOMs.
[String changes made/needed]: None
Attachment #8841296 - Flags: review+
Attachment #8841296 - Flags: approval-mozilla-beta?
Comment on attachment 8841296 [details] [diff] [review]
remove the StrokeRect path for drawing dotted borders (52). r=mstange

survive big memory allocations from skia, beta52+

Should be in 52 rc1
Attachment #8841296 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8841087 [details] [diff] [review]
remove the StrokeRect path for drawing dotted borders. r=mstange

... and let's get this on aurora53 as well.
Attachment #8841087 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-esr52/rev/4bd2e5d2ac0d

Given that a not-insignificant amount of work has landed in this bug at this point and we're near the end of a release cycle, I think we should close it out and track any future work in new bugs.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.