Closed Bug 1604535 Opened 4 years ago Closed 4 years ago

Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::gfx::SourceSurfaceRawData::GuaranteePersistance]

Categories

(Core :: Graphics: ImageLib, defect, P2)

All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- fixed

People

(Reporter: philipp, Assigned: aosmond)

References

Details

(Keywords: crash, perf, perf-alert)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-3a13251a-bfd9-4ec0-a9b3-20da40191217.

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 mozglue.dll moz_xmalloc memory/mozalloc/cxxalloc.h:33
3 xul.dll mozilla::gfx::SourceSurfaceRawData::GuaranteePersistance gfx/2d/SourceSurfaceRawData.cpp:39
4 xul.dll void mozilla::gfx::DrawTargetCaptureImpl::FillRect gfx/2d/DrawTargetCapture.cpp:196
5 xul.dll mozilla::layers::FillRectWithMask gfx/layers/basic/BasicLayersImpl.cpp:148
6 xul.dll void mozilla::layers::BasicImageLayer::Paint gfx/layers/basic/BasicImageLayer.cpp:81
7 xul.dll void mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:703
8 xul.dll void mozilla::layers::BasicLayerManager::PaintLayer gfx/layers/basic/BasicLayerManager.cpp
9 xul.dll void mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:723

this oom crash signature is on the rise since mid-november 2019 - commonly hitting 32bit versions of the browser. most of the time urls submitted with these reports are pointing to facebook.com.

@andrew: This seems to have been occuring for a long time and I'm guessing some facebook site changes have increased the frequency as beta and release increased in frequency at the same date. Is there anything else you can see here?

Flags: needinfo?(aosmond)
Priority: -- → P3

We could also try contacting facebook if there's any useful info we could give them.

SourceSurfaceRawData is used in a lot of corner cases. Whenever we call GuaranteePersistence, we do a copy of the buffer to ensure the data remains unchanged.

Image surfaces may be wrapped by SourceSurfaceRawData, especially when the basic compositor is in use on Windows (I notice almost all of the crashes are for non-WebRender). This is because an image will be stored in volatile memory, and when we "optimize" the image after we finish decoding, we will likely just leave it in a SourceSurfaceVolatileBuffer. When imgFrame::Draw is called with a DrawTargetCapture, we end up calling CreateLockedSurface and wrapping the data with a SourceSurfaceRawData.

For images, GuaranteePersistence should actually do nothing, so we should use a new source surface type to avoid that.

Assignee: nobody → aosmond
Component: Graphics → ImageLib
Flags: needinfo?(aosmond)
Priority: P3 → P2

I'm assuming this is mostly images fault, because that seems like the most common scenario with Facebook. There are many uses of SourceSurfaceRawData.

~4/5 crashes are for non-D2D and non-WR users which should be hitting the scenario I am concerned about.

This is also a performance issue with basic compositor on Windows, since we end up copying every large image when we draw with OMTP.

Keywords: perf

When one uses SourceSurfaceRawData to wrap a data pointer, it will
perform a copy of said data if GuaranteePersistence is called. This is
done for DrawTargetCapture, which is used with OMTP. Prior to this
patch, image surfaces would be wrapped by a SourceSurfaceRawData when
using the basic compositor on any non-Linux platform (since Linux does
not support volatile memory). This means every time imgFrame::Draw is
called with OMTP, a copy of the image will be made. Images don't need
this property since the data is already going to persist, so this patch
adds a new class SourceSurfaceMappedData, which takes a ScopedMap
keeping the underlying surface open and readable.

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5f648569039b
Avoid copying image surfaces when used with OMTP. r=tnikkel
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

we got another crash from today's nightly including the fix: bp-ce6a6787-74c7-426b-bd1a-e184a0191227

(In reply to [:philipp] from comment #10)

we got another crash from today's nightly including the fix: bp-ce6a6787-74c7-426b-bd1a-e184a0191227

I don't expect it to fix it entirely, but hopefully it will reduce the rate.

I'm not seeing any obvious improvement in the rates for Beta73 vs. Beta72. Should we reopen?

Flags: needinfo?(aosmond)

The vast majority of these crashes are on x86. OMTP is known to have a higher memory footprint than non-OMTP due to how capturing off the main thread works, and OOM is precisely the problem. Our current plan is just to disable OMTP on that architecture.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Disabling the pref by default allows users who really want it to still get it, rather than forcing it off in gfxPlatform::InitOMTPConfig.

Flags: needinfo?(aosmond)
Attachment #9121148 - Attachment description: Bug 1604535 - Disable OMTP on 32-bit systems by default due to memory pressure. → Bug 1604535 - Disable OMTP on 32-bit systems due to memory pressure.
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f36e74d8c86c
Disable OMTP on 32-bit systems due to memory pressure. r=jrmuizel
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b371f612cd60
Disable OMTP on 32-bit systems due to memory pressure. r=jrmuizel
Status: REOPENED → ASSIGNED
Target Milestone: mozilla73 → ---

Looking good for uplift potential assuming it sticks on nightly.

try on beta: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f113cbedf075df9ec4fa0ac1f70c1d83ec9fdb40

Status: ASSIGNED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
Regressions: 1611009
Regressions: 1611098
Regressions: 1611144
Backout by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f97c48da9cee
Backed out changset b371f612cd60 for performance regressions.

Let's see if the images missing fixes related to OMTP I've landed somehow are related to these OOMs. The reports came in around the same time. If not, I will reland near the start of the cycle, and also limiting it by CPU count (<= 2 && 32-bit, no OMTP). Our CI infrastructure has more cores than that, so won't trip the perf regressions for most users with a higher end machine that just happened to be using the 32-bit binary, and there is unlikely to be much benefit to OMTP with 2 or fewer cores given how many threads/processes we already have. This would represent 75% of the crashes.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #9121148 - Attachment description: Bug 1604535 - Disable OMTP on 32-bit systems due to memory pressure. → Bug 1604535 - Disable OMTP on some 32-bit systems due to memory pressure.

== Change summary for alert #24827 (as of Fri, 31 Jan 2020 18:18:53 GMT) ==

Regressions:

4% tscrollx windows7-32-shippable opt e10s stylo 0.69 -> 0.72

Improvements:

37% tsvgx windows7-32-shippable opt e10s stylo 196.87 -> 123.82
25% tsvgr_opacity windows7-32-shippable opt e10s stylo 159.90 -> 119.59
17% tsvg_static windows7-32-shippable opt e10s stylo 62.45 -> 51.62

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24827

== Change summary for alert #24834 (as of Mon, 03 Feb 2020 09:39:09 GMT) ==

Regressions:

7% Heap Unclassified windows7-32 opt 34,915,886.27 -> 37,403,325.66
7% Heap Unclassified windows7-32-shippable opt 34,979,821.17 -> 37,365,810.32
5% Heap Unclassified windows7-32 opt tp6 51,462,141.78 -> 53,909,844.11
4% Heap Unclassified windows7-32-shippable opt tp6 51,327,485.14 -> 53,386,403.55
4% Resident Memory windows7-32-shippable opt 590,927,739.53 -> 614,376,924.49
4% Resident Memory windows7-32 opt 592,328,328.40 -> 614,356,772.50
4% Resident Memory windows7-32 opt tp6 656,602,956.46 -> 679,811,951.59
3% Resident Memory windows7-32-shippable opt tp6 661,003,009.91 -> 678,726,640.96
3% Resident Memory windows7-32 opt tp6 659,576,248.55 -> 676,985,456.29

Improvements:

23% Images windows7-32 opt 6,059,289.41 -> 4,661,818.94
21% Images windows7-32-shippable opt 6,109,727.16 -> 4,856,316.79
12% Images windows7-32-shippable opt tp6 7,786,851.52 -> 6,874,368.23
11% Images windows7-32 opt tp6 7,644,862.43 -> 6,805,141.18

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24834

From 73 RC crash reports, we can see the images missing bugs I resolved related to OMTP do not appear related to this issue. I will reland disabling this, albeit for a much more selective population. This should not affect CI due to it having sufficient cores and memory available to pass the checks.

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/94b179d476b7
Disable OMTP on some 32-bit systems due to memory pressure. r=jrmuizel
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
Keywords: perf-alert

This decision was likely for the best.

But couldn't you reword the warning a bit better?
"Not supported on 32-bit with <= 2 cores" made me think to a transient bug like that time dav1d was pulled in.

Though.. Shouldn't the main criteria just be memory?
Like, even more so if I was running 64 bit firefox, 2GB of total RAM would be very little.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: