Closed Bug 1712047 (CVE-2021-29968) Opened 3 years ago Closed 3 years ago

Crash in [@ sse41::lowp::gather_8888]

Categories

(Core :: Graphics, defect)

All
Windows 10
defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox89 + fixed
firefox90 + fixed
firefox91 --- fixed

People

(Reporter: pascalc, Assigned: lsalzman)

References

(Depends on 1 open bug, Regression)

Details

(4 keywords)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/68d80abf-0878-437e-bd1e-6e79a0210520

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll sse41::lowp::gather_8888 gfx/skia/skia/src/opts/SkRasterPipeline_opts.h:3780
1 xul.dll sse2::lowp::start_pipeline gfx/skia/skia/src/opts/SkRasterPipeline_opts.h:3141
2 xul.dll std::_Func_impl_no_alloc<`lambda at /builds/worker/checkouts/gecko/gfx/skia/skia/src/core/SkRasterPipeline.cpp:355:12', void, unsigned int, unsigned int, unsigned int, unsigned int>::_Do_call 
3 xul.dll SkRasterPipelineBlitter::blitRect gfx/skia/skia/src/core/SkRasterPipelineBlitter.cpp:298
4 xul.dll static SkScan::FillIRect gfx/skia/skia/src/core/SkScan.cpp:28
5 xul.dll static SkScan::FillRect gfx/skia/skia/src/core/SkScan.cpp:111
6 xul.dll SkDraw::drawRect const gfx/skia/skia/src/core/SkDraw.cpp:679
7 xul.dll SkBitmapDevice::drawRect gfx/skia/skia/src/core/SkBitmapDevice.cpp:365
8 xul.dll SkBitmapDevice::drawBitmapRect gfx/skia/skia/src/core/SkBitmapDevice.cpp:543
9 xul.dll SkBaseDevice::drawImageRect gfx/skia/skia/src/core/SkDevice.cpp:141

This signature started showing up in 89 beta 13

Severity: -- → S2

Crashing on Windows version is 10.0.18363 (1909) so kinda getting long in the tooth (and is out of support as of May 11th 2021). But also that this crash shows an x86 machine with what seems like only 2GB RAM and an old 8.15.10.2900 driver (Sandy Bridge era or earlier?) and only 492MB free memory. If I had to guess based on that info alone, maybe GPU is sharing already low amount of RAM and the system got starved and did a "fainting goat".

From what I can see, it's affecting all versions of Win10. New in b13 also makes me wonder if it's tied to the end of the early beta period (b12 was the final early beta build). Also, hiding this bug for now because the crash addresses look wildptr-ish?

Group: gfx-core-security
Hardware: x86 → All

The only thing I see looking at the changelogs was the changeover from early beta. Software WebRender would have been enabled on Windows during the early beta, and then after early beta finished, we would go back to D3D11 Layers. It doesn't look like the crash reporters are qualifying for HW-WR in this case for some reason.

So that points to something in the 89 cycle that adversely affected the stability of the gfxFontMissingGlyphs code in the Layers path, but was otherwise obscured by SW-WR being enabled during the Nightly cycle and during early beta.

There were a couple changes I might suspect initially impacted this, but seem to have been in earlier versions, i.e. bug 1694248 and bug 1541472.

Not quite sure what's going on here yet as such.

I tried reproducing this on a similar configuration (WebRender disabled with D3D11 compositor + D2D) on some of the URLs and couldn't.

There are not going to be very many users in this configuration in 89 so the overall crash volume won't be so high, however for those users with this configuration it could be noticeably bad. That being said, it's planed that those users will be moving to a WebRender configuration in 90 so they won't have to put up with it for long.

Crash Signature: [@ sse41::lowp::gather_8888] → [@ sse2::lowp::bilerp_clamp_8888 ] [@ sse2::lowp::gather_8888] [@ sse41::lowp::bilerp_clamp_8888] [@ sse41::lowp::gather_8888]
Crash Signature: [@ sse2::lowp::bilerp_clamp_8888 ] [@ sse2::lowp::gather_8888] [@ sse41::lowp::bilerp_clamp_8888] [@ sse41::lowp::gather_8888] → [@ Sprite_D8_S32::blitRect] [@ sse2::lowp::bilerp_clamp_8888 ] [@ sse2::lowp::gather_8888] [@ sse41::lowp::bilerp_clamp_8888] [@ sse41::lowp::gather_8888]

Jeff, these new signatures total 3% of our crashes on release, which makes it into out top 5 of crashers for 89. Can we get somebody assigned to this bug so as that we may be able to have a fix we could uplift to a dot release? Thanks

Flags: needinfo?(jmuizelaar)

Lee pointed out https://phabricator.services.mozilla.com/D109437 which seems like it could be the cause.

Flags: needinfo?(jmuizelaar) → needinfo?(aosmond)
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED

We believe this might have been regressed by bug 1699224.

Landed:

https://hg.mozilla.org/integration/autoland/rev/9e76c07577b15d9efebc2479e02052ece480cbaeBacked out for causing bustages in DrawTargetSkia.cpp:

https://hg.mozilla.org/integration/autoland/rev/b29d87f7f43e29e7c147a8bc65eb2d6a9e9f56e0

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&searchStr=windows%2Copt&revision=9e76c07577b15d9efebc2479e02052ece480cbae
Failure log: https://treeherder.mozilla.org/logviewer?job_id=342011618&repo=autoland

[task 2021-06-07T16:46:49.353Z] 16:46:49 INFO - /builds/worker/checkouts/gecko/gfx/2d/DrawTargetSkia.cpp(276,41): error: no member named 'take' in 'RefPtr<mozilla::gfx::DataSourceSurface>'
[task 2021-06-07T16:46:49.353Z] 16:46:49 INFO - DataSourceSurface* surf = dataSurface.take();
[task 2021-06-07T16:46:49.353Z] 16:46:49 INFO - ~~~~~~~~~~~ ^
[task 2021-06-07T16:46:49.353Z] 16:46:49 INFO - 1 error generated.

Flags: needinfo?(lsalzman)
Flags: needinfo?(lsalzman)

We need forget() instead of take()

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
Group: gfx-core-security → core-security-release

Comment on attachment 9225645 [details]
Bug 1712047 - keep Skia temporary data surface alive. r?jrmuizel

Beta/Release Uplift Approval Request

  • User impact if declined: Potential crashes in non-WebRender configurations.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This fixes what we believe to be an obvious bug in the time range in nearby code. While we are not 100% sure this is the cause of the crashes due to an inability to fully reproduce this locally, but we have a strong suspicion this will fix it. Unfortunately, the only way to know if this actually fixes it will be to deploy the fix.
  • String changes made/needed:
Attachment #9225645 - Flags: approval-mozilla-release?
Attachment #9225645 - Flags: approval-mozilla-beta?
Regressed by: 1699224
Has Regression Range: --- → yes

Can we turn off WR for the affected population for one beta build in 90, along with the uplift, so we don't have to wait until this goes to late beta or release before we confirm it fixes the issue?

Flags: needinfo?(lsalzman)
Flags: needinfo?(jmuizelaar)

I was able to reproduce this when using similar hardware as we were seeing crashes on. I confirmed that the build with the fix doesn't crash anymore. The patch should be quite safe so I'm eager to deploy it ASAP. The fact that it's a top crash while only being used by a small number of users suggests that it's hit quite often. It also probably reliably triggers on some webcontent which will make those sites unusable for those users.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(lsalzman)

Comment on attachment 9225645 [details]
Bug 1712047 - keep Skia temporary data surface alive. r?jrmuizel

approved for 90.0b5

Attachment #9225645 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1715379
Depends on: 1715382

I've added a blocklist entry for D2D on the affected hardware which should reduce this crash.

Comment on attachment 9225645 [details]
Bug 1712047 - keep Skia temporary data surface alive. r?jrmuizel

Approved for an 89 dot release.

Attachment #9225645 - Flags: approval-mozilla-release? → approval-mozilla-release+
Alias: CVE-2021-29968

I wrote the follow advisory if it looks incorrect to anyone please advise.

## mfsa2021-27.yml
announced: June 14, 2021
impact: moderate
fixed_in:
- Firefox 89.0.1
title: Security Vulnerabilities fixed in Firefox 89.0.1
advisories:
  CVE-2021-29968:
    title: Out of bounds read when drawing text characters onto a Canvas
    impact: moderate
    reporter: Pascal Chevrel
    description: |
      When drawing text onto a canvas with WebRender disabled, an out of bounds read could occur. <br>*This bug only affects Thunderbird on Windows. Other operating systems are unaffected.*
    bugs:
      - url: 1712047

This bug only affects Thunderbird on Windows.

Should probably say Firefox, not Thunderbird

It's also not related to canvas. The problem can occur on regular html content. I would go with something like:
"Use after free when interoperating between Direct2D and Skia when WebRender is disabled"

Flags: needinfo?(aosmond)
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: