Closed Bug 1414389 Opened 8 years ago Closed 2 years ago

Crash in mozilla::gfx::DrawTargetSkia::FillRect

Categories

(Core :: Graphics: Layers, defect, P3)

Unspecified
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 + wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix

People

(Reporter: marcia, Assigned: rhunt)

References

Details

(5 keywords, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-ac95df4c-5c1d-4f78-a9a6-767ce0171102. ============================================================= Seen while looking at nightly crash stats: http://bit.ly/2zuw0q7 Crashes started in 58 using 20171031220132 (17 crashes/11 installs) and affect Win 7, Win 8 and Win 10. One crash also present in the later 57 betas, and 2 in Firefox 56. marking as sec sensitive due to the possible UAF addresses.
This may very well be the same root cause as bug 1413857, just some kind of a race condition. Even though it's in 56, possibly tickled by OMTP.
Flags: needinfo?(rhunt)
Flags: needinfo?(dvander)
Flags: needinfo?(bas)
Priority: -- → P3
Whiteboard: [gfx-noted]
Agreed.
Depends on: 1412850
Flags: needinfo?(bas)
Flags: needinfo?(rhunt)
Flags: needinfo?(dvander)
I see a few other signatures that look related: *mozilla::gfx::GetSkImageForSurface - http://bit.ly/2zypvls *mozilla::gfx::StoredPattern::Assign - http://bit.ly/2zypvls
59 also looks to be affected.
this is one of the top content crashes in 58.0b so far (nearly 2% of content crashes in 58.0b6). can we increase the priority based on this?
Flags: needinfo?(milan)
Lee, Ryan, this is a DrawTargetSkia::FillRect fast path (bug 1294337) with OMTP.
Flags: needinfo?(rhunt)
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
(In reply to Milan Sreckovic [:milan] from comment #8) > Lee, Ryan, this is a DrawTargetSkia::FillRect fast path (bug 1294337) with > OMTP. The fact that this is inside DrawTargetSkia::FillRect is merely incidental, and not causative. I don't think the problem is FillRect directly. It's accessing a SourceSurface inside a SurfacePattern on a DrawTargetCapture playback. There are thus two possibilities here given the line it is crashing at: https://hg.mozilla.org/mozilla-central/annotate/e8ce4865e421/gfx/2d/DrawTargetSkia.cpp#l765 Either 1) the pat.mSurface is crashing, meaning pat is already freed, which means the FillRectCommand containing this StoredSurface/SurfacePattern is freed, which means the DrawTargetCapture around it is likely thus bogus. or 2) pat.mSurface->GetFormat() is crashing, implying that the mSurface is already freed, even though pat is still valid. Superficial inspection of the StoredPattern code looks like it is more or less currently keeping the reference to the SourceSurface that is pointed at, though. I don't have access to bug 1413857, so I can offer no insight into a relationship there. Someone more familiar with the lifetimes of SourceSurfaces within OMTP would be better to follow up.
Flags: needinfo?(lsalzman)
I suspect this is bug 1416862 with subtly different timings, let's see how the patch there affects this.
Flags: needinfo?(rhunt)
Crash numbers are low on nightly so it's hard to say if bug 1416862 made a difference here, but I'm going to keep an eye on this.
Assignee: nobody → bas
Status: NEW → ASSIGNED
Analysis of a couple of crash dumps shows that for the address of the mSurface vtable we find 0xe5e5e5e5 or 0x1 in most of these cases. This suggests bug 1416862 will either fix this or there may be another race condition we don't know about yet. So far I haven't been able to figure out if the surface is a snapshot here or not.
Bug 1416862 landed around 12/8, and was uplifted to beta 12/10 (so isn't in 58b10, but would be in b11). 59 crash rate is too low to use for verification. Beta should show quickly if that bug fixed this.
Time has passed long enough for us to get new crashes. We still get some reports after 58b11 with addresses like 0xffffffffe5e5e5f1, 0xffffffffe0201b64, 0x7fed3ffc5e0, 0xffffffffffffffff still implies a use-after-free. Can you take another look, Bas?
Flags: needinfo?(bas)
So, in the recent builds (58.0.2), there appear to be three crashes, one of them is with 0xffffffffffffffff, and one with 0x0, those are completely different crashes than the original bug here, it is actually accessing aPattern here rather than aSurface, so accessing arguments to the functions, these are very mysterious, but I don't think at a crash rate of 2 over all of 58.0.2, they're a concern, nor do I think there's enough information at least in these bugs to figure out the cause, it may just be some random form of corruption at this crash rate. The third is a surface which it can't find at the address it's expecting, but that doesn't have the typical UAF poisoning pattern, it may be something else going on, but from a single crash it's really hard to figure out what it could be. That crash has no painting activity on any other threads at least. There's a single one that looks like a real UAF (with the memory poisoning pattern in there), which is https://crash-stats.mozilla.com/report/index/ae44615c-0070-413f-a291-a5f860180213. This one is the sole crash that looks similar to the original bug filed here, I can't really explain that one. There's also a non-OMTP one in OS X here: https://crash-stats.mozilla.com/report/index/4f3bf002-426e-4a81-bf8e-c52c10180207 This is an extremely hot function, these crash reports are so rare, and show so little correlation, I think we can consider the original bug fixed. Unless we find something more common indicative of a race condition, if there is one, I don't think at this point this is actionable, or a real threat to our users from a security perspective.
Flags: needinfo?(bas)
(In reply to Randell Jesup [:jesup] from comment #16) > Exec wildptr crash in 59b6: > https://crash-stats.mozilla.com/report/index/309ac918-089b-43c3-878e- > 5fbd90180207 and other wildptr READ violations I'll have a look at these, not sure what's up with these, they don't look like the original bug. > Lots of clear UAF signatures in 59a1 and 58bN: > https://crash-stats.mozilla.com/report/index/1262307a-d9f3-4ad0-b14c- > 685140180217 (that seems to have gone away - temporary bug?) It looks like these were actually the original bug? Indeed this report is from before the patch landed (11/22) > 25-30 58.0.N crashes, many wildptr, one clear UAF: > https://crash-stats.mozilla.com/report/index/ae44615c-0070-413f-a291- > a5f860180213 I think this is the one I listed in my comment, that's the one crash that looks like the original bug, I can't explain it, but it's pretty lonely in there.
Flags: needinfo?(bas)
Any news?
Flags: needinfo?(bas)
https://crash-stats.mozilla.com/report/index/06c1020a-3735-4725-988f-0ea500180602 is an e5e5 crash in 60.0.1 About 30 or 40 crashes in 60 and newer in the last month. Mostly wildptr or fff's, one e5
From a discussion with Lee last week, these crashes could be caused by us iterating over the command list at the same time we are mutating it (causing a resize and free of the backing store). This should not be happening and it's not clear how we could be doing that as we lock around this. So I think it may be worth adding a diagnostic assert to catch improper uses of CaptureCommandList to give us some more useful call stacks. I've prototyped a patch and will upload it soon.
Assignee: bas → rhunt
Flags: needinfo?(bas)
This patch adds some threadsafe tracking information to assert that a capture command list is either only being read (by possibly multiple readers), or only being written to by one writer. The tracking flags should detect any case of calling Append, ReuseOrAppend, operator= while iterating over the display list. It will also assert if we destroy the CaptureCommandList while an iterator still active. The one thing this doesn't protect against is multiple threads calling Append but somehow always never at the same time, but stomping over the pointer returned by Append. This is a limitation of the CaptureCommandList API, but I don't think it's worth working around because it's very unlikely.
Attachment #8989447 - Flags: review?(bas)
Attachment #8989447 - Flags: review?(bas) → review+
Does this patch need sec-review? It's mostly just a diagnostic crash to try and discover the real issue.
Flags: needinfo?(dveditz)
Comment on attachment 8989447 [details] [diff] [review] capture-command-list-thread-safety.patch It needs sec-approval+, not sec-review. Even though it's not a fix we need to double-check that such diagnostic changes don't point a finger at an obvious problem area. sec-approval=dveditz Make sure you mark the bug "leave-open" so the sheriffs don't close the bug when they merge the diagnostic patch.
Flags: needinfo?(dveditz)
Attachment #8989447 - Flags: sec-approval+
Keywords: leave-open
I did some searching on crash stats and have been unable to find any reports hitting this assert yet. Unsure if either there is a processing delay, I'm not searching optimally, or no users have hit this assert. I've searched for all combinations of `RwAssert` and `CaptureCommandList` in `proto signature` and `signature`.
From triaging my open bugs I noticed bug 1445481, which is a release assert being hit in SourceSurfaceCapture. Now that I know this code better here, this assert was added to detect when we have a SourceSurfaceCapture on the paint thread which has an underlying CaptureCommandList that could be mutated by an attached DrawTargetCapture. This seems like it could be the same situation, and getting a better crash report here could help there.
Also according to Bas, the locking and tracking flags introduced here are showing up in his profiles and are causing some real performance regressions. I'd like to leave this code in until we get some actionable crash reports and then remove it before it goes to beta. Or alternatively compile it conditionally on nightly.
(In reply to Ryan Hunt [:rhunt] from comment #25) > I did some searching on crash stats and have been unable to find any reports > hitting this assert yet. > > Unsure if either there is a processing delay, I'm not searching optimally, > or no users have hit this assert. > > I've searched for all combinations of `RwAssert` and `CaptureCommandList` in > `proto signature` and `signature`. Update. Still haven't seen any diagnostics crashes in crash-stats. I haven't seen any of the original FillRect crashes in the 63 nightly either [1]. [1] https://crash-stats.mozilla.com/signature/?product=Firefox&version=63.0a1&signature=mozilla::gfx::DrawTargetSkia::FillRect&date=%3E%3D2017-10-19T20:14:37.000Z&date=%3C2018-07-17T20:14:00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1
The crashes now are definitely different. I don't see the original UAF signatures after 58 (in recent crashes; see comment 19), and a bunch of the recent crashes are near-null. In the 62/63 crashes it looks like aPattern is bogus and is simply detected at various points depending on how bogus (usually at aPattern.GetType() or pat.mSurface->GetFormat() a couple lines later). But it's not marked with the freed-memory poison value.
Attached patch rw-assert.patchSplinter Review
Still haven't seen any diagnostic crashes since it landed. According to lsalzman and Bas, this assertion has real performance impacts and shows up in profiles so we should remove it.
Attachment #9002962 - Flags: review?(lsalzman)
Attachment #9002962 - Flags: review?(lsalzman) → review+
Comment on attachment 9002962 [details] [diff] [review] rw-assert.patch [Security approval request comment] How easily could an exploit be constructed based on the patch? Not easily, it's removing a diagnostic crash. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No Which older supported branches are affected by this flaw? Unsure If not all supported branches, which bug introduced the flaw? Enabling of OMTP Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? How likely is this patch to cause regressions; how much testing does it need?
Attachment #9002962 - Flags: sec-approval?
sec-approval+ for landing on trunk since it is just removing the crash.
Attachment #9002962 - Flags: sec-approval? → sec-approval+
Ryan, do you think that your patch can be uplifted to 63 beta? Thanks
Flags: needinfo?(rhunt)
The diagnostic patch was added and removed in the 63 nightly cycle so no uplifting should be necessary for removing the diagnostic from any of our branches.
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #35) > The diagnostic patch was added and removed in the 63 nightly cycle so no > uplifting should be necessary for removing the diagnostic from any of our > branches. Thanks, adjusting flags accordingly then.
Adding Jessie (new graphics engineering manager) to all sec-crit and sec-high graphics bugs
See Also: → 1440937
Keywords: stalled

The leave-open keyword is there and there is no activity for 6 months.
:rhunt, maybe it's time to close this bug?

Flags: needinfo?(rhunt)

Looking at the crash table, it seems like this is very low frequency but is still probably not fixed.

Flags: needinfo?(rhunt)

The leave-open keyword is there and there is no activity for 6 months.
:rhunt, maybe it's time to close this bug?

Flags: needinfo?(rhunt)
Flags: needinfo?(rhunt)
Severity: critical → S2

Haven't seen this in current builds

Group: gfx-core-security
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: