Closed
Bug 1414389
Opened 8 years ago
Closed 2 years ago
Crash in mozilla::gfx::DrawTargetSkia::FillRect
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
RESOLVED
WORKSFORME
| Tracking | Status | |
|---|---|---|
| firefox-esr52 | --- | unaffected |
| firefox-esr60 | --- | wontfix |
| firefox56 | --- | wontfix |
| firefox57 | --- | wontfix |
| firefox58 | --- | wontfix |
| firefox59 | --- | wontfix |
| firefox60 | --- | wontfix |
| firefox61 | --- | wontfix |
| firefox62 | --- | wontfix |
| firefox63 | + | wontfix |
| firefox64 | --- | wontfix |
| firefox65 | --- | wontfix |
| firefox66 | --- | wontfix |
People
(Reporter: marcia, Assigned: rhunt)
References
Details
(5 keywords, Whiteboard: [gfx-noted])
Crash Data
Attachments
(2 files)
|
4.39 KB,
patch
|
bas.schouten
:
review+
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
|
3.85 KB,
patch
|
lsalzman
:
review+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-ac95df4c-5c1d-4f78-a9a6-767ce0171102.
=============================================================
Seen while looking at nightly crash stats: http://bit.ly/2zuw0q7
Crashes started in 58 using 20171031220132 (17 crashes/11 installs) and affect Win 7, Win 8 and Win 10. One crash also present in the later 57 betas, and 2 in Firefox 56.
marking as sec sensitive due to the possible UAF addresses.
This may very well be the same root cause as bug 1413857, just some kind of a race condition. Even though it's in 56, possibly tickled by OMTP.
Flags: needinfo?(rhunt)
Flags: needinfo?(dvander)
Flags: needinfo?(bas)
Updated•8 years ago
|
Updated•8 years ago
|
Also, bug 1412850
Updated•8 years ago
|
Flags: needinfo?(rhunt)
Flags: needinfo?(dvander)
Updated•8 years ago
|
Keywords: csectype-race,
csectype-uaf
| Reporter | ||
Comment 4•8 years ago
|
||
I see a few other signatures that look related:
*mozilla::gfx::GetSkImageForSurface - http://bit.ly/2zypvls
*mozilla::gfx::StoredPattern::Assign - http://bit.ly/2zypvls
Updated•8 years ago
|
Keywords: regression,
sec-high
Comment 7•8 years ago
|
||
this is one of the top content crashes in 58.0b so far (nearly 2% of content crashes in 58.0b6).
can we increase the priority based on this?
Flags: needinfo?(milan)
Lee, Ryan, this is a DrawTargetSkia::FillRect fast path (bug 1294337) with OMTP.
Flags: needinfo?(rhunt)
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
Comment 9•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #8)
> Lee, Ryan, this is a DrawTargetSkia::FillRect fast path (bug 1294337) with
> OMTP.
The fact that this is inside DrawTargetSkia::FillRect is merely incidental, and not causative. I don't think the problem is FillRect directly. It's accessing a SourceSurface inside a SurfacePattern on a DrawTargetCapture playback. There are thus two possibilities here given the line it is crashing at: https://hg.mozilla.org/mozilla-central/annotate/e8ce4865e421/gfx/2d/DrawTargetSkia.cpp#l765
Either 1) the pat.mSurface is crashing, meaning pat is already freed, which means the FillRectCommand containing this StoredSurface/SurfacePattern is freed, which means the DrawTargetCapture around it is likely thus bogus.
or 2) pat.mSurface->GetFormat() is crashing, implying that the mSurface is already freed, even though pat is still valid. Superficial inspection of the StoredPattern code looks like it is more or less currently keeping the reference to the SourceSurface that is pointed at, though.
I don't have access to bug 1413857, so I can offer no insight into a relationship there.
Someone more familiar with the lifetimes of SourceSurfaces within OMTP would be better to follow up.
Flags: needinfo?(lsalzman)
Comment 10•8 years ago
|
||
I suspect this is bug 1416862 with subtly different timings, let's see how the patch there affects this.
Updated•8 years ago
|
Flags: needinfo?(rhunt)
Comment 11•8 years ago
|
||
Crash numbers are low on nightly so it's hard to say if bug 1416862 made a difference here, but I'm going to keep an eye on this.
Assignee: nobody → bas
Status: NEW → ASSIGNED
Comment 12•8 years ago
|
||
Analysis of a couple of crash dumps shows that for the address of the mSurface vtable we find 0xe5e5e5e5 or 0x1 in most of these cases. This suggests bug 1416862 will either fix this or there may be another race condition we don't know about yet. So far I haven't been able to figure out if the surface is a snapshot here or not.
Comment 13•8 years ago
|
||
Bug 1416862 landed around 12/8, and was uplifted to beta 12/10 (so isn't in 58b10, but would be in b11). 59 crash rate is too low to use for verification. Beta should show quickly if that bug fixed this.
Comment 14•8 years ago
|
||
Time has passed long enough for us to get new crashes.
We still get some reports after 58b11 with addresses like 0xffffffffe5e5e5f1, 0xffffffffe0201b64, 0x7fed3ffc5e0, 0xffffffffffffffff still implies a use-after-free. Can you take another look, Bas?
Flags: needinfo?(bas)
Comment 15•8 years ago
|
||
So, in the recent builds (58.0.2), there appear to be three crashes, one of them is with 0xffffffffffffffff, and one with 0x0, those are completely different crashes than the original bug here, it is actually accessing aPattern here rather than aSurface, so accessing arguments to the functions, these are very mysterious, but I don't think at a crash rate of 2 over all of 58.0.2, they're a concern, nor do I think there's enough information at least in these bugs to figure out the cause, it may just be some random form of corruption at this crash rate.
The third is a surface which it can't find at the address it's expecting, but that doesn't have the typical UAF poisoning pattern, it may be something else going on, but from a single crash it's really hard to figure out what it could be. That crash has no painting activity on any other threads at least.
There's a single one that looks like a real UAF (with the memory poisoning pattern in there), which is https://crash-stats.mozilla.com/report/index/ae44615c-0070-413f-a291-a5f860180213. This one is the sole crash that looks similar to the original bug filed here, I can't really explain that one.
There's also a non-OMTP one in OS X here: https://crash-stats.mozilla.com/report/index/4f3bf002-426e-4a81-bf8e-c52c10180207
This is an extremely hot function, these crash reports are so rare, and show so little correlation, I think we can consider the original bug fixed. Unless we find something more common indicative of a race condition, if there is one, I don't think at this point this is actionable, or a real threat to our users from a security perspective.
Flags: needinfo?(bas)
Comment 16•8 years ago
|
||
Exec wildptr crash in 59b6: https://crash-stats.mozilla.com/report/index/309ac918-089b-43c3-878e-5fbd90180207 and other wildptr READ violations
Lots of clear UAF signatures in 59a1 and 58bN: https://crash-stats.mozilla.com/report/index/1262307a-d9f3-4ad0-b14c-685140180217 (that seems to have gone away - temporary bug?)
25-30 58.0.N crashes, many wildptr, one clear UAF: https://crash-stats.mozilla.com/report/index/ae44615c-0070-413f-a291-a5f860180213
Search: https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Agfx%3A%3ADrawTargetSkia%3A%3AFillRect&date=%3E%3D2018-01-28T08%3A48%3A56.000Z&date=%3C2018-02-28T08%3A48%3A56.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-version&_sort=-date&page=1
Further thoughts, Bas?
Flags: needinfo?(bas)
Comment 17•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #16)
> Exec wildptr crash in 59b6:
> https://crash-stats.mozilla.com/report/index/309ac918-089b-43c3-878e-
> 5fbd90180207 and other wildptr READ violations
I'll have a look at these, not sure what's up with these, they don't look like the original bug.
> Lots of clear UAF signatures in 59a1 and 58bN:
> https://crash-stats.mozilla.com/report/index/1262307a-d9f3-4ad0-b14c-
> 685140180217 (that seems to have gone away - temporary bug?)
It looks like these were actually the original bug? Indeed this report is from before the patch landed (11/22)
> 25-30 58.0.N crashes, many wildptr, one clear UAF:
> https://crash-stats.mozilla.com/report/index/ae44615c-0070-413f-a291-
> a5f860180213
I think this is the one I listed in my comment, that's the one crash that looks like the original bug, I can't explain it, but it's pretty lonely in there.
Flags: needinfo?(bas)
Updated•8 years ago
|
Updated•7 years ago
|
status-firefox61:
--- → wontfix
status-firefox62:
--- → affected
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → affected
Comment 19•7 years ago
|
||
https://crash-stats.mozilla.com/report/index/06c1020a-3735-4725-988f-0ea500180602 is an e5e5 crash in 60.0.1
About 30 or 40 crashes in 60 and newer in the last month. Mostly wildptr or fff's, one e5
| Assignee | ||
Comment 20•7 years ago
|
||
From a discussion with Lee last week, these crashes could be caused by us iterating over the command list at the same time we are mutating it (causing a resize and free of the backing store). This should not be happening and it's not clear how we could be doing that as we lock around this.
So I think it may be worth adding a diagnostic assert to catch improper uses of CaptureCommandList to give us some more useful call stacks.
I've prototyped a patch and will upload it soon.
Assignee: bas → rhunt
Flags: needinfo?(bas)
| Assignee | ||
Comment 21•7 years ago
|
||
This patch adds some threadsafe tracking information to assert that a capture command list is either only being read (by possibly multiple readers), or only being written to by one writer.
The tracking flags should detect any case of calling Append, ReuseOrAppend, operator= while iterating over the display list. It will also assert if we destroy the CaptureCommandList while an iterator still active.
The one thing this doesn't protect against is multiple threads calling Append but somehow always never at the same time, but stomping over the pointer returned by Append. This is a limitation of the CaptureCommandList API, but I don't think it's worth working around because it's very unlikely.
Attachment #8989447 -
Flags: review?(bas)
Updated•7 years ago
|
Attachment #8989447 -
Flags: review?(bas) → review+
| Assignee | ||
Comment 22•7 years ago
|
||
Does this patch need sec-review? It's mostly just a diagnostic crash to try and discover the real issue.
Flags: needinfo?(dveditz)
Comment 23•7 years ago
|
||
Comment on attachment 8989447 [details] [diff] [review]
capture-command-list-thread-safety.patch
It needs sec-approval+, not sec-review. Even though it's not a fix we need to double-check that such diagnostic changes don't point a finger at an obvious problem area.
sec-approval=dveditz
Make sure you mark the bug "leave-open" so the sheriffs don't close the bug when they merge the diagnostic patch.
Flags: needinfo?(dveditz)
Attachment #8989447 -
Flags: sec-approval+
| Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Comment 24•7 years ago
|
||
| Assignee | ||
Comment 25•7 years ago
|
||
I did some searching on crash stats and have been unable to find any reports hitting this assert yet.
Unsure if either there is a processing delay, I'm not searching optimally, or no users have hit this assert.
I've searched for all combinations of `RwAssert` and `CaptureCommandList` in `proto signature` and `signature`.
| Assignee | ||
Comment 26•7 years ago
|
||
From triaging my open bugs I noticed bug 1445481, which is a release assert being hit in SourceSurfaceCapture.
Now that I know this code better here, this assert was added to detect when we have a SourceSurfaceCapture on the paint thread which has an underlying CaptureCommandList that could be mutated by an attached DrawTargetCapture.
This seems like it could be the same situation, and getting a better crash report here could help there.
| Assignee | ||
Comment 27•7 years ago
|
||
Also according to Bas, the locking and tracking flags introduced here are showing up in his profiles and are causing some real performance regressions.
I'd like to leave this code in until we get some actionable crash reports and then remove it before it goes to beta. Or alternatively compile it conditionally on nightly.
| Assignee | ||
Comment 28•7 years ago
|
||
(In reply to Ryan Hunt [:rhunt] from comment #25)
> I did some searching on crash stats and have been unable to find any reports
> hitting this assert yet.
>
> Unsure if either there is a processing delay, I'm not searching optimally,
> or no users have hit this assert.
>
> I've searched for all combinations of `RwAssert` and `CaptureCommandList` in
> `proto signature` and `signature`.
Update. Still haven't seen any diagnostics crashes in crash-stats. I haven't
seen any of the original FillRect crashes in the 63 nightly either [1].
[1] https://crash-stats.mozilla.com/signature/?product=Firefox&version=63.0a1&signature=mozilla::gfx::DrawTargetSkia::FillRect&date=%3E%3D2017-10-19T20:14:37.000Z&date=%3C2018-07-17T20:14:00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1
Comment 29•7 years ago
|
||
The crashes now are definitely different. I don't see the original UAF signatures after 58 (in recent crashes; see comment 19), and a bunch of the recent crashes are near-null.
In the 62/63 crashes it looks like aPattern is bogus and is simply detected at various points depending on how bogus (usually at aPattern.GetType() or pat.mSurface->GetFormat() a couple lines later). But it's not marked with the freed-memory poison value.
| Assignee | ||
Comment 30•7 years ago
|
||
Still haven't seen any diagnostic crashes since it landed. According to lsalzman and Bas, this assertion has real performance impacts and shows up in profiles so we should remove it.
Attachment #9002962 -
Flags: review?(lsalzman)
Updated•7 years ago
|
Attachment #9002962 -
Flags: review?(lsalzman) → review+
| Assignee | ||
Comment 31•7 years ago
|
||
Comment on attachment 9002962 [details] [diff] [review]
rw-assert.patch
[Security approval request comment]
How easily could an exploit be constructed based on the patch? Not easily, it's removing a diagnostic crash.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No
Which older supported branches are affected by this flaw? Unsure
If not all supported branches, which bug introduced the flaw? Enabling of OMTP
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
How likely is this patch to cause regressions; how much testing does it need?
Attachment #9002962 -
Flags: sec-approval?
Comment 32•7 years ago
|
||
sec-approval+ for landing on trunk since it is just removing the crash.
status-firefox63:
--- → affected
tracking-firefox63:
--- → +
Updated•7 years ago
|
Attachment #9002962 -
Flags: sec-approval? → sec-approval+
Comment 33•7 years ago
|
||
Updated•7 years ago
|
Comment 34•7 years ago
|
||
Ryan, do you think that your patch can be uplifted to 63 beta? Thanks
Flags: needinfo?(rhunt)
| Assignee | ||
Comment 35•7 years ago
|
||
The diagnostic patch was added and removed in the 63 nightly cycle so no uplifting should be necessary for removing the diagnostic from any of our branches.
Flags: needinfo?(rhunt)
Comment 36•7 years ago
|
||
(In reply to Ryan Hunt [:rhunt] from comment #35)
> The diagnostic patch was added and removed in the 63 nightly cycle so no
> uplifting should be necessary for removing the diagnostic from any of our
> branches.
Thanks, adjusting flags accordingly then.
Updated•7 years ago
|
Comment 37•7 years ago
|
||
Adding Jessie (new graphics engineering manager) to all sec-crit and sec-high graphics bugs
Updated•7 years ago
|
status-firefox65:
--- → fix-optional
status-firefox66:
--- → fix-optional
Comment 38•6 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:rhunt, maybe it's time to close this bug?
Flags: needinfo?(rhunt)
| Assignee | ||
Comment 39•6 years ago
|
||
Looking at the crash table, it seems like this is very low frequency but is still probably not fixed.
Flags: needinfo?(rhunt)
Updated•6 years ago
|
Comment 40•5 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:rhunt, maybe it's time to close this bug?
Flags: needinfo?(rhunt)
| Assignee | ||
Updated•5 years ago
|
Flags: needinfo?(rhunt)
Updated•3 years ago
|
Severity: critical → S2
Comment 42•2 years ago
|
||
Haven't seen this in current builds
Group: gfx-core-security
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
Updated•2 years ago
|
Keywords: leave-open
Comment 43•2 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.
Keywords: stalled
You need to log in
before you can comment on or make changes to this bug.
Description
•