Poison crash in [@ gfxFontEntry::FontTableBlobData::ManageHashEntry]
Categories
(Core :: Graphics: Text, defect)
Tracking
()
People
(Reporter: mccr8, Assigned: jfkthame)
References
Details
(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [adv-main130+r])
Crash Data
Attachments
(1 file)
|
48 bytes,
text/x-phabricator-request
|
dveditz
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/52599a1f-b661-475a-8bde-06a1a0240720
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames:
0 libxul.so gfxFontEntry::FontTableBlobData::ManageHashEntry(nsTHashtable<gfxFontEntry::F... gfx/thebes/gfxFontEntry.cpp:436
0 libxul.so gfxFontEntry::FontTableHashEntry::ShareTableAndGetBlob(nsTArray<unsigned char... gfx/thebes/gfxFontEntry.cpp:486
0 libxul.so gfxFontEntry::ShareFontTableAndGetBlob(unsigned int, nsTArray<unsigned char>*) gfx/thebes/gfxFontEntry.cpp:568
0 libxul.so gfxFontEntry::GetFontTable(unsigned int) gfx/thebes/gfxFontEntry.cpp:589
1 libxul.so gfxFontEntry::AutoTable::AutoTable(gfxFontEntry*, unsigned int) gfx/thebes/gfxFontEntry.h:376
1 libxul.so gfxFontEntry::UnitsPerEm() gfx/thebes/gfxFontEntry.cpp:270
2 libxul.so gfxFT2FontBase::gfxFT2FontBase(RefPtr<mozilla::gfx::UnscaledFontFreeType> con... gfx/thebes/gfxFT2FontBase.cpp:36
3 libxul.so gfxFT2Font::gfxFT2Font(RefPtr<mozilla::gfx::UnscaledFontFreeType> const&, Ref... gfx/thebes/gfxFT2Fonts.cpp:157
4 libxul.so FT2FontEntry::CreateFontInstance(gfxFontStyle const*) gfx/thebes/gfxFT2FontList.cpp:240
5 libxul.so gfxFontEntry::FindOrMakeFont(gfxFontStyle const*, gfxCharacterMap*) gfx/thebes/gfxFontEntry.cpp:250
This looks like another DOM worker font issue, though a few of the crashes are on the main thread. There are 24 such crashes in the last 6 months, which is a pretty solid volume. Notable that aside from a single ESR115 crash (bp-46a7b498-d822-46bb-a112-a3cd30240311) these are all on 126 and later. I don't know if that's a regression or a signature change. Mostly Android, but the ESR115 crash and another one (bp-26526df4-b754-46df-975a-6fda20240528) are on desktop.
| Reporter | ||
Comment 1•1 year ago
|
||
I didn't see any other signatures in the last 6 months for crashes on poison values where the proto signature contained ShareFontTableAndGetBlob so probably not a signature change?
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 3•1 year ago
|
||
It looks like we must be accessing a deleted entry in the gfxFontEntry::mFontTableCache hashtable. It's not clear to me how this would happen (and without a testcase or pernosco trace, it's difficult to confirm anything...), but one thing I think we can do is to simplify the locking around this hashtable, such that it is annotated with MOZ_GUARDED_BY(mLock) in the entry, and it's clear that all access is appropriately protected, instead of the atomic-pointer dance that we currently do.
I'll put up a patch for this, but without the ability to reproduce the crash, it's hard to say whether it will help. (So I think we should leave this open until we have a chance to see the effect.) If nothing else, it should make it easier to reason about the code if we do still see crashes happening in the wild.
| Assignee | ||
Comment 4•1 year ago
|
||
Updated•1 year ago
|
| Assignee | ||
Comment 5•1 year ago
|
||
Comment on attachment 9416850 [details]
Bug 1909367 - Simplify & annotate locking around mFontTableCache. r=#gfx-reviewers
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not easily; unclear that the issue is exploitable -- UAF of poisoned memory that results in a content-process crash
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Should be trivial transplant
- How likely is this patch to cause regressions; how much testing does it need?: Unlikely to cause regressions - just cleaning up locking implementation
- Is the patch ready to land after security approval is given?: Yes
- Is Android affected?: Yes
| Assignee | ||
Updated•1 year ago
|
| Reporter | ||
Comment 6•1 year ago
|
||
I don't know if this helps at all, but I did see that in the crash report in comment 0 that thread 12 (the main Gecko thread I think) is holding a lock doing font stuff:
mozilla::detail::RWLockImpl::readLock() mozglue/misc/RWLock_posix.cpp:38
mozilla::RWLock::ReadLock() xpcom/threads/RWLock.h:60
mozilla::BaseAutoReadLock<mozilla::RWLock>::BaseAutoReadLock(mozilla::RWLock&) xpcom/threads/RWLock.h:111
gfxFT2FontEntryBase::GetGlyph(unsigned int, gfxFT2FontBase*) gfx/thebes/gfxFT2FontBase.cpp:101
gfxFT2FontBase::GetGlyph(unsigned int) gfx/thebes/gfxFT2FontBase.h:51
gfxFT2FontBase::GetCharExtents(unsigned int, double*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits, double>*) gfx/thebes/gfxFT2FontBase.cpp:176
gfxFT2FontBase::InitMetrics() gfx/thebes/gfxFT2FontBase.cpp:473
It might be worth checking out other crash reports as occasionally they do capture a race.
| Assignee | ||
Comment 7•1 year ago
|
||
I don't think there's anything suspicious about that... what we see is thread 12 wanting to take a (non-exclusive) lock on the gfxFontEntry at https://hg.mozilla.org/releases/mozilla-release/file/14b32d530926c66251f10fe2410184f7cc7de839/gfx/thebes/gfxFT2FontBase.cpp#l101, but it has to wait because the worker thread (39) took an exclusive lock at https://hg.mozilla.org/releases/mozilla-release/file/14b32d530926c66251f10fe2410184f7cc7de839/gfx/thebes/gfxFontEntry.cpp#l556 in order to update the table cache. So AFAICS things are working as expected.
Quite a few of the reports do show multiple threads (main and worker, or multiple DOM Worker threads) doing font stuff at the time of the crash, but all those I looked at seemed like normal, legitimate usage patterns.
Comment 8•1 year ago
|
||
Comment on attachment 9416850 [details]
Bug 1909367 - Simplify & annotate locking around mFontTableCache. r=#gfx-reviewers
sec-approval+ = dveditz to land in Firefox 130
Updated•1 year ago
|
Comment 10•1 year ago
|
||
Comment 11•1 year ago
|
||
Comment 12•1 year ago
|
||
Updated•1 year ago
|
Comment 13•1 year ago
|
||
:jfkthame are you landing anything else here, asking since there is a leave-open keyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?
| Assignee | ||
Comment 14•1 year ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #13)
:jfkthame are you landing anything else here, asking since there is a
leave-openkeyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?
I'm skeptical whether the patch here will make any difference; waiting to see what crash-stats shows. (Hopefully if the crashes do continue, it will at least be easier to reason about the code, but currently we don't really know the root cause here.)
So no, I don't have anything further to land at this point, but I was hesitant to resolve the bug without any assurance that we've fixed the issue.
Uplifting to Beta, at least, would help us see sooner whether anything has changed, given that the crash rate is too low to get any useful signal from Nightly. Checking just now, I see one single report from 127-beta, everything else is from release channels. But I don't know how you feel about uplifting a speculative patch for an unresolved bug?
Comment 15•1 year ago
|
||
(In reply to Jonathan Kew [:jfkthame] from comment #14)
(In reply to Donal Meehan [:dmeehan] from comment #13)
:jfkthame are you landing anything else here, asking since there is a
leave-openkeyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?I'm skeptical whether the patch here will make any difference; waiting to see what crash-stats shows. (Hopefully if the crashes do continue, it will at least be easier to reason about the code, but currently we don't really know the root cause here.)
So no, I don't have anything further to land at this point, but I was hesitant to resolve the bug without any assurance that we've fixed the issue.
Uplifting to Beta, at least, would help us see sooner whether anything has changed, given that the crash rate is too low to get any useful signal from Nightly. Checking just now, I see one single report from 127-beta, everything else is from release channels. But I don't know how you feel about uplifting a speculative patch for an unresolved bug?
Makes sense to hold back until you're ready to uplift.
Though I'll defer to :dveditz in terms of the timing. It's a sec-high and has landed in central, so it would need to be uplifted at some point?
Comment 16•1 year ago
|
||
I support uplifting this to beta, but not resolving the bug and watching what happens with fx130
Comment 17•1 year ago
|
||
Clearing Lee's needinfo: it looks like it was for comment 2 which Jonathan answered
Comment 18•1 year ago
|
||
The severity field is not set for this bug.
:lsalzman, could you have a look please?
For more information, please visit BugBot documentation.
Updated•1 year ago
|
| Assignee | ||
Comment 19•1 year ago
|
||
I don't think this is higher than an S3, given the low crash rate and the fact that while the underlying issue is likely some kind of UAF, the memory being poisoned turns it into a safe crash afaics.
Comment 20•1 year ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jfkthame, could you consider increasing the severity of this security bug?
For more information, please visit BugBot documentation.
Comment 21•1 year ago
|
||
The bug is marked as tracked for firefox130 (beta) and tracked for firefox131 (nightly). However, the bug still has low severity.
:bhood, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 22•1 year ago
|
||
I think S3 is reasonable for now; see comment 19.
| Reporter | ||
Comment 23•1 year ago
|
||
(In reply to Jonathan Kew [:jfkthame] from comment #19)
I don't think this is higher than an S3, given the low crash rate and the fact that while the underlying issue is likely some kind of UAF, the memory being poisoned turns it into a safe crash afaics.
Nothing we know of is preventing a new object from being allocated at the same address before the use, in which case the poisoning would not help.
That being said, these kinds of crash reports are often not very actionable, though a patch has landed, so I don't know if S2 really makes sense for them.
Comment 24•1 year ago
|
||
Did you want to uplift the patch to Beta so we can see how Fx130 fares when it ships in a couple weeks? Please nominate ASAP if so.
Updated•1 year ago
|
| Assignee | ||
Comment 25•1 year ago
|
||
I was trying to uplift this, but I think I was confused.... AFAICT the patch did land in 130 already. It hit mozilla-central on Aug 3rd (comment 12), and the version bump to 131 would have been Aug 5th (according to https://whattrainisitnow.com/calendar/). And when I switch to beta on my mozilla-unified checkout, I see the changes in place.
So no uplift needed, I think; this will be going out in Fx130 as things stand.
| Reporter | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 26•1 year ago
|
||
@jfkthame: Should this still be leave-open?
Also, based on your reasoning for S3 instead of S2, should this be sec-moderate, or are you convinced of S2+sec-high per @mccr8 in Comment 23?
| Reporter | ||
Comment 27•1 year ago
|
||
It seems like this is probably a race, so I think we can bump it down to sec-moderate due to a likely difficulty of exploitation. Although I think we have had reliable TSan test cases involving font stuff before so maybe that's optimistic.
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 28•1 year ago
|
||
(In reply to Kelsey Gilbert [:jgilbert] from comment #26)
@jfkthame: Should this still be leave-open?
I see that we now have a couple of crash reports from Fx130, indicating (as I feared, per comment 14) the patch that landed here to simplify/clarify the code did not actually prevent the problem. So yes, this should remain open, and we need to take another look to try and understand exactly how it's happening.
Updated•1 year ago
|
Updated•1 year ago
|
| Reporter | ||
Comment 29•1 year ago
•
|
||
I still see crashes in 130, but no crashes in 131 or later yet? No changes to gfxFontEntry.cpp in 131 so who knows.
Updated•1 year ago
|
| Reporter | ||
Comment 30•1 year ago
|
||
Still seeing a few poison crashes in 133.
bp-189c1a66-3a60-4d62-a008-54f660241214
bp-a7ead77d-58ab-44bf-bb9c-8b0660241211
bp-76f5976e-db6d-4858-a197-29df10241210
| Assignee | ||
Comment 31•6 months ago
|
||
I suspect the underlying cause here might be the data race in bug 1970422. If that's the case, the pending patch there (just awaiting sec-approval) may finally put an end to these crashes.
Updated•4 months ago
|
Updated•4 months ago
|
| Assignee | ||
Comment 32•4 months ago
|
||
I think another potential cause here is bug 1976782. Between that and bug 1970422, I'm hopeful this may be resolved, but I guess only time will tell as we don't have clear STR.
Updated•4 months ago
|
Comment 33•1 month ago
|
||
(In reply to Jonathan Kew [:jfkthame] from comment #31)
I suspect the underlying cause here might be the data race in bug 1970422. If that's the case, the pending patch there (just awaiting sec-approval) may finally put an end to these crashes.
Following up two months later: looks like crash data has borne out this^ theory.
The latest crash here was in Firefox 140.0.4 -- no crashes in 141 or newer (and bug 1970422's fix happens to have been first-released in v141).
Let's call this WFM with dependency on bug 1970422. Please reopen if we see a crash report in a build that includes bug 1970422's fix.
Updated•1 month ago
|
Updated•1 month ago
|
Updated•4 days ago
|
Description
•