Closed Bug 1909367 Opened 1 year ago Closed 1 month ago

Poison crash in [@ gfxFontEntry::FontTableBlobData::ManageHashEntry]

Categories

(Core :: Graphics: Text, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
142 Branch
Tracking Status
firefox-esr115 --- fixed
firefox-esr128 --- fixed
firefox-esr140 --- fixed
firefox129 --- wontfix
firefox130 + wontfix
firefox131 --- wontfix
firefox132 --- wontfix
firefox133 --- wontfix
firefox134 --- wontfix
firefox135 --- wontfix
firefox141 --- fixed
firefox142 --- fixed

People

(Reporter: mccr8, Assigned: jfkthame)

References

Details

(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [adv-main130+r])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/52599a1f-b661-475a-8bde-06a1a0240720

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames:

0  libxul.so  gfxFontEntry::FontTableBlobData::ManageHashEntry(nsTHashtable<gfxFontEntry::F...  gfx/thebes/gfxFontEntry.cpp:436
0  libxul.so  gfxFontEntry::FontTableHashEntry::ShareTableAndGetBlob(nsTArray<unsigned char...  gfx/thebes/gfxFontEntry.cpp:486
0  libxul.so  gfxFontEntry::ShareFontTableAndGetBlob(unsigned int, nsTArray<unsigned char>*)  gfx/thebes/gfxFontEntry.cpp:568
0  libxul.so  gfxFontEntry::GetFontTable(unsigned int)  gfx/thebes/gfxFontEntry.cpp:589
1  libxul.so  gfxFontEntry::AutoTable::AutoTable(gfxFontEntry*, unsigned int)  gfx/thebes/gfxFontEntry.h:376
1  libxul.so  gfxFontEntry::UnitsPerEm()  gfx/thebes/gfxFontEntry.cpp:270
2  libxul.so  gfxFT2FontBase::gfxFT2FontBase(RefPtr<mozilla::gfx::UnscaledFontFreeType> con...  gfx/thebes/gfxFT2FontBase.cpp:36
3  libxul.so  gfxFT2Font::gfxFT2Font(RefPtr<mozilla::gfx::UnscaledFontFreeType> const&, Ref...  gfx/thebes/gfxFT2Fonts.cpp:157
4  libxul.so  FT2FontEntry::CreateFontInstance(gfxFontStyle const*)  gfx/thebes/gfxFT2FontList.cpp:240
5  libxul.so  gfxFontEntry::FindOrMakeFont(gfxFontStyle const*, gfxCharacterMap*)  gfx/thebes/gfxFontEntry.cpp:250

This looks like another DOM worker font issue, though a few of the crashes are on the main thread. There are 24 such crashes in the last 6 months, which is a pretty solid volume. Notable that aside from a single ESR115 crash (bp-46a7b498-d822-46bb-a112-a3cd30240311) these are all on 126 and later. I don't know if that's a regression or a signature change. Mostly Android, but the ESR115 crash and another one (bp-26526df4-b754-46df-975a-6fda20240528) are on desktop.

I didn't see any other signatures in the last 6 months for crashes on poison values where the proto signature contained ShareFontTableAndGetBlob so probably not a signature change?

No longer blocks: gfx-triage

Jonathan could you take a look at this one?

Flags: needinfo?(jfkthame)
Flags: needinfo?(lsalzman)

It looks like we must be accessing a deleted entry in the gfxFontEntry::mFontTableCache hashtable. It's not clear to me how this would happen (and without a testcase or pernosco trace, it's difficult to confirm anything...), but one thing I think we can do is to simplify the locking around this hashtable, such that it is annotated with MOZ_GUARDED_BY(mLock) in the entry, and it's clear that all access is appropriately protected, instead of the atomic-pointer dance that we currently do.

I'll put up a patch for this, but without the ability to reproduce the crash, it's hard to say whether it will help. (So I think we should leave this open until we have a chance to see the effect.) If nothing else, it should make it easier to reason about the code if we do still see crashes happening in the wild.

Flags: needinfo?(jfkthame)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

Comment on attachment 9416850 [details]
Bug 1909367 - Simplify & annotate locking around mFontTableCache. r=#gfx-reviewers

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easily; unclear that the issue is exploitable -- UAF of poisoned memory that results in a content-process crash
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Should be trivial transplant
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely to cause regressions - just cleaning up locking implementation
  • Is the patch ready to land after security approval is given?: Yes
  • Is Android affected?: Yes
Attachment #9416850 - Flags: sec-approval?
Keywords: leave-open

I don't know if this helps at all, but I did see that in the crash report in comment 0 that thread 12 (the main Gecko thread I think) is holding a lock doing font stuff:

mozilla::detail::RWLockImpl::readLock() 	mozglue/misc/RWLock_posix.cpp:38
mozilla::RWLock::ReadLock() 	xpcom/threads/RWLock.h:60
mozilla::BaseAutoReadLock<mozilla::RWLock>::BaseAutoReadLock(mozilla::RWLock&) 	xpcom/threads/RWLock.h:111
gfxFT2FontEntryBase::GetGlyph(unsigned int, gfxFT2FontBase*) 	gfx/thebes/gfxFT2FontBase.cpp:101
gfxFT2FontBase::GetGlyph(unsigned int) 	gfx/thebes/gfxFT2FontBase.h:51
gfxFT2FontBase::GetCharExtents(unsigned int, double*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits, double>*) 	gfx/thebes/gfxFT2FontBase.cpp:176
gfxFT2FontBase::InitMetrics() 	gfx/thebes/gfxFT2FontBase.cpp:473

It might be worth checking out other crash reports as occasionally they do capture a race.

I don't think there's anything suspicious about that... what we see is thread 12 wanting to take a (non-exclusive) lock on the gfxFontEntry at https://hg.mozilla.org/releases/mozilla-release/file/14b32d530926c66251f10fe2410184f7cc7de839/gfx/thebes/gfxFT2FontBase.cpp#l101, but it has to wait because the worker thread (39) took an exclusive lock at https://hg.mozilla.org/releases/mozilla-release/file/14b32d530926c66251f10fe2410184f7cc7de839/gfx/thebes/gfxFontEntry.cpp#l556 in order to update the table cache. So AFAICS things are working as expected.

Quite a few of the reports do show multiple threads (main and worker, or multiple DOM Worker threads) doing font stuff at the time of the crash, but all those I looked at seemed like normal, legitimate usage patterns.

Comment on attachment 9416850 [details]
Bug 1909367 - Simplify & annotate locking around mFontTableCache. r=#gfx-reviewers

sec-approval+ = dveditz to land in Firefox 130

Attachment #9416850 - Flags: sec-approval? → sec-approval+
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ec2fc35658b2 Simplify & annotate locking around mFontTableCache. r=gfx-reviewers,lsalzman
Backout by nerli@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d12c1ba0fbdc Backed out changeset ec2fc35658b2 for causing mochitest failures on RWLock_posix.cpp CLOSED TREE
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5bafded60869 Simplify & annotate locking around mFontTableCache. r=gfx-reviewers,lsalzman

:jfkthame are you landing anything else here, asking since there is a leave-open keyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?

Flags: needinfo?(jfkthame)

(In reply to Donal Meehan [:dmeehan] from comment #13)

:jfkthame are you landing anything else here, asking since there is a leave-open keyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?

I'm skeptical whether the patch here will make any difference; waiting to see what crash-stats shows. (Hopefully if the crashes do continue, it will at least be easier to reason about the code, but currently we don't really know the root cause here.)

So no, I don't have anything further to land at this point, but I was hesitant to resolve the bug without any assurance that we've fixed the issue.

Uplifting to Beta, at least, would help us see sooner whether anything has changed, given that the crash rate is too low to get any useful signal from Nightly. Checking just now, I see one single report from 127-beta, everything else is from release channels. But I don't know how you feel about uplifting a speculative patch for an unresolved bug?

Flags: needinfo?(jfkthame) → needinfo?(dmeehan)

(In reply to Jonathan Kew [:jfkthame] from comment #14)

(In reply to Donal Meehan [:dmeehan] from comment #13)

:jfkthame are you landing anything else here, asking since there is a leave-open keyword on the bug.
If not, can this be resolved and get uplift request for beta, esr128, and esr115?

I'm skeptical whether the patch here will make any difference; waiting to see what crash-stats shows. (Hopefully if the crashes do continue, it will at least be easier to reason about the code, but currently we don't really know the root cause here.)

So no, I don't have anything further to land at this point, but I was hesitant to resolve the bug without any assurance that we've fixed the issue.

Uplifting to Beta, at least, would help us see sooner whether anything has changed, given that the crash rate is too low to get any useful signal from Nightly. Checking just now, I see one single report from 127-beta, everything else is from release channels. But I don't know how you feel about uplifting a speculative patch for an unresolved bug?

Makes sense to hold back until you're ready to uplift.
Though I'll defer to :dveditz in terms of the timing. It's a sec-high and has landed in central, so it would need to be uplifted at some point?

Flags: needinfo?(dmeehan) → needinfo?(dveditz)

I support uplifting this to beta, but not resolving the bug and watching what happens with fx130

Flags: needinfo?(dveditz)

Clearing Lee's needinfo: it looks like it was for comment 2 which Jonathan answered

Flags: needinfo?(lsalzman)

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)
Flags: needinfo?(lsalzman) → needinfo?(jfkthame)

I don't think this is higher than an S3, given the low crash rate and the fact that while the underlying issue is likely some kind of UAF, the memory being poisoned turns it into a safe crash afaics.

Severity: -- → S3
Flags: needinfo?(jfkthame)

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jfkthame, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(jfkthame)

The bug is marked as tracked for firefox130 (beta) and tracked for firefox131 (nightly). However, the bug still has low severity.

:bhood, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)

I think S3 is reasonable for now; see comment 19.

Flags: needinfo?(jfkthame)

(In reply to Jonathan Kew [:jfkthame] from comment #19)

I don't think this is higher than an S3, given the low crash rate and the fact that while the underlying issue is likely some kind of UAF, the memory being poisoned turns it into a safe crash afaics.

Nothing we know of is preventing a new object from being allocated at the same address before the use, in which case the poisoning would not help.

That being said, these kinds of crash reports are often not very actionable, though a patch has landed, so I don't know if S2 really makes sense for them.

Did you want to uplift the patch to Beta so we can see how Fx130 fares when it ships in a couple weeks? Please nominate ASAP if so.

Flags: needinfo?(jfkthame)

I was trying to uplift this, but I think I was confused.... AFAICT the patch did land in 130 already. It hit mozilla-central on Aug 3rd (comment 12), and the version bump to 131 would have been Aug 5th (according to https://whattrainisitnow.com/calendar/). And when I switch to beta on my mozilla-unified checkout, I see the changes in place.

So no uplift needed, I think; this will be going out in Fx130 as things stand.

Flags: needinfo?(jfkthame)
Whiteboard: [adv-main130-]
Whiteboard: [adv-main130-] → [adv-main130+r]

@jfkthame: Should this still be leave-open?
Also, based on your reasoning for S3 instead of S2, should this be sec-moderate, or are you convinced of S2+sec-high per @mccr8 in Comment 23?

Flags: needinfo?(jfkthame)

It seems like this is probably a race, so I think we can bump it down to sec-moderate due to a likely difficulty of exploitation. Although I think we have had reliable TSan test cases involving font stuff before so maybe that's optimistic.

Keywords: sec-highsec-moderate
Group: gfx-core-security → core-security-release

(In reply to Kelsey Gilbert [:jgilbert] from comment #26)

@jfkthame: Should this still be leave-open?

I see that we now have a couple of crash reports from Fx130, indicating (as I feared, per comment 14) the patch that landed here to simplify/clarify the code did not actually prevent the problem. So yes, this should remain open, and we need to take another look to try and understand exactly how it's happening.

Flags: needinfo?(jfkthame)

I still see crashes in 130, but no crashes in 131 or later yet? No changes to gfxFontEntry.cpp in 131 so who knows.

I suspect the underlying cause here might be the data race in bug 1970422. If that's the case, the pending patch there (just awaiting sec-approval) may finally put an end to these crashes.

See Also: → 1970422

I think another potential cause here is bug 1976782. Between that and bug 1970422, I'm hopeful this may be resolved, but I guess only time will tell as we don't have clear STR.

(In reply to Jonathan Kew [:jfkthame] from comment #31)

I suspect the underlying cause here might be the data race in bug 1970422. If that's the case, the pending patch there (just awaiting sec-approval) may finally put an end to these crashes.

Following up two months later: looks like crash data has borne out this^ theory.

The latest crash here was in Firefox 140.0.4 -- no crashes in 141 or newer (and bug 1970422's fix happens to have been first-released in v141).

Let's call this WFM with dependency on bug 1970422. Please reopen if we see a crash report in a build that includes bug 1970422's fix.

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Depends on: 1970422
Resolution: --- → WORKSFORME
Resolution: WORKSFORME → FIXED
See Also: 1970422
Target Milestone: --- → 142 Branch
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: