AddressSanitizer: heap-use-after-free and attempting-double-free in CrashStatsLogForwarder (potentially due to race-condition)
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: sourc7, Assigned: tnikkel)
Details
(Keywords: csectype-race, reporter-external, sec-high, Whiteboard: [client-bounty-form][adv-main145+][adv-ESR140.5+][adv-ESR115.30+])
Attachments
(8 files, 1 obsolete file)
|
34.57 KB,
text/plain
|
Details | |
|
32.01 KB,
text/plain
|
Details | |
|
32.26 KB,
text/plain
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr115+
|
Details | Review |
|
71 bytes,
text/plain
|
Details |
Summary
When simulating swgl gl.cc new_buf realloc failure as on my previous bug 1975837, it able to generate a lot of Crash Annotation GraphicsCriticalError on log stdout, interestingly in rare case it able to hit AddressSanitizer: heap-use-after-free and attempting-double-free in CrashStatsLogForwarder.
Analysis
From the ASan log and local LLM analysis, the crash is probably caused by unsynchronised access/data race between main thread (T0) and render thread (T52) to a shared std::vector<std::string> that lives in CrashStatsLogForwarder instance, as follow:
- Render thread (T52)
- Creates a temporary std::string (S₁).
- Stores that string into an element of
CrashStatsLogForwarder::mStrings. The vector now owns S₁ and its internal character buffer.
- Main thread (T0)
- Calls
CrashStatsLogForwarder::UpdateStringsVector(const std::string&)with a different string (S₂). - Inside that function the code assigns (S₂) into the same vector slot that already holds (S₁)
- The assignment also triggers the destructor of the old string (
std::string::_Rep::_M_dispose). - That destructor calls
operator deleteto free the character buffer that belonged to S₁.
- Render thread (T52) (later)
- Still iterating over mStrings in UpdateCrashReport() and tries to write S₁ to an
std::ostream. - The stream’s
operator<<internally callsstd::char_traits<char>::copy, which usesmemcpy(). - It now read/copies (S₁) from the dangling buffer that was already freed by the main thread, resulting the ASan "heap‑use‑after‑free".
I'm able to reproduce the UAF and double-free on Firefox ESR140 build (commit 9571910, Sep 27 2025) and Firefox ESR128 build (commit 4442784, Jul 3 2025)
I haven't receive the UAF or double-free report on Firefox Nightly build yet, probably due to new assertion Hit MOZ_CRASH(bug: unable to lock indirect) at gfx/wr/webrender/src/compositor/sw_compositor.rs:164, so it crash early and won't generate a lot Crash Annotation GraphicsCriticalError log.
| Reporter | ||
Comment 1•8 months ago
|
||
| Reporter | ||
Comment 2•8 months ago
|
||
Updated•8 months ago
|
Comment 3•8 months ago
|
||
Looks like CrashStatsLogForwarder::UpdateStringsVector doesn't acquire mMutex like CrashStatsLogForwarder::Log does, which seems bad. The latter can call the former, in which case the lock is acquired, but so can two Recv methods. It might be worth looking into the thread locking annotations.
The RecvGraphicsError could be easily triggered from a compromised content process, but I don't know about the other stack.
Comment 4•8 months ago
|
||
sec-high might be overkill but I'll err on the side of caution here.
Comment 5•8 months ago
|
||
Tim, is the referenced code an area you have any knowledge of? Or do you know who might be a good candidate to look at this?
Updated•8 months ago
|
Updated•8 months ago
|
| Assignee | ||
Comment 6•8 months ago
|
||
| Assignee | ||
Comment 7•8 months ago
|
||
Comment on attachment 9518021 [details]
(secure)
Security Approval Request
- How easily could an exploit be constructed based on the patch?: pretty easy to tell that the problem was a missing lock from the patch
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: easy
- How likely is this patch to cause regressions; how much testing does it need?: very unlikely
- Is the patch ready to land after security approval is given?: Yes
- Is Android affected?: Yes
Updated•8 months ago
|
Comment 8•7 months ago
|
||
Comment on attachment 9518021 [details]
(secure)
Approved to land and request uplift
FWIW for next time:
How easily could an exploit be constructed based on the patch?:
pretty easy to tell that the problem was a missing lock from the patch
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?:
No
The answer to the first question is really the answer to the second: 'No, these other things don't tell you, but it's pretty easy to tell we're missing a lock'. Calling out those things specifically is more of a reminder to check for them. The first question is more asking about how easy it is to trigger the bug, or reverse engineer a test case from the patch: Is it a difficult to trigger race condition, is the patch in a wildly different part of the code or is it right at the surface level of IPC or WebIDL? Is the exploit primitive very limited (one byte overwrite of a null byte) or very powerful (general UAF in the main allocation arena).
| Assignee | ||
Comment 9•7 months ago
|
||
Yes I understood the questions, I was not linking the two answers to those two questions in my head, but I guess it sounded like I was making some link between those two answers for someone reading them?
Comment 10•7 months ago
|
||
Comment 11•7 months ago
|
||
Comment 12•7 months ago
|
||
The patch landed in nightly and beta is affected.
:tnikkel, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- See https://wiki.mozilla.org/Release_Management/Requesting_an_Uplift for documentation on how to request an uplift.
- If no, please set
status-firefox145towontfix.
For more information, please visit BugBot documentation.
Comment 13•7 months ago
|
||
firefox-beta Uplift Approval Request
- User impact if declined: sec-high
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: puts variable access behind lock where it should be
- String changes made/needed: none
- Is Android affected?: yes
| Assignee | ||
Comment 14•7 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D267389
Comment 15•7 months ago
|
||
firefox-esr140 Uplift Approval Request
- User impact if declined: sec-high
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: puts variable access behind lock where it should be
- String changes made/needed: none
- Is Android affected?: yes
| Assignee | ||
Comment 16•7 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D267389
Comment 17•7 months ago
|
||
firefox-esr115 Uplift Approval Request
- User impact if declined: sec-high
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: puts variable access behind lock where it should be
- String changes made/needed: none
- Is Android affected?: yes
| Assignee | ||
Comment 18•7 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D267389
Comment 19•7 months ago
|
||
We're arguing between moderate and high for this race bug. We've decided to call it "high" on the low end for bounty purposes. We'll grant control of the messages from the child process, but if you can demonstrate the ability for a web content process to cause sufficient swgl errors to reliably trigger this race without a hacked process then we will consider increasing the bounty.
| Assignee | ||
Updated•7 months ago
|
Updated•7 months ago
|
Updated•7 months ago
|
Updated•7 months ago
|
Comment 20•7 months ago
|
||
| uplift | ||
Comment 21•7 months ago
•
|
||
Timothy, your patch does not apply to the esr115 branch, could your rebase/update it please? Thanks
| Assignee | ||
Comment 22•7 months ago
|
||
Sorry about that. Updated, should be good now.
Updated•7 months ago
|
Comment 23•7 months ago
|
||
| uplift | ||
Updated•7 months ago
|
Updated•7 months ago
|
Comment 24•7 months ago
|
||
| uplift | ||
Updated•7 months ago
|
Updated•7 months ago
|
Comment 25•7 months ago
|
||
Updated•7 months ago
|
Comment 26•7 months ago
|
||
Updated•6 months ago
|
Updated•9 days ago
|
Description
•