Use-after-free crash in [@ NS_GetContentList]
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: mccr8, Assigned: mccr8)
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main114+r][adv-esr102.12+r])
Crash Data
Attachments
(5 files)
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr102+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr102+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr102+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr102+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/c962a61c-6553-4d2f-940f-ce5c30230407
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so mozilla::RefPtrTraits<nsContentList>::AddRef mfbt/RefPtr.h:49
0 libxul.so RefPtr<nsContentList>::ConstRemovingRefPtrTraits<nsContentList>::AddRef mfbt/RefPtr.h:380
0 libxul.so RefPtr<nsContentList>::assign_with_AddRef mfbt/RefPtr.h:60
0 libxul.so RefPtr<nsContentList>::operator= mfbt/RefPtr.h:190
0 libxul.so NS_GetContentList dom/base/nsContentList.cpp:208
1 libxul.so mozilla::dom::Element::GetElementsByTagName dom/base/Element.cpp:669
2 libxul.so mozilla::dom::Element_Binding::getElementsByTagName dom/bindings/ElementBinding.cpp:2312
3 libxul.so mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> dom/bindings/BindingUtils.cpp:3318
4 libxul.so CallJSNative js/src/vm/Interpreter.cpp:459
4 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:547
This is happening on this line: if (entry) list = entry->mContentList;
I think this shows up as [@ ContentListHashtableMatchEntry ] in ESR 102: bp-a0fd6c63-bc5b-4281-b4df-b28640230328
NS_GetContentList adds things to a hash table, gContentListHashTable, using nsContentList::MatchesKey(), which does something like cache a set of live content lists. This hash table uses a raw pointer to the lists it is caching.
When an nsContentList dies, it removes itself using RemoveFromHashtable(). However, the removal is not done on the value of the pointer to the nsContentList, because this hash table is actually keyed to nsContentListKey, which is based on the fields mRootNode, mMatchNameSpaceId, mXMLMatchAtom, and mIsHTMLDocument. If any of those fields change between the time we add the nsContentList and when the nsContentList dies, I think we'll fail to remove the nsContentList and we can end up with a UAF later.
I audited the places that change those fields.
- mXMLMatchAtom, mMatchNameSpaceId and mIsHTMLDocument never change.
- mRootNode is cleared in nsContentList::NodeWillBeDestroyed() and nsContentList::LastRelease(), but RemoveFromCaches() is called first.
- nsLabelsNodeList::MaybeResetRoot changes the value of mRootNode, but does NOT clear the cache. It looks like this was added in bug 556743, which is much later than the cache was added so it wouldn't be too surprising if this had been overlooked.
So maybe we need to call RemoveFromCaches() in nsLabelsNodeList::MaybeResetRoot?
Assignee | ||
Comment 1•2 years ago
|
||
It feels rather fragile that you have to audit every single place 4 different fields are used to ensure the cache is okay. I don't know if there's a better way to protect against modification. At least throwing some consts around might help.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 2•2 years ago
|
||
This isn't a frequent crash. It has happened about 7 times in the last month.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 4•2 years ago
|
||
On second thought, I think my theory is wrong, because we can only actually cache nsContentList (not a subclass), which means you can't call nsLabelsNodeList::MaybeResetRoot on a node list in the cache.
Assignee | ||
Comment 5•2 years ago
|
||
We could try rewriting this to not use PLDHashtable, which is old and weird. Nika pointed out that we end up using the clear entry stub to initialize the entry type, instead of a ctor. Maybe that would clear out some issues.
I was also wondering if it is possible that the code in between when we look up an entry and when we use it could result in a GC that might somehow shrink the hashtable and invalidate our entry pointer. I'm not even sure if that would lead to this kind of UAF though.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 6•2 years ago
|
||
I was looking over this code and I was wondering why this would be happening for gContentListHashTable when it wasn't happening for gFuncStringContentListHashTable, which is basically the same thing. Well, it looks like it does happen for that, too, though I only see 2 UAFs in the last 6 months, so it is rarer:
[@ GetFuncStringContentList<T> ]
bp-261c1724-5895-4b8a-8758-daa5c0230328
bp-b365dbdb-f47d-4b9f-903d-78bea0230416
Assignee | ||
Comment 7•2 years ago
|
||
This isn't necessary for the later parts, but I think it makes
it easier to understand.
Assignee | ||
Comment 8•2 years ago
|
||
Assignee | ||
Comment 9•2 years ago
|
||
In the next patch, I'm adding a field to nsContentList that I want these hash
table entries to be able to access, so I'm making them inner classes here.
I've also renamed them to get rid of some redundancy, because
otherwise the qualified names would be long.
Assignee | ||
Comment 10•2 years ago
|
||
Add a new field to nsContentList that should be true if and only if
that list is in one of the hashtable caches.
The entry destructor release assert double checks that any list it contains
knew it was in an entry. This should be superfluous as long as
SetContentList is the only place that sets HashEntry::mContentList to a non-null
value, and nsContentList::mInHashtable to true.
The first SetContentList release assert ensures that the entry starts out empty,
because if it wasn't we'd need to tell the current list that it is being
removed from the cache entry.
The second SetContentList release assert ensures that the list is never in more
than one entry at the same time, which allows us to make mInHashtable a boolean
and not an integer.
The release asserts in the remove-from-hashtable methods ensure that the list
isn't in a hashtable afterwards. These are done as two separate checks to try to
help with debugging if we see these asserts get hit. These methods are called
from the destructor, and other places related to tearing down a content list.
Assignee | ||
Comment 11•2 years ago
|
||
Hopefully these checks aren't overkill. I tried to make sure that the fast path in the caches won't be affected.
The basic idea here is that it should crash immediately whenever one of the caches contains a stale pointer. Then we can hopefully figure out what is going wrong and fix it. This will likely increase the crash rate, because right now we only crash if we actually use the cache entry containing the stale pointer. The current UAF rate is low, so hopefully it won't be an issue for stability.
The content list cache has an MRU cache sitting in front of it. I didn't add any checking for it because I didn't see any UAFs in that cache. I could imagine that UAFs would be rarer because lists tend to spend a lot less time in that cache, because of the limited size.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 12•2 years ago
|
||
Comment on attachment 9333229 [details]
Bug 1830795, part 1 - Convert gContentListHashTable to an nsTHashtable.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: It should be pretty clear that this is trying to deal with UAFs from weak pointers, but my patches work by dynamically detecting them, so there's no indication how you actually get in that state. Though crashes that might start showing up could indicate that.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: This code has changed very little, with some of it dating back to 2002 so I think it should be easy to backport.
- How likely is this patch to cause regressions; how much testing does it need?: This patch will (hopefully) change us from crashing when we use a dead pointer in the cache to crashing immediately when a pointer in the cache becomes dead, which could increase the crash rate quite a bit. That said, the baseline crash rate is very low. Ideally the crashes will allow followups that fix whatever is causing a stale object to not be removed from the cache. In the worst case, I could implement a fallback path that scans the entire cache and nulls out matching pointers instead of crashing when a dead pointer is detected.
- Is Android affected?: Yes
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 13•2 years ago
|
||
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
Comments highlight the fact that I'm implementing a dynamic lifetime checker for this weak pointer, but I think it is obvious enough without the comments that I think it is better to have the comments. There's no indication of how we can get into that state, because I have no idea what causes it.
Also, only part 4 should change the behavior to fix the security issues. The other 3 clean up ancient code (some of it dates to 2002) to make it easier to do the fix. This code changes so little I think it is better to try to backport all of them rather than spinning out a smaller version of part 4 that is harder to understand.
Comment 14•2 years ago
|
||
FYI, Part 4 needs rebasing for ESR.
Assignee | ||
Comment 15•2 years ago
|
||
Add a new field to nsContentList that should be true if and only if
that list is in one of the hashtable caches.
The entry destructor release assert double checks that any list it contains
knew it was in an entry. This should be superfluous as long as
SetContentList is the only place that sets HashEntry::mContentList to a non-null
value, and nsContentList::mInHashtable to true.
The first SetContentList release assert ensures that the entry starts out empty,
because if it wasn't we'd need to tell the current list that it is being
removed from the cache entry.
The second SetContentList release assert ensures that the list is never in more
than one entry at the same time, which allows us to make mInHashtable a boolean
and not an integer.
The release asserts in the remove-from-hashtable methods ensure that the list
isn't in a hashtable afterwards. These are done as two separate checks to try to
help with debugging if we see these asserts get hit. These methods are called
from the destructor, and other places related to tearing down a content list.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 16•2 years ago
|
||
The new attachment https://phabricator.services.mozilla.com/D178322 is rebased for ESR102. I haven't actually tried to build it. The differences are that mNamedItemsCacheValid doesn't exist which broke some trivial things a tiny amount, and mIsLiveList is a uint8_t instead of a bool, so I changed mInHashtable to match it. That changed in bug 1771959. There's a comment "The booleans have to use uint8_t to pack with mState, because MSVC won't pack different typedefs together. Once we no longer have to worry about flushes in XML documents, we can go back to using bool for the booleans." so it sounds like it shouldn't matter much either way.
Comment 17•2 years ago
|
||
Comment on attachment 9333229 [details]
Bug 1830795, part 1 - Convert gContentListHashTable to an nsTHashtable.
Approved to land and request uplift
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 18•2 years ago
|
||
Landed:
https://hg.mozilla.org/integration/autoland/rev/2fcc9a72eb7619a9bb749226aeeb76bdb608a083
https://hg.mozilla.org/integration/autoland/rev/66c562eb7c969ac39d16e8837311916ba98ffa61
https://hg.mozilla.org/integration/autoland/rev/82198ba7bafc765593bdd59ca4367b41bcb4bb1d
https://hg.mozilla.org/integration/autoland/rev/dd88ac675a2ce22f7af7784f1abf046f96e3d720
Backed out for causing valgrind failure:
https://hg.mozilla.org/integration/autoland/rev/3cff908b615f9aaa2956ded65e14f0e6a9d38f2a
TEST-UNEXPECTED-FAIL | valgrind-test | Conditional jump or move depends on uninitialised value(s) at nsContentList::nsContentList / nsHTMLDocument::GetFormsAndFormControls / nsContentUtils::GenerateStateKey / nsGenericHTMLFormControlElementWithState::GenerateStateKey
Assignee | ||
Comment 19•2 years ago
|
||
It says it is happening on this line in the ctor: if (mIsLiveList) {
I have no idea how my patch could have affected that. Did I break some kind of ignore list?
Updated•2 years ago
|
Assignee | ||
Comment 20•2 years ago
|
||
Julian, do you have any idea what might be going wrong here? Thanks.
I added a new bit field mInHashtable
to nsContentList, right after the existing one mIsLiveList
.
const bool mIsLiveList : 1;
bool mInHashtable : 1;
Now, in the ctor for nsContentList, it seems to be saying that code that is conditional on mIsLiveList
(which already existed, and I did not change this code).
I looked at the Memcheck:Cond suppressions in the tree and I didn't see anything obviously related.
Maybe I could work around this by changing it to check aLiveList instead of mIsLiveList?
It seems odd that the issue is all showing up for one ctor and not the other.
Assignee | ||
Comment 21•2 years ago
|
||
I thought maybe adding another field caused the bitfield stuff to spill into another byte that remains partially uninitialized (and apparently Valgrind is or was sensitive to that), but it looks like nsContentList currently has a uint8_t, and then 7 single bit fields, so I would have thought adding one more wouldn't cause an issue.
Assignee | ||
Comment 22•2 years ago
•
|
||
I'll see if testing aLiveList helps: https://treeherder.mozilla.org/jobs?repo=try&revision=3ba32660336dfba10077949a459972f934e2dff0
Assignee | ||
Comment 23•2 years ago
|
||
That worked, so I'll go with that. I've added a comment.
For posterity, the specific error message it is seeing is:
Conditional jump or move depends on uninitialised value(s) at nsContentList::nsContentList / [various things]
Updated•2 years ago
|
Comment 24•2 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f672f1e89144
https://hg.mozilla.org/mozilla-central/rev/d84a7fb099ea
https://hg.mozilla.org/mozilla-central/rev/8c254914fa80
https://hg.mozilla.org/mozilla-central/rev/f1d8476e13e6
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 26•2 years ago
|
||
Comment on attachment 9333232 [details]
Bug 1830795, part 4 - Implement lifetime checking for content list cache entries.
Beta/Release Uplift Approval Request
- User impact if declined: Possible security issues.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): This patch works by crashing earlier when a security problem happens, so in theory it could cause the crash rate to increase. That said, the crash volume is so low that even if it became 10 times more frequent, it would be irrelevant for stability. It has been on Nightly for a few days and I don't see any crashes yet
- String changes made/needed: none
- Is Android affected?: Yes
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 27•2 years ago
|
||
Comment on attachment 9333229 [details]
Bug 1830795, part 1 - Convert gContentListHashTable to an nsTHashtable.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec-high
- User impact if declined: sec-high
- Fix Landed on Version: 115
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): See comment 26.
Assignee | ||
Updated•2 years ago
|
Comment 28•2 years ago
|
||
Comment on attachment 9333229 [details]
Bug 1830795, part 1 - Convert gContentListHashTable to an nsTHashtable.
Approved for 114 beta 8, thanks.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 29•2 years ago
|
||
uplift |
Comment 30•2 years ago
|
||
Comment on attachment 9333229 [details]
Bug 1830795, part 1 - Convert gContentListHashTable to an nsTHashtable.
Approved for 102.12esr.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 31•2 years ago
|
||
uplift |
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•1 year ago
|
Description
•