Crash in [@ nsTArray_Impl<T>::Compare<T> | NS_QuickSort | XUL@0x79486f | mozilla::net::CacheIndex::FrecencyArray::SortIfNeeded]
Categories
(Core :: Networking: Cache, defect, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr102 | - | wontfix |
| firefox109 | --- | unaffected |
| firefox110 | --- | unaffected |
| firefox111 | --- | unaffected |
People
(Reporter: kershaw, Assigned: edgul)
References
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [necko-triaged][necko-priority-review])
Crash Data
Attachments
(3 files, 1 obsolete file)
|
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
+++ This bug was initially created as a clone of Bug #1662676 +++
Crash report: https://crash-stats.mozilla.org/report/index/d634ed01-a060-4993-a124-2734e0200901
Top 10 frames of crashing thread:
0 XUL int nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::Compare<detail::CompareWrapper<mozilla::net:: xpcom/ds/nsTArray.h:2255
1 XUL NS_QuickSort xpcom/ds/nsQuickSort.cpp:117
2 XUL XUL@0x79486f
3 XUL mozilla::net::CacheIndex::FrecencyArray::SortIfNeeded netwerk/cache2/CacheIndex.cpp:3401
4 XUL mozilla::net::CacheIndexEntryAutoManage::~CacheIndexEntryAutoManage netwerk/cache2/CacheIndex.cpp:108
5 XUL mozilla::net::CacheIndex::MergeJournal netwerk/cache2/CacheIndex.cpp:95
6 XUL mozilla::net::CacheIndex::FinishRead netwerk/cache2/CacheIndex.cpp:2537
7 XUL mozilla::net::CacheIndex::OnDataRead netwerk/cache2/CacheIndex.cpp:3667
8 XUL mozilla::net::ReadEvent::Run netwerk/cache2/CacheFileIOManager.cpp:700
9 XUL mozilla::net::CacheIOThread::LoopOneLevel netwerk/cache2/CacheIOThread.cpp:538
For a Windows crash report I see these inline frames with VS:
[Inline Frame] xul.dll!mozilla::net::`anonymous namespace'::FrecencyComparator::Equals(mozilla::net::CacheIndexRecord * a, mozilla::net::CacheIndexRecord * b) Line 47
at /builds/worker/checkouts/gecko/netwerk/cache2/CacheIndex.cpp(47)
[Inline Frame] xul.dll!detail::CompareWrapper<mozilla::net::(anonymous namespace)::FrecencyComparator,mozilla::net::CacheIndexRecord *,0>::Equals(mozilla::net::CacheIndexRecord * const & aLeft, mozilla::net::CacheIndexRecord * const & aRight) Line 928
at /builds/worker/workspace/obj-build/dist/include/nsTArray.h(928)
[Inline Frame] xul.dll!detail::CompareWrapper<mozilla::net::(anonymous namespace)::FrecencyComparator,mozilla::net::CacheIndexRecord *,0>::Compare(mozilla::net::CacheIndexRecord * const & aLeft, mozilla::net::CacheIndexRecord * const & aRight) Line 920
at /builds/worker/workspace/obj-build/dist/include/nsTArray.h(920)
[Inline Frame] xul.dll!nsTArray_Impl<mozilla::net::CacheIndexRecord *,nsTArrayInfallibleAllocator>::Compare(const void * aE1, const void * aE2, void * aData) Line 2300
at /builds/worker/workspace/obj-build/dist/include/nsTArray.h(2300)
xul.dll!NS_QuickSort(void * a, unsigned int n, unsigned int es, int(*)(const void *, const void *, void *) cmp, void * data) Line 134
at /builds/worker/checkouts/gecko/xpcom/ds/nsQuickSort.cpp(134)
[Inline Frame] xul.dll!nsTArray_Impl<mozilla::net::CacheIndexRecord *,nsTArrayInfallibleAllocator>::Sort(const mozilla::net::`anonymous namespace'::FrecencyComparator & aComp) Line 2310
at /builds/worker/workspace/obj-build/dist/include/nsTArray.h(2310)
xul.dll!mozilla::net::CacheIndex::FrecencyArray::SortIfNeeded() Line 3401
at /builds/worker/checkouts/gecko/netwerk/cache2/CacheIndex.cpp(3401)
This looks like a UAF on some of the elements within FrecencyArray::mRecs
| Reporter | ||
Comment 1•4 years ago
|
||
The crash is still happening, so open a new bug to track this crash.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 2•4 years ago
|
||
Won't have time to look at this this month.
| Reporter | ||
Comment 3•4 years ago
|
||
Randell's work in bug 1747439 and bug 1746875 could also fix this.
Let's watch if the crash rate goes down.
I'll also take a look again.
| Reporter | ||
Comment 4•4 years ago
|
||
Update: we are still waiting to see if bug 1747439 fixes this.
Currently, there is no crash after beta 97.0b9.
Comment 5•4 years ago
|
||
Looks like we're still seeing crashes with Release 97.
| Reporter | ||
Comment 6•4 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #5)
Looks like we're still seeing crashes with Release 97.
Thanks. I am aware of this and I am trying to have a patch to fix this.
| Reporter | ||
Comment 7•4 years ago
|
||
The idea in this patch is like D99680, but the difference is that mFrecencyArray now contains CacheIndexRecordWrapper.
This should be able to help us track why CacheIndexRecordWrapper is released before removing it from mFrecencyArray.
| Reporter | ||
Updated•4 years ago
|
| Reporter | ||
Comment 8•4 years ago
|
||
Comment on attachment 9265135 [details]
Bug 1745972 - Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=#necko
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Probably not easy. This is just a diagnostic patch.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: Should be able to uplift cleanly.
- How likely is this patch to cause regressions; how much testing does it need?: Low. This patch only adds a diagnostic assertion.
| Reporter | ||
Comment 9•4 years ago
|
||
Hi Tom,
Could you take a look at this sec-approval?
thanks.
Comment 10•4 years ago
|
||
Since the request we've been past the last uplift window; sec-approvals will resume once the tree reopens.
Comment 11•4 years ago
|
||
Comment on attachment 9265135 [details]
Bug 1745972 - Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=#necko
Approved to land and if desired, uplift.
Comment 12•4 years ago
|
||
Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/850740138196ee03044cbad587e676a0bbd68aba
https://hg.mozilla.org/mozilla-central/rev/850740138196
| Reporter | ||
Comment 13•4 years ago
|
||
Comment on attachment 9265135 [details]
Bug 1745972 - Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=#necko
Beta/Release Uplift Approval Request
- User impact if declined: No impact. This is a diagnostic patch.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: N/A
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch only adds a diagnostic assertion.
- String changes made/needed: N/A
Comment 14•4 years ago
|
||
Comment on attachment 9265135 [details]
Bug 1745972 - Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=#necko
Approved for 99.0b3. Thanks.
Comment 15•4 years ago
|
||
| uplift | ||
Comment 16•4 years ago
|
||
Comment on attachment 9265135 [details]
Bug 1745972 - Check if CacheIndexRecordWrapper is still in mFrecencyArray, r=#necko
Cleared approval-mozilla-beta to remove this from the "missed uplift" queries.
| Reporter | ||
Comment 17•4 years ago
|
||
Status update: we are still waiting for this diagnostic assertion to be triggered.
There is no crash report for this so far.
| Assignee | ||
Comment 18•3 years ago
|
||
Status update: After significant research and discussion with Kershaw, we intend on moving CacheIndexRecordWrapper deletion to the Cache IO thread to eliminate the possibility that the main thread deletions are causing some kind of race condition during sorting. Hoping to get to this next week.
| Assignee | ||
Comment 19•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
| Assignee | ||
Comment 20•3 years ago
|
||
Comment on attachment 9285377 [details]
Bug 1745972 - Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r?kershaw
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Difficult, I cannot seem to replicate this.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: All
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Shouldn't need to uplift
- How likely is this patch to cause regressions; how much testing does it need?: Low.
- Is Android affected?: Yes
Comment 22•3 years ago
|
||
Comment on attachment 9285377 [details]
Bug 1745972 - Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r?kershaw
Approved to land and uplift if needed
Comment 23•3 years ago
|
||
Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r=kershaw,necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/c1b26d78f5d1fbd7392fee3fb646d8c99016c2a2
https://hg.mozilla.org/mozilla-central/rev/c1b26d78f5d1
| Assignee | ||
Comment 24•3 years ago
|
||
Moving to necko review queue to monitor if fix works.
Comment 25•3 years ago
|
||
Did you want to uplift this to Beta so it can go out with the 105 release in a couple weeks instead of waiting until October for 106 to ship? Go ahead and nominate the patch for approval if yes.
| Assignee | ||
Comment 26•3 years ago
|
||
Comment on attachment 9285377 [details]
Bug 1745972 - Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r?kershaw
Beta/Release Uplift Approval Request
- User impact if declined: Low. Most users don't observe the original crash
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Low impact to users. Only safety precautions added.
- String changes made/needed: None
- Is Android affected?: Yes
Comment 27•3 years ago
|
||
Comment on attachment 9285377 [details]
Bug 1745972 - Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r?kershaw
Approved for 105.0b6.
Comment 28•3 years ago
|
||
| uplift | ||
Comment on attachment 9285377 [details]
Bug 1745972 - Dispatch cache wrapper deletion to caller thread to acquire lock before deletion and safety-removal from frecency array. r?kershaw
Landed on Beta for 105.0b6. Clearing the approval flag to get this off the needs-uplift radar.
https://hg.mozilla.org/releases/mozilla-beta/rev/4748ec75e338
| Reporter | ||
Comment 29•3 years ago
|
||
I found this crash report that shows there is a crash when deleting mWrapper.
I also noticed that the sLock sometimes is unlocked when doing I/O, so maybe we should use another lock to protect CacheIndexRecordWrapper.
| Assignee | ||
Comment 30•3 years ago
|
||
Noted. Will try to take another look next week.
| Assignee | ||
Comment 31•3 years ago
•
|
||
Under investigation. re: comment 29
Comment 32•3 years ago
|
||
So, looking at https://crash-stats.mozilla.org/report/index/07b6144c-c48a-442b-bb2e-4c9380221010#tab-details
We crash on this line which makes me think that we have an ownership problem here.
mIndex->mFrecencyArray.RemoveRecord(entry->mRec, mProofOfLock); actually release mRec? (we don't seem to hold a ref to it.
Maybe the simple fix is to have a RefPtr<CacheIndexRecordWrapper> rec = entry->mRec; then use that instead of entry->mRec.
I haven't fully confirmed that's the problem, but that code seems a bit fishy to me.
Comment 33•3 years ago
•
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #32)
mIndex->mFrecencyArray.RemoveRecord(entry->mRec, mProofOfLock);actually release mRec?
It seems to be the case that we null out mRecs[idx], where idx comes from entry->mRec. Being mRecs an nsTArray<RefPtr<CacheIndexRecordWrapper>> an array of RefPtr this might delete something unexpectedly, indeed.
Not sure about the level of indirection here, though: Keeping a RefPtr for entry->mRec might actually not refer to the same object we potentially delete?
| Assignee | ||
Comment 34•3 years ago
|
||
A few things to note here.
We can note some key differences between the crash report in comment 32 and here.
Respectively:
- EXCEPTION_ACCESS_VIOLATION_READ vs EXCEPTION_ACCESS_VIOLATION_WRITE
- Crash line in ~CacheIndexEntryAutoManage:
here vs here - Crash in NS_QuickSort vs std::min, which contains call to
Length(), which calls mRecs'sLength()
So [1] and [3] seem to suggest that just touching mRecs, could this be multiple thread access?
While [2] seems to support with ownership/deletion theory, being another occurrence upon removal.
Along these lines I'm very suspicious of the mIndex which doesn't appear to ever remove it's entries, just label them as removed. If this ever got out of sync after a manual mRecs[idx] = nullptr; then the FindEntry() could pass back a live raw pointer to garbage. Need more digging here, but this smells like the right direction.
| Assignee | ||
Comment 35•3 years ago
•
|
||
I'm not sure if this is the same root cause, but this assert crash seems to indicate that we are failing to find an item in the frencency array during removal after it has already been removed from the entry list.
https://crash-stats.mozilla.org/report/index/85ca9316-79b1-4728-9c86-f3d9b0221030
Since the RefPtr is already being held by mOldRecord in this case, it may also suggest the idea to protect ourselves with RefPtr<CacheIndexRecordWrapper> rec = entry->mRec; may not be enough.
| Assignee | ||
Comment 36•3 years ago
|
||
| Assignee | ||
Comment 37•3 years ago
|
||
Also the patch above adds what may be too simplistic a test, but perhaps suggests that the nsTarray<..> mRecs is not the reason for a double delete.
Comment 38•3 years ago
|
||
Randell has been adding some more thread safety annotations in https://phabricator.services.mozilla.com/D160392, which may help here.
| Assignee | ||
Comment 39•3 years ago
|
||
Randell's patch should only affect compile time error detection, so we shouldn't expect any fixes here. But reviewing the added protections may inspire some ideas. Will try to review this week.
| Assignee | ||
Comment 40•3 years ago
|
||
Depends on D161215
| Assignee | ||
Comment 41•3 years ago
|
||
Added basic xpcshell-test that makes many requests to different paths of httpd server. Tested on linux and windows without any luck eliciting the crash.
Some limitations exist in the current test, likely due to the test setup/framework. For example, the test fails for huge numbers of requests (>10000 and with --verify). Maybe there is room to increase the number of runnable tests to improve the likelihood of catching the crash with automation?
| Assignee | ||
Comment 42•3 years ago
|
||
Additional testing doesn't seem like a viable approach here given how difficult the issue has been to reproduce.
I am currently working on a diagnostic patch to improve our understanding of the suspicious items.
Updated•3 years ago
|
| Assignee | ||
Comment 43•3 years ago
•
|
||
"Quick" update.
Changes in the above diagnostic patch will help validate:
- Correct item is removed from mRecs on release builds
- Items are removed from mRecs during resizing are indeed null first on release builds
- Duplicate appends to mRecs will not occur on release builds
Some other considerations:
- Could lock is drop for potentially blocking IO operations be a factor here? I took a quick look, and didn’t see anything during the lock drop that touches
mRecs. So this seems unlikely. mRecs.IndexOfseems to return an unexpectedNoIndex. Maybe it could return an incorrect item that gets deleted too early or twice? The assumptions of E innsTArray<E>appear to hold: RefPtr has copy-constructor, operator< and operator== are defined in FrecencyComparator and (I’m pretty sure(?)) RefPtr is trivially copyable. If we assume all modifications to mRecs (justification below) then is there any other wayIndexOfcould return an incorrect value? This commit may help narrow down if this is a problem to begin with.- Could an item be destroyed and re-added? It looks like there are situations where we remove and then re-add an item to
mRecs. But the item should be kept alive by RefPtr from what I can tell. sLock.AssertCurrentThreadOwnswill only verify on debug builds, which isn’t super helpful as this crash seems rare enough to only happen on release builds in the wild. I considered adding MOZ_RELEASE_ASSERT here, but we already have proof_of_lock parameters, so this didn’t seem worth the time at this point.
It may also be worth noting the following:
- we seem to also have allocation problems, for example. May allow us to be more confident that items are being correctly added otherwise we would see these errors first.
- This crash was also interesting. It appears to be a result of trying to directly index
mRecswith aNoIndex. Though the line reference is off and I’m not sure how we got past the ASSERT. I took a look around but it looks like all direct indexing (including via the iterators) of mRecs happens under lock. So it doesn’t seem like we can label this as another thread messing with the contents of mRecs.
Any further ideas or critiques of my assumptions would be greatly appreciated.
| Assignee | ||
Comment 44•3 years ago
|
||
Comment on attachment 9301876 [details]
Bug 1745972 - Added assertions to verify additions and removals of mRecs. r=kershaw
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Difficult, this is diagnostic
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: All
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?:
- How likely is this patch to cause regressions; how much testing does it need?: Low, this is diagnostic
- Is Android affected?: Unknown
| Reporter | ||
Updated•3 years ago
|
| Assignee | ||
Comment 45•3 years ago
|
||
| Assignee | ||
Comment 46•3 years ago
|
||
Hey Tom, can we get security approval on this when you get a minute?
Comment 47•3 years ago
|
||
Comment on attachment 9301876 [details]
Bug 1745972 - Added assertions to verify additions and removals of mRecs. r=kershaw
Yes; tree just opened up so approved to land.
| Assignee | ||
Comment 48•3 years ago
|
||
Comment on attachment 9301876 [details]
Bug 1745972 - Added assertions to verify additions and removals of mRecs. r=kershaw
Beta/Release Uplift Approval Request
- User impact if declined: This is diagnostic. Crash will persist longer without additional information to help debug.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Diagnostic
- String changes made/needed: None
- Is Android affected?: Yes
Updated•3 years ago
|
| Assignee | ||
Comment 49•3 years ago
|
||
Put back in review for monitoring.
Comment 50•3 years ago
|
||
Added assertions to verify additions and removals of mRecs. r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/a72fdd4a523adf189b53e2f7e5b4cbb46921e329
https://hg.mozilla.org/mozilla-central/rev/a72fdd4a523a
Comment 51•3 years ago
|
||
Comment on attachment 9301876 [details]
Bug 1745972 - Added assertions to verify additions and removals of mRecs. r=kershaw
Approved for 109.0b2.
Comment 52•3 years ago
|
||
| uplift | ||
Comment on attachment 9301876 [details]
Bug 1745972 - Added assertions to verify additions and removals of mRecs. r=kershaw
Landed on Beta. Removing the approval to get it off the needs-uplift radar.
https://hg.mozilla.org/releases/mozilla-beta/rev/91741e5269e0
Comment 53•3 years ago
|
||
[Tracking Requested - why for this release]:
Active crashes in 102esr
No crashes past 105, so likely fixed in 106. Someone should survey likely candidates from 106 for backport to 102esr
Updated•3 years ago
|
| Assignee | ||
Comment 54•3 years ago
|
||
Closing as the discussion around this issue is moving to Bug 1812420. The new bug will aim to also investigate potential signature changes.
Updated•3 years ago
|
Comment 55•3 years ago
|
||
Given bug 1812420, it doesn't sound like there's anything actionable for ESR to do in this bug.
Updated•2 years ago
|
Description
•