Crash in [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext]
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: gsvelto, Assigned: jonco)
References
Details
(5 keywords, Whiteboard: [adv-main110-][adv-esr102.8-])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/1b54277a-c51b-43ef-b79f-995cc0221209
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so mozilla::RefPtrTraits<JS::loader::ScriptLoadRequest>::Release mfbt/RefPtr.h:50
0 libxul.so RefPtr<JS::loader::ScriptLoadRequest>::ConstRemovingRefPtrTraits<JS::loader::ScriptLoadRequest>::Release mfbt/RefPtr.h:381
0 libxul.so RefPtr<JS::loader::ScriptLoadRequest>::assign_assuming_AddRef mfbt/RefPtr.h:69
0 libxul.so RefPtr<JS::loader::ScriptLoadRequest>::operator= mfbt/RefPtr.h:168
0 libxul.so mozilla::dom::ScriptLoadContext::~ScriptLoadContext dom/script/ScriptLoadContext.cpp:76
1 libxul.so mozilla::dom::ScriptLoadContext::~ScriptLoadContext dom/script/ScriptLoadContext.cpp:72
2 libxul.so SnowWhiteKiller::MaybeKillObject xpcom/base/nsCycleCollector.cpp:2447
2 libxul.so SnowWhiteKiller::Visit xpcom/base/nsCycleCollector.cpp:2472
2 libxul.so nsPurpleBuffer::VisitEntries<SnowWhiteKiller> xpcom/base/nsCycleCollector.cpp:939
3 libxul.so nsCycleCollector::FreeSnowWhiteWithBudget xpcom/base/nsCycleCollector.cpp:2640
This appears to be a double-free that's Android specific. I'm specifying double-free because the crash is obviously an UAF but it's happening within a destructor which suggests the destructor is being called on an object that was already freed. A couple of note: in the past we had similar crashes being triggered by the SnowWhiteKiller which were obviously caused by bad hardware; I don't think that's the case here though. All the crashes have consistent stacks and this is completely Android-specific.
Reporter | ||
Comment 1•3 years ago
|
||
A note about the crash distribution: this appears like a spike but it's not, this signature was broken off from this signature when we tweaked signature generation. This being said it doesn't appear like we had a crash on file for this particular issue under the previous signature.
Comment 3•3 years ago
|
||
So I duped bug 1805133 but comment 0 here claims this is Android-specific, while bug 1805133 says there are reports on Windows and macOS (but not Linux, perhaps because the crash signature is different there?). Perhaps my duping was done in error, or perhaps there's something else going on?
Reporter | ||
Comment 4•3 years ago
|
||
Indeed there are non-Android crashes. When I looked at it I only saw non-Android crashes coming from old versions but maybe I missed the new ones. This seems to be the same crash on all platforms, just more common on Android. There aren't comments hinting at what might have happened but here's two URLs that appear in the reports:
Comment 5•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on beta
:edgar, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 7•3 years ago
•
|
||
Interesting. This is a crash on the DOM script loader if I understand correctly. I haven't been working on it recently, but I have done a lot of work on the workers side of things. I will take a look. Did this start on android in 108?
Updated•3 years ago
|
Reporter | ||
Comment 8•3 years ago
|
||
(In reply to Yulia Startsev [:yulia] from comment #7)
Interesting. This is a crash on the DOM script loader if I understand correctly. I haven't been working on it recently, but I have done a lot of work on the workers side of things. I will take a look. Did this start on android in 108?
It looks that way but the signature changed so there's older crashes. Specifically I think this is the first crash we have on file that appears to be this issue. The build id is 20221119085828. We later have crashes appearing in 108 beta so I guess it might be a regression from a patch that was uplifted?
Reporter | ||
Comment 9•3 years ago
|
||
This seems to be the first beta crash we have on file and it's in 108 beta 4, build id 20221124185931.
This is the set of changes for the 20221119085828 build. Bug 1774111 is in the regression range and was backported to beta and ESR. I also see two pieces of ESMification which could theoretically be related (bug 1801346, bug 1801338), but those were not backported to 108.
This is the set of changes added from beta 3 to beta 4, and it doesn't include bug 1774111, so I don't know. Bug 1801044 is in both sets of patches, but I can't imagine how it could be related.
Comment 12•3 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #8)
It looks that way but the signature changed so there's older crashes. Specifically I think this is the first crash we have on file that appears to be this issue. The build id is 20221119085828. We later have crashes appearing in 108 beta so I guess it might be a regression from a patch that was uplifted?
Adding older signature [@ mozilla::RefPtrTraits<T>::Release ]
so we can see if the crash volume has changed between the old and new signatures.
I set the severity to S2.
Comment 14•3 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #12)
Adding older signature
[@ mozilla::RefPtrTraits<T>::Release ]
so we can see if the crash volume has changed between the old and new signatures.
I don't think we'll be able to see that. That signature is much broader, and most of the crashes are from unrelated bugs.
Updated•3 years ago
|
Comment 15•3 years ago
|
||
I'll cc jonco here since he most recently worked on this
Assignee | ||
Comment 16•3 years ago
|
||
So this appears to be a double free of ScriptLoadContext.
This class is derived from LoadContextBase, and is strored in ScriptLoadRequest::mLoadContext. It has a back pointer to the request in mRequest (and we're crashing when we try to follow this pointer in the destructor).
AFAICS we create instances of this class and pass them straight into factory methods for ScriptLoadRequest. The data contained is accessed via load requests but we never maintain external pointers to them. This is simpler than I expected and I don't seen any obvious way we could be creating references to a dead object.
The one non-standard thing is that ScriptLoadRequest::mLoadContext is a RefPtr to an nsISupports derived base class rather than a concrete type. This seems to be against the general rule of thumb in the documentation but i don't know if would cause this problem.
Andrew, how much of a problem is this?
I think using a RefPtr instead of an nsCOMPtr for an nsISupports class just means you are missing out on some debug assertions that the pointer is the canonical pointer. I suppose that could catch some kinds of refcounting issues, but again, only in debug builds.
Comment 18•3 years ago
•
|
||
It should be fine to use a RefPtr<LoadContextBase>
for ScriptLoadRequest::mLoadContext
, storing concrete classes (as opposed to XPCOM interfaces) in a RefPtr
is ok. It is pretty weird that LoadContextBase
doesn't return anything for nsISupports
from its QueryInterface
implementation (see https://searchfox.org/mozilla-central/rev/cef96316b3643720769dec96542604c3209f1877/js/loader/LoadContextBase.cpp#19-20), but I don't think that's related to the crash. I filed bug 1806872 for that though.
Comment 19•3 years ago
|
||
We're running out of betas this cycle to address this for 109. Who owns the next step here?
Updated•3 years ago
|
Comment 20•3 years ago
|
||
The bug is marked as tracked for firefox109 (beta) and tracked for firefox110 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.
:hsinyi, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 21•3 years ago
|
||
I'm currently sick, but should be back monday. I don't think I have time to fix this. If jonco is back maybe he can take a look?
Comment 22•3 years ago
|
||
jonco told me he is investigating this. Thank you, jonco.
Assignee | ||
Comment 23•3 years ago
|
||
I'm looking into this but haven't found anything so far. The way completion of off-thread compilations works is a bit sketchy so I'm focusing on that at the moment.
I wasn't able to find any patches that were backported to beta in the right timeframe to explain why this started happening on both beta and nightly around the same time (around 23/24 Nov AFAICS from crashstats).
Bug 1800496 landed just before this and affects ScriptLoadContext, but it's not present on beta.
ESMification greatly increased our use of modules, but doesn't involve ScriptLoadContext which is only used for by the DOM (it uses ComponentLoadContext instead).
Comment 24•3 years ago
|
||
We're not going to block the 109.0 release on this, but the crash volume is high enough that I'd consider respinning the RC builds for it if a low-risk fix is available. Otherwise, leaving this on the radar for a possible dot release ride-along too.
Assignee | ||
Comment 25•3 years ago
|
||
Current status: I've been looking into this but haven't made any progress at finding the problem.
The code using ScriptLoadContext directly is pretty normal. One possibility is that the problem is related to off-thread parsing since the completion runnable has a pointer to the ScriptLoadRequest, which has a pointer to the context. The code around this is pretty hairy and I wouldn't be surprised if there were bugs lurking, however I've written a test to exercise this but it hasn't shown up any problems.
The timing of this is strange. I'm mystified as to why it suddenly showed up and beta and nightly at the same time, with no significant changes to the script loader around that time.
Updated•3 years ago
|
Comment 26•3 years ago
|
||
Shot in the dark - have we seen anything like this in the fuzzers lately? Grasping at straws a bit here, but seems like any STR we can find would be useful.
Comment 27•3 years ago
|
||
Unfortunately not. I'll keep an eye out in case anything related comes in.
Updated•3 years ago
|
Comment 28•3 years ago
|
||
Not going to make 109 at this point, but fingers crossed that bug 1811939 works for 110.
Assignee | ||
Comment 29•3 years ago
|
||
No crashes on nightly since bug 1811939 landed.
Comment 30•3 years ago
|
||
No crashes from Beta since either. I think we're good here.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 31•3 years ago
|
||
Since nothing landed here and the signatures are identical we should dupe this
Updated•2 years ago
|
Description
•