Closed Bug 1805019 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext]

Categories

(Core :: DOM: Core & HTML, defect)

Unspecified
All
defect

Tracking


RESOLVED DUPLICATE of bug 1811939
Tracking Status
firefox-esr102 110+ fixed
firefox108 --- wontfix
firefox109 + wontfix
firefox110 + fixed
firefox111 + fixed

People

(Reporter: gsvelto, Assigned: jonco)

References

Details

(5 keywords, Whiteboard: [adv-main110-][adv-esr102.8-])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/1b54277a-c51b-43ef-b79f-995cc0221209

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libxul.so  mozilla::RefPtrTraits<JS::loader::ScriptLoadRequest>::Release  mfbt/RefPtr.h:50
0  libxul.so  RefPtr<JS::loader::ScriptLoadRequest>::ConstRemovingRefPtrTraits<JS::loader::ScriptLoadRequest>::Release  mfbt/RefPtr.h:381
0  libxul.so  RefPtr<JS::loader::ScriptLoadRequest>::assign_assuming_AddRef  mfbt/RefPtr.h:69
0  libxul.so  RefPtr<JS::loader::ScriptLoadRequest>::operator=  mfbt/RefPtr.h:168
0  libxul.so  mozilla::dom::ScriptLoadContext::~ScriptLoadContext  dom/script/ScriptLoadContext.cpp:76
1  libxul.so  mozilla::dom::ScriptLoadContext::~ScriptLoadContext  dom/script/ScriptLoadContext.cpp:72
2  libxul.so  SnowWhiteKiller::MaybeKillObject  xpcom/base/nsCycleCollector.cpp:2447
2  libxul.so  SnowWhiteKiller::Visit  xpcom/base/nsCycleCollector.cpp:2472
2  libxul.so  nsPurpleBuffer::VisitEntries<SnowWhiteKiller>  xpcom/base/nsCycleCollector.cpp:939
3  libxul.so  nsCycleCollector::FreeSnowWhiteWithBudget  xpcom/base/nsCycleCollector.cpp:2640

This appears to be a double-free that's Android-specific. I'm calling it a double-free because the crash is obviously a UAF, but it's happening within a destructor, which suggests the destructor is being called on an object that was already freed. A couple of notes: in the past we had similar crashes triggered by the SnowWhiteKiller that were obviously caused by bad hardware; I don't think that's the case here though. All the crashes have consistent stacks and this is completely Android-specific.
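For illustration, here's a minimal self-contained sketch of that pattern (Request and Context are stand-ins, not the real JS::loader / mozilla::dom classes): the destructor's only interesting work is releasing a strong back reference, so the faulting access only makes sense if the destructor runs on memory that was already freed.

  // Self-contained sketch of the suspected pattern, not the actual Firefox code.
  struct Request {
    int mRefCnt = 0;
    void AddRef() { ++mRefCnt; }
    void Release() {                 // faults if called through a dangling pointer
      if (--mRefCnt == 0) {
        delete this;
      }
    }
  };

  struct Context {
    Request* mRequest = nullptr;     // back pointer to the request

    explicit Context(Request* aRequest) : mRequest(aRequest) { mRequest->AddRef(); }
    ~Context() {
      if (mRequest) {
        mRequest->Release();         // moral equivalent of the crashing frames
        mRequest = nullptr;
      }
    }
  };

  int main() {
    Context* context = new Context(new Request());
    delete context;                  // first destruction is fine
    // A double free would run ~Context() again on freed memory; mRequest is
    // then garbage and the Release() call above is a UAF, matching the stack.
    return 0;
  }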

A note about the crash distribution: this looks like a spike but it isn't; this signature was split off from an older, broader signature when we tweaked signature generation. That said, we don't appear to have had a crash on file for this particular issue under the previous signature.

Duplicate of this bug: 1805133

So I duped bug 1805133, but comment 0 here claims this is Android-specific, while bug 1805133 says there are reports on Windows and macOS (but not Linux, perhaps because the crash signature is different there?). Perhaps my duping was done in error, or perhaps there's something else going on?

Flags: needinfo?(gsvelto)
Flags: needinfo?(cpeterson)

Indeed there are non-Android crashes. When I looked at it I only saw non-Android crashes coming from old versions, but maybe I missed the new ones. This seems to be the same crash on all platforms, just more common on Android. There aren't any comments hinting at what might have happened, but here are two URLs that appear in the reports:

Flags: needinfo?(gsvelto)
OS: Android → All

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

:edgar, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(echen)
Keywords: topcrash

yulia, by any chance, does this ring any bells?

Flags: needinfo?(ystartsev)

Interesting. This is a crash in the DOM script loader, if I understand correctly. I haven't been working on it recently, but I have done a lot of work on the workers side of things. I will take a look. Did this start on Android in 108?

(In reply to Yulia Startsev [:yulia] from comment #7)

Interesting. This is a crash in the DOM script loader, if I understand correctly. I haven't been working on it recently, but I have done a lot of work on the workers side of things. I will take a look. Did this start on Android in 108?

It looks that way, but the signature changed, so there are older crashes. Specifically, I think this is the first crash we have on file that appears to be this issue. The build ID is 20221119085828. We later have crashes appearing in 108 beta, so I guess it might be a regression from a patch that was uplifted?

This seems to be the first beta crash we have on file; it's in 108 beta 4, build ID 20221124185931.

This is the set of changes for the 20221119085828 build. Bug 1774111 is in the regression range and was backported to beta and ESR. I also see two pieces of ESMification which could theoretically be related (bug 1801346, bug 1801338), but those were not backported to 108.

This is the set of changes added from beta 3 to beta 4, and it doesn't include bug 1774111, so I don't know. Bug 1801044 is in both sets of patches, but I can't imagine how it could be related.

(In reply to Gabriele Svelto [:gsvelto] from comment #8)

It looks that way, but the signature changed, so there are older crashes. Specifically, I think this is the first crash we have on file that appears to be this issue. The build ID is 20221119085828. We later have crashes appearing in 108 beta, so I guess it might be a regression from a patch that was uplifted?

Adding older signature [@ mozilla::RefPtrTraits<T>::Release ] so we can see if the crash volume has changed between the old and new signatures.

Crash Signature: [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext] → [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext] [@ @0x0 | mozilla::dom::ScriptLoadContext::~ScriptLoadContext ] [@ mozilla::RefPtrTraits<T>::Release ]
Flags: needinfo?(cpeterson)

I set the severity to S2.

Flags: needinfo?(echen)

(In reply to Chris Peterson [:cpeterson] from comment #12)

Adding older signature [@ mozilla::RefPtrTraits<T>::Release ] so we can see if the crash volume has changed between the old and new signatures.

I don't think we'll be able to see that. That signature is much broader, and most of the crashes are from unrelated bugs.

Crash Signature: [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext] [@ @0x0 | mozilla::dom::ScriptLoadContext::~ScriptLoadContext ] [@ mozilla::RefPtrTraits<T>::Release ] → [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext] [@ @0x0 | mozilla::dom::ScriptLoadContext::~ScriptLoadContext ]

I'll cc jonco here since he most recently worked on this

Flags: needinfo?(ystartsev) → needinfo?(jcoppeard)

So this appears to be a double free of ScriptLoadContext.

This class is derived from LoadContextBase and is stored in ScriptLoadRequest::mLoadContext. It has a back pointer to the request in mRequest (and we're crashing when we try to follow this pointer in the destructor).

AFAICS we create instances of this class and pass them straight into factory methods for ScriptLoadRequest. The data they contain is accessed via the load requests, but we never maintain external pointers to them. This is simpler than I expected, and I don't see any obvious way we could be creating references to a dead object.
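Roughly, the shape is this. A self-contained sketch using the names above; the refcounting is hand-rolled here rather than the real cycle-collected macros, so treat it as an illustration only:

  #include "mozilla/RefPtr.h"   // RefPtr<T> only needs AddRef()/Release() on T

  class ScriptLoadRequest;

  // Base class for load contexts; holds the back pointer to the request.
  class LoadContextBase {
   public:
    void AddRef() { ++mRefCnt; }
    void Release() { if (--mRefCnt == 0) delete this; }

    RefPtr<ScriptLoadRequest> mRequest;   // back pointer to the owning request

   protected:
    virtual ~LoadContextBase();           // out of line: ScriptLoadRequest is incomplete here

   private:
    int mRefCnt = 0;
  };

  // The DOM-side context; its destructor is the crashing frame.
  class ScriptLoadContext final : public LoadContextBase {
   public:
    ~ScriptLoadContext() override;
  };

  // Requests own their context via a pointer typed as the base class.
  class ScriptLoadRequest {
   public:
    void AddRef() { ++mRefCnt; }
    void Release() { if (--mRefCnt == 0) delete this; }

    RefPtr<LoadContextBase> mLoadContext; // strong ref to the context, as LoadContextBase

   private:
    ~ScriptLoadRequest() = default;
    int mRefCnt = 0;
  };

  LoadContextBase::~LoadContextBase() = default;

  // Dropping the back pointer here is where the reported crashes happen.
  ScriptLoadContext::~ScriptLoadContext() { mRequest = nullptr; }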

The one non-standard thing is that ScriptLoadRequest::mLoadContext is a RefPtr to an nsISupports-derived base class rather than a concrete type. This seems to go against the general rule of thumb in the documentation, but I don't know whether it would cause this problem.

Andrew, how much of a problem is this?

Crash Signature: [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext] [@ @0x0 | mozilla::dom::ScriptLoadContext::~ScriptLoadContext ] → [@ mozilla::dom::ScriptLoadContext::~ScriptLoadContext]
Flags: needinfo?(jcoppeard) → needinfo?(continuation)

I think using a RefPtr instead of an nsCOMPtr for an nsISupports class just means you are missing out on some debug assertions that the pointer is the canonical pointer. I suppose that could catch some kinds of refcounting issues, but again, only in debug builds.
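A minimal sketch of that difference, assuming a hypothetical refcounted class Foo deriving from nsISupports (Foo and HoldBothWays are made up for illustration):

  #include "mozilla/RefPtr.h"
  #include "nsCOMPtr.h"
  #include "nsISupports.h"
  #include "nsISupportsImpl.h"

  // Hypothetical refcounted class deriving from nsISupports.
  class Foo : public nsISupports {
   public:
    NS_DECL_ISUPPORTS
   protected:
    virtual ~Foo() = default;
  };

  NS_IMPL_ISUPPORTS0(Foo)

  void HoldBothWays(Foo* aFoo) {
    RefPtr<Foo> viaRefPtr = aFoo;            // keeps aFoo alive; no extra checks
    nsCOMPtr<nsISupports> viaCOMPtr = aFoo;  // same, but debug builds can assert the
                                             // stored pointer is the canonical nsISupports
  }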

Flags: needinfo?(continuation)

It should be fine to use a RefPtr<LoadContextBase> for ScriptLoadRequest::mLoadContext; storing concrete classes (as opposed to XPCOM interfaces) in a RefPtr is OK. It is pretty weird that LoadContextBase doesn't return anything for nsISupports from its QueryInterface implementation (see https://searchfox.org/mozilla-central/rev/cef96316b3643720769dec96542604c3209f1877/js/loader/LoadContextBase.cpp#19-20), but I don't think that's related to the crash. I filed bug 1806872 for that, though.

We're running out of betas this cycle to address this for 109. Who owns the next step here?

Flags: needinfo?(ystartsev)
Flags: needinfo?(jcoppeard)

The bug is marked as tracked for firefox109 (beta) and tracked for firefox110 (nightly). We have limited time to fix this; the soft freeze is in 8 days. However, the bug still isn't assigned.

:hsinyi, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(htsai)

I'm currently sick, but should be back Monday. I don't think I have time to fix this. If jonco is back, maybe he can take a look?

Flags: needinfo?(ystartsev)

jonco told me he is investigating this. Thank you, jonco.

Flags: needinfo?(htsai)

I'm looking into this but haven't found anything so far. The way completion of off-thread compilations works is a bit sketchy, so I'm focusing on that at the moment.

I wasn't able to find any patches that were backported to beta in the right timeframe to explain why this started happening on both beta and nightly around the same time (around 23/24 Nov AFAICS from crashstats).

Bug 1800496 landed just before this and affects ScriptLoadContext, but it's not present on beta.

ESMification greatly increased our use of modules, but doesn't involve ScriptLoadContext, which is only used by the DOM (it uses ComponentLoadContext instead).

Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)

We're not going to block the 109.0 release on this, but the crash volume is high enough that I'd consider respinning the RC builds for it if a low-risk fix is available. Otherwise, leaving this on the radar for a possible dot release ride-along too.

Current status: I've been looking into this but haven't made any progress in finding the problem.

The code using ScriptLoadContext directly is pretty normal. One possibility is that the problem is related to off-thread parsing, since the completion runnable has a pointer to the ScriptLoadRequest, which has a pointer to the context. The code around this is pretty hairy and I wouldn't be surprised if there were bugs lurking; however, I've written a test to exercise this and it hasn't turned up any problems.
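To make the off-thread angle concrete, here's a hypothetical sketch of that pointer chain (the runnable class and names are made up, not the real completion runnable; the request stub stands in for JS::loader::ScriptLoadRequest):

  #include "mozilla/RefPtr.h"
  #include "nsThreadUtils.h"

  // Stand-in for the real ScriptLoadRequest.
  class ScriptLoadRequest {
   public:
    void AddRef() { ++mRefCnt; }
    void Release() { if (--mRefCnt == 0) delete this; }
    // ... mLoadContext lives here in the real class ...
   private:
    ~ScriptLoadRequest() = default;
    int mRefCnt = 0;
  };

  // Hypothetical completion runnable: while it is alive it keeps the request
  // (and therefore the request's load context) alive, so the lifetimes of all
  // three are tied to when this runnable is created, run, and dropped.
  class HypotheticalCompileCompleteRunnable final : public mozilla::Runnable {
   public:
    explicit HypotheticalCompileCompleteRunnable(ScriptLoadRequest* aRequest)
        : mozilla::Runnable("HypotheticalCompileCompleteRunnable"),
          mRequest(aRequest) {}

    NS_IMETHOD Run() override {
      // Completion work for the off-thread parse would happen here.
      return NS_OK;
    }

   private:
    RefPtr<ScriptLoadRequest> mRequest;   // runnable -> request -> load context
  };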

The timing of this is strange. I'm mystified as to why it suddenly showed up on beta and nightly at the same time, with no significant changes to the script loader around then.

Depends on: 1809861

Shot in the dark - have we seen anything like this in the fuzzers lately? Grasping at straws a bit here, but seems like any STR we can find would be useful.

Flags: needinfo?(jkratzer)

Unfortunately not. I'll keep an eye out in case anything related comes in.

Flags: needinfo?(jkratzer)

Not going to make 109 at this point, but fingers crossed that bug 1811939 works for 110.

No crashes on nightly since bug 1811939 landed.

No crashes from Beta since either. I think we're good here.

Group: dom-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Whiteboard: [adv-main110-]
Whiteboard: [adv-main110-] → [adv-main110-][adv-esr102.8-]

Since nothing landed here and the signatures are identical, we should dupe this.

Duplicate of bug: CVE-2023-25739
Resolution: FIXED → DUPLICATE
Group: core-security-release