Closed Bug 1443429 Opened 2 years ago Closed 8 months ago

Crash in mozilla::CycleCollectedJSContext::CleanupIDBTransactions

Categories

(Core :: DOM: Core & HTML, defect, P2, critical)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: calixte, Assigned: jya)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug was filed from the Socorro interface and is
report bp-3bcb2239-a611-42e2-be03-b218f0180306.
=============================================================

Top 10 frames of crashing thread:

0 libxul.so mozilla::CycleCollectedJSContext::CleanupIDBTransactions xpcom/base/CycleCollectedJSContext.cpp:323
1 libxul.so mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint xpcom/base/CycleCollectedJSContext.cpp:492
2 libxul.so mozilla::CycleCollectedJSContext::LeaveMicroTask xpcom/base/CycleCollectedJSContext.h:197
3 libxul.so mozilla::nsAutoMicroTask::~nsAutoMicroTask xpcom/base/CycleCollectedJSContext.h:290
4 libxul.so nsXBLProtoImplAnonymousMethod::Execute dom/xbl/nsXBLProtoImplMethod.cpp:334
5 libxul.so nsBindingManager::RemovedFromDocumentInternal dom/xbl/nsBindingManager.cpp:220
6 libxul.so mozilla::dom::RemoveFromBindingManagerRunnable::Run dom/base/Element.cpp:1872
7 libxul.so nsContentUtils::RemoveScriptBlocker dom/base/nsContentUtils.cpp:5783
8 libxul.so mozilla::PresShell::DoFlushPendingNotifications dom/base/nsContentUtils.h:3579
9 libxul.so nsDocument::FlushPendingNotifications dom/base/nsDocument.cpp:7602

=============================================================

There is 1 crash in nightly 60 with buildid 20180305100140. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1193394.

[1] https://hg.mozilla.org/mozilla-central/rev?node=068c59c7c4ec
Flags: needinfo?(arai.unmht)
I'll look into this tomorrow
The bug is high up in the stack in the media code, if I read the stack correctly.
Component: DOM → Audio/Video
Either MediaDecoder::UpdateLogicalPositionInternal() shouldn't be scheduled to run at stable state
https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/dom/media/MediaDecoder.cpp#261-263

or https://hg.mozilla.org/mozilla-central/annotate/51200c0fdaddb2749549a82596da5323a4cbd499/dom/media/MediaDecoder.cpp#l860 should be postponed to happen outside stable state.

or mozilla::detail::RunnableMethodImpl<FdWatcher*, void (FdWatcher::*)(), true, mozilla::RunnableKind::Standard>::Run in the stack should not happen inside a stable state.

rillian may have an opinion, or bholley.
(In reply to Olli Pettay [:smaug] (unusual timezone [JST] for a week) from comment #3)
> Either MediaDecoder::UpdateLogicalPositionInternal() shouldn't be scheduled
> to run at stable state
> https://searchfox.org/mozilla-central/rev/
> efce4ceddfbe5fe4e2d74f1aed93894bada9b273/dom/media/MediaDecoder.cpp#261-263

It needs to be, because that's how all the state watching stuff works.

> or
> https://hg.mozilla.org/mozilla-central/annotate/
> 51200c0fdaddb2749549a82596da5323a4cbd499/dom/media/MediaDecoder.cpp#l860
> should be postponed to happen outside stable state.

Yeah, seems like the call to FireTimeUpdate could be delayed. We don't want to call into JS from a state watching callback.

Thoughts jya?
Flags: needinfo?(arai.unmht) → needinfo?(jyavenard)
Component: Audio/Video → Audio/Video: Playback
Priority: -- → P2
Assignee: nobody → jyavenard
Flags: needinfo?(jyavenard)
Sorry, I've let this sit for too long.

(In reply to Bobby Holley (:bholley) from comment #4)
e4ceddfbe5fe4e2d74f1aed93894bada9b273/dom/media/MediaDecoder.cpp#261-263
> 
> It needs to be, because that's how all the state watching stuff works.

Could you explain why that be?

> 
> > or
> > https://hg.mozilla.org/mozilla-central/annotate/
> > 51200c0fdaddb2749549a82596da5323a4cbd499/dom/media/MediaDecoder.cpp#l860
> > should be postponed to happen outside stable state.
> 
> Yeah, seems like the call to FireTimeUpdate could be delayed. We don't want
> to call into JS from a state watching callback.

I'm confused now.
State Watching was precisely done to ensure we had JS variables updated in an apparent synchronicity even when the canonical is updated in another thread. We do that for many of JS attributes (buffered, currentTime, seekingRange etc)

So if we can't update those through a state watcher, it makes most of our use of state watching incorrect.
Flags: needinfo?(bobbyholley)
We discussed on IRC.

[14:43:27]  <bholley>	jya: there are two things we want to do here
[14:43:31]  <bholley>	jya: (1) fix this case
[14:43:36]  <bholley>	jya: (2) assert against it happening
[14:43:55]  <bholley>	jya: (1) happens by queuing TextTrackManager::TimeMarchesOn as an event
[14:44:23]  <bholley>	(2) happens by setting a static bool if we're main-thread in EventTargetWrapper::FireTailDispatcher, and then diagnostic-asserting against that bool if we're main thread in AutoJSAPI::InitInternal
[14:44:33]  <jya>	Not sure sure about that 1) I need to check the spec, but no matter, that needs fixing
Flags: needinfo?(bobbyholley)
See also: https://docs.google.com/presentation/d/1momsC3suU8m-CrdZyYD_6QATATehjzZHbkGmL6KsmSk/edit#slide=id.g28ecd2197a_0_269

In particular:

"However, the algorithm run in stable state should not do anything visible from the script. i.e., firing event or running JS callback which unexpectedly makes content “unstable” again."
(In reply to Bobby Holley (:bholley) from comment #7)
> "However, the algorithm run in stable state should not do anything visible
> from the script. i.e., firing event or running JS callback which
> unexpectedly makes content “unstable” again."

And so given that, I think it would be ideal to set the assertion boolean in CycleCollectedJSContext::ProcessStableStateQueue, so that we can catch all bugs of this variant, rather than only those related to state watching.
There are 163 crashes in the past week with this signature.

https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ACycleCollectedJSContext%3A%3ACleanupIDBTransactions&date=%3E%3D2018-11-15T19%3A43%3A46.000Z&date=%3C2018-11-22T19%3A43%3A46.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

So far, of the dozen I've looked at, none were due to the use of StateWatcher, many occurs in HTMLImageElement::LoadSelectedImage
like this one:
https://crash-stats.mozilla.com/report/index/4f1d1e8f-1fb2-4a20-b37f-f4f400181122

We can fix the case involving state watched in particular, but it's not going to fix the great majority of the crashes.
:smaug, any recommendations?
Flags: needinfo?(bugs)
This is tricky. How can we ensure that code using stable states doesn't end up running JS?
That particular HTMLImageElement case is a regression from 
https://bugzilla.mozilla.org/show_bug.cgi?id=1037643
In the old days we explicitly started loads and such using script runners (I recall fixing several security bugs related to starting loads at unsafe time),
but have apparently since then moved to use stable states in some cases - I presume mostly because the specs want that behavior.
Luckily JS in stable state isn't a security bug, but more like a possible web compat issue.
But it does tell always that we ended up running some code, which we didn't anticipate to run, which is worrisome.

So far I haven't come up with anything better than adding more assertions and fix all the problematic cases we find.
A hacky temporary workaround for simpler cases like that HTMLImageElement load trigger could be to ensure nested microtasks and stable states and handled only after the currently being handled stable states are handled. But it wouldn't help at all with those couple media handling cases where event loop spinning has been happening (perhaps all those cases have been fixed already).
Flags: needinfo?(bugs)
There are crashes such as this one using today's nightly, not media, no HTMLImageElement

https://crash-stats.mozilla.com/report/index/709bc626-544f-49b3-87b8-425600181122

though it may be that the crash stack is truncated, it appears quite short.

I'll open a bug for the HTMLImageElement one
(In reply to Olli Pettay [:smaug] (high review load) from comment #10)
> This is tricky. How can we ensure that code using stable states doesn't end
> up running JS?
> That particular HTMLImageElement case is a regression from 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1037643
> In the old days we explicitly started loads and such using script runners (I
> recall fixing several security bugs related to starting loads at unsafe
> time),
> but have apparently since then moved to use stable states in some cases - I
> presume mostly because the specs want that behavior.
> Luckily JS in stable state isn't a security bug, but more like a possible
> web compat issue.
> But it does tell always that we ended up running some code, which we didn't
> anticipate to run, which is worrisome.
> 
> So far I haven't come up with anything better than adding more assertions
> and fix all the problematic cases we find.

If we add the MOZ_DIAGNOSTIC_ASSERT I suggest in comment 8 and comment 6, we should be able to catch all of these closer to the source.
Yeah, adding some more assertions could be useful. Adding one to AutoJSAPI would catch some, and the ones like MOZ_RELEASE_ASSERT(!mDoingStableStates) in CycleCollectedJSContext.cpp catch then different issues (spinning event loop for example).
Due to the state watcher logic, mirror tasks can be dispatched while in stable state. We must have visible JS change during such stable state.
And minor C++ cleanup to keep the static analyser happy.

Depends on D12699
Depends on D12699
Attachment #9027044 - Attachment description: Bug 1443429 - P2. Fix constness. r?bholley! And minor C++ cleanup to keep the static analyser happy. → Bug 1443429 - P2. Fix constness. r?bholley!
Probably move this bug as meta. As there's a lot of fixes required to get that commit through

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=213540214&revision=4479f96239d918a11144909b474e88dd1741b93a
(In reply to Jean-Yves Avenard [:jya] from comment #19)
> Probably move this bug as meta. As there's a lot of fixes required to get
> that commit through
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&selectedJob=213540214&revision=4479f96239d918a11144909b474e88dd
> 1741b93a

I think this bug should be for the specific crash we're fixing (which we should fix now), and then we can file another bug for getting the assertion landed.
Attachment #9027042 - Attachment is obsolete: true
Bug 1363421 made the initialization UserAgentOverrides.jsm be done on first initialization, which is typically done during stable state.

Even should we only prevent microtasks and/or event to be fired, this will still be a problem.
Blocks: 1363421
Component: Audio/Video: Playback → DOM
(In reply to Jean-Yves Avenard [:jya] from comment #21)
> Bug 1363421 made the initialization UserAgentOverrides.jsm be done on first
> initialization, which is typically done during stable state.

I'm considering rewriting UserAgentOverrides.jsm as a C++ service, mostly for performance reasons.
Would that help this bug?
(In reply to Valentin Gosu [:valentin] from comment #22)
> (In reply to Jean-Yves Avenard [:jya] from comment #21)
> > Bug 1363421 made the initialization UserAgentOverrides.jsm be done on first
> > initialization, which is typically done during stable state.
> 
> I'm considering rewriting UserAgentOverrides.jsm as a C++ service, mostly
> for performance reasons.
> Would that help this bug?

yes. it would fix some of those crashes (both with the image and media elements)
Depends on: 1513574
Blocks: 1513582
Attachment #9027045 - Attachment is obsolete: true
Attachment #9027041 - Attachment description: Bug 1443429 - P1. Ensure that we don't fire events while in stable state. r?pehrsons! → Bug 1443429 - P1. Ensure that we don't run JS wrapper while in stable state. r?pehrsons!
WebVTTListener is a JS wrapper around vtt.js.

Depends on D12702
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/df4554217d9a
P1. Ensure that we don't run JS wrapper while in stable state. r=pehrsons
https://hg.mozilla.org/integration/autoland/rev/dd14a663cea2
P2. Fix constness. r=bholley
https://hg.mozilla.org/integration/autoland/rev/f0bd2be0b68c
P3. Don't use WebVTTListener while in stable state. r=smaug
https://hg.mozilla.org/integration/autoland/rev/1ae2a252d3e8
P4. Ensure MediaShutdownManager is initialized outside stable state. r=pehrsons
Depends on: 1515021
Is this something we should consider uplifting to Beta or can it ride the trains? Crash rate doesn't look too high.
Flags: needinfo?(jyavenard)

We can certainly uplift this to 65. We did get a regression in bug 1515021 which will have to be uplifted too

Flags: needinfo?(jyavenard)

The more I think about it, the more I think that this isn't worth backporting with a week left before we're building RCs. Let's let this ride with 66 unless you feel strongly otherwise.

Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.