Closed
Bug 1158761
Opened 9 years ago
Closed 9 years ago
crash in nsBaseAppShell::RunSyncSectionsInternal(bool, unsigned int)
Categories
(Core :: Widget, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
People
(Reporter: ananuti, Assigned: bugzilla)
References
Details
(Keywords: crash, regression, topcrash-win)
Crash Data
Attachments
(2 files)
1.08 KB,
patch
|
bholley
:
review+
lizzard
:
approval-mozilla-beta+
Sylvestre
:
approval-mozilla-release+
Sylvestre
:
approval-mozilla-esr38+
|
Details | Diff | Splinter Review |
6.92 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-f33171a8-c0cf-4048-a76e-bccfb2150427. ============================================================= STR: 1. verify java https://www.java.com/verify 2. closed tab after verified
Comment 1•9 years ago
|
||
So, just as I was worried about. In bug 1157488, I added a release-mode assertion against re-entering sync sections, because that machinery is totally not re-entrant and it seemed like exactly the sort of thing that would cause weird plugin crashes. The crash report here indicates that Java is doing something exactly like that. This could explain a variety of weird plugin crashes. Aaron, are you the right person to look into this? If not, any idea who is?
Flags: needinfo?(bobbyholley) → needinfo?(aklotz)
Comment 2•9 years ago
|
||
To be clear, the invariant that this assertion is trying to enforce is that code run from RunSyncSectionsInternal never spins the event loop, which is presumably what's happening here. It would be nice if the stack in the crash were more useful so that we could figure out how that's happening and whether that's an invariant we can really guarantee or whether we need to handle it somehow.
What kind of information would you like to see in the stack? I might be able to clean it up a bit.
I am almost certain it's possible to run arbitrary script from some of these callbacks (e.g. the one used to load images).
Comment 5•9 years ago
|
||
> What kind of information would you like to see in the stack? I might be able > to clean it up a bit. It would be helpful to see: RunSyncSectionsInternal() ... EventLoopSpinning ... SomeCulprit() RunSyncSectionsInternal() So that we can identify |SomeCulprit|. (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4) > I am almost certain it's possible to run arbitrary script from some of these > callbacks (e.g. the one used to load images). You mean javascript URIs etc? Do we have a hope of forbidding that? If not, how do we handle this case?
Assignee | ||
Comment 6•9 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #1) > So, just as I was worried about. > > In bug 1157488, I added a release-mode assertion against re-entering sync > sections, because that machinery is totally not re-entrant and it seemed > like exactly the sort of thing that would cause weird plugin crashes. The > crash report here indicates that Java is doing something exactly like that. > This could explain a variety of weird plugin crashes. > > Aaron, are you the right person to look into this? If not, any idea who is? In the de facto sense, I probably am. Thanks for the heads up.
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(aklotz)
(In reply to Bobby Holley (:bholley) from comment #5) > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4) > > I am almost certain it's possible to run arbitrary script from some of these > > callbacks (e.g. the one used to load images). > > You mean javascript URIs etc? Do we have a hope of forbidding that? If not, > how do we handle this case? I mean that HTMLImageElement::QueueImageLoadTask queues a call to nsImageLoadingContent::LoadImage, which can fire DOM events.
I get the feeling this ought to be easy to repro. Live machines are better than minidumps. Anyone wanna give it a try? Crash ID Comment 27d9d941-21fb-4b28-badd-8cd142150427 Java Test?! 5b136807-83cf-4af3-8379-e9c9b2150426 Java 64 Bit check 8d563483-c36d-409e-9a67-bf79d2150426 Trying to run the Java test page. Keeps crashing.
Comment 9•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7) > (In reply to Bobby Holley (:bholley) from comment #5) > > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4) > > > I am almost certain it's possible to run arbitrary script from some of these > > > callbacks (e.g. the one used to load images). > > > > You mean javascript URIs etc? Do we have a hope of forbidding that? If not, > > how do we handle this case? > > I mean that HTMLImageElement::QueueImageLoadTask queues a call to > nsImageLoadingContent::LoadImage, which can fire DOM events. Couldn't we fire those events asynchronously, or at the very least append them to run as another sync section?
Possibly, idk. Putting them in another sync section doesn't help because you can still do an XHR from there.
Comment 11•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #10) > Possibly, idk. Putting them in another sync section doesn't help because > you can still do an XHR from there. Right, good point. They'd need to be async I guess.
Comment 12•9 years ago
|
||
xul!nsBaseAppShell::RunSyncSectionsInternal xul!nsBaseAppShell::OnProcessNextEvent xul!nsThread::ProcessNextEvent xul!NS_ProcessPendingEvents xul!nsBaseAppShell::NativeEventCallback xul!nsAppShell::EventWindowProc user32!InternalCallWinProc user32!UserCallWinProcCheckWow user32!DispatchMessageWorker user32!DispatchMessageA [opaque npjp2 frames] xul!nsPluginHost::StopPluginInstance xul!nsObjectLoadingContent::DoStopPlugin xul!nsObjectLoadingContent::StopPluginInstance xul!CheckPluginStopEvent::Run xul!nsBaseAppShell::RunSyncSectionsInternal <--- xul!nsBaseAppShell::AfterProcessNextEvent xul!NS_ProcessNextEvent xul!mozilla::ipc::MessagePump::Run xul!MessageLoop::RunHandler xul!MessageLoop::Run xul!nsBaseAppShell::Run xul!nsAppShell::Run xul!nsAppStartup::Run xul!XREMain::XRE_mainRun xul!XREMain::XRE_main xul!XRE_main firefox!do_main firefox!NS_internal_main firefox!wmain firefox!__tmainCRTStartup kernel32!BaseThreadInitThunk ntdll!__RtlUserThreadStart ntdll!_RtlUserThreadStart
Flags: needinfo?(dmajor)
Comment 13•9 years ago
|
||
Thanks a lot dmajor! Aaron, is there a reason that CheckPluginStopEvent needs to run as a sync section rather than fully asynchronously?
Comment 14•9 years ago
|
||
Visiting this Java link with the latest Nightly makes me crash: http://java.com/en/download/installed8.jsp?detect=jre CR: https://crash-stats.mozilla.com/report/index/e2c41166-2cec-4782-aa18-56f3e2150428
Keywords: regression
Comment 15•9 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #13) > Aaron, is there a reason that CheckPluginStopEvent needs to run as a sync > section rather than fully asynchronously?
Flags: needinfo?(aklotz)
Assignee | ||
Comment 16•9 years ago
|
||
I have no idea :-( I'm not very familiar with that part of the plugin code, I'm afraid.
Flags: needinfo?(aklotz)
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #13) > Aaron, is there a reason that CheckPluginStopEvent needs to run as a sync > section rather than fully asynchronously? Let's see if johns has some time to comment...
Flags: needinfo?(john)
Comment 18•9 years ago
|
||
Any code anywhere that touches plugins, in terms of starting/stopping/script-touching them, runs the risk of the plugin spinning the event loop as such. Java spins the event loop basically 100% of the time you touch it. It's method of operation as far as I can tell is:
> InteractionFromBrowser() {
> SendMessageToJavaThread();
> while (!IsResponseAvailable()) {
> SpinEventLoop();
> }
> return GetResponse();
> }
I don't fully appreciate the differences of a sync vs "100% async" event on the loop. I think CheckPluginStopEvent could be async, but unless you can guarantee that no sync section ever runs content script, there's no way they won't hit java spinning the event loop.
Flags: needinfo?(john)
Comment 19•9 years ago
|
||
(In reply to John Schoenick [:johns] from comment #18) > I don't fully appreciate the differences of a sync vs "100% async" event on > the loop. "100% async" means creating a runnable and dispatching as normal. "Sync sections" are runnables that are explicitly dispatched to the AppShell via RunInStableState. These events get run "between" events - they're basically a special queue that gets drained each time we process an event. > I think CheckPluginStopEvent could be async That sounds great - we should try it. Do you happen to know why this code is currently using RunInStableState? > but unless you can guarantee that no sync section ever runs content script That would be the goal here. > there's no way they won't hit java spinning the event loop. Right. So we need to avoid running content script and java in sync sections.
Comment 20•9 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #19) > (In reply to John Schoenick [:johns] from comment #18) > > I don't fully appreciate the differences of a sync vs "100% async" event on > > the loop. > > "100% async" means creating a runnable and dispatching as normal. "Sync > sections" are runnables that are explicitly dispatched to the AppShell via > RunInStableState. These events get run "between" events - they're basically > a special queue that gets drained each time we process an event. > > > I think CheckPluginStopEvent could be async > > That sounds great - we should try it. Do you happen to know why this code is > currently using RunInStableState? IIRC This code was using RunInStableState for plugin events since antiquity, and I was simply preserving that ordering when I added this event. I'm ~80% sure it doesn't need it, but it could introduce some new issues with e.g. plugins outliving their owning page - but nothing unfixable. > > but unless you can guarantee that no sync section ever runs content script > > That would be the goal here. > > > there's no way they won't hit java spinning the event loop. > > Right. So we need to avoid running content script and java in sync sections. Anything that touches plugins at all should probably assert in sync sections - probably in PluginModuleParent. It's not just java that does this :(. I think there's even an edge case in windows where we have to spin the event loop when IPCing to plugin-container to avoid a windows API deadlock (child is waiting for us to process a message).
Comment 21•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7) > (In reply to Bobby Holley (:bholley) from comment #5) > > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4) > > > I am almost certain it's possible to run arbitrary script from some of these > > > callbacks (e.g. the one used to load images). > > > > You mean javascript URIs etc? Do we have a hope of forbidding that? If not, > > how do we handle this case? > > I mean that HTMLImageElement::QueueImageLoadTask queues a call to > nsImageLoadingContent::LoadImage, which can fire DOM events. ( this is also my fault ) The LoadImage call needs to be sync, but IIRC these events firing inline from the LoadImage task isn't necessary/proper per spec
Reporter | ||
Updated•9 years ago
|
Keywords: topcrash-win
Comment 22•9 years ago
|
||
This is now the #1 crash in nightly 40. Is this issue tied to nightly-only features, or will it ride the trains if we don't do anything?
Assignee | ||
Comment 23•9 years ago
|
||
(In reply to David Major [:dmajor] from comment #22) > This is now the #1 crash in nightly 40. Is this issue tied to nightly-only > features, or will it ride the trains if we don't do anything? It will ride the trains. I'll try moving CheckPluginStopEvent and see what happens.
Comment 24•9 years ago
|
||
(In reply to David Major [:dmajor] from comment #22) > This is now the #1 crash in nightly 40. Is this issue tied to nightly-only > features, or will it ride the trains if we don't do anything? The reason this is crashing is that I added a release-mode assertion against this behavior. But we're only asserting in the case where we enter non-reentrant-safe code, which will cause us to run the same event multiple times. This is extremely unsafe, and most likely the cause of other, harder-to-track-down crashes.
(In reply to Bobby Holley (:bholley) from comment #24) > The reason this is crashing is that I added a release-mode assertion And to think I doubted that this would happen ;)
Comment 26•9 years ago
|
||
It's a useful assert; it helped us find this issue. And if we can fix things so that it stops firing, even better! [Tracking Requested - why for this release]: comment 23
tracking-firefox40:
--- → ?
Assignee | ||
Comment 27•9 years ago
|
||
Changing CheckPluginStopEvent over to a regular runnable has broken a bunch of tests :-( https://treeherder.mozilla.org/#/jobs?repo=try&revision=5106c6981c18
Comment 28•9 years ago
|
||
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #27) > Changing CheckPluginStopEvent over to a regular runnable has broken a bunch > of tests :-( > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=5106c6981c18 Looks like it's just dom/plugins/test/mochitest/test_instance_unparentN.html, no? Is that actually testing invariants that need to be preserved?
Assignee | ||
Comment 29•9 years ago
|
||
I'll take a closer look tomorrow.
Comment 30•9 years ago
|
||
This test is wrong:
> p1.callOnDestroy(onDestroy);
> setTimeout(checkPluginAlreadyDestroyed, 0);
> d1.removeChild(p1);
The setTimeout will now fire before the CheckPluginStop event triggered from the third line. I don't think this test means to be checking that the plugin will stop before other events, so those two lines can likely be swapped to fix this.
Comment 31•9 years ago
|
||
This also happens under e10s, but it is much less frequently. https://crash-stats.mozilla.com/search/?product=Firefox&signature=~nsBaseAppShell%3A%3ARunSyncSectionsInternal&version=40.0a1&dom_ipc_enabled=!__null__&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports 18 results over 7 days. example: https://crash-stats.mozilla.com/report/index/dfaab0d1-1218-42ae-bef8-a34812150429
Comment 33•9 years ago
|
||
Since the crash report indicates there are crashes in 39, 38, as well as 40, I changed the flags for those to affected, but only set tracking for 40.
status-firefox38.0.5:
--- → affected
status-firefox39:
--- → affected
Comment 34•9 years ago
|
||
I'd prefer not to see this crash reach our aurora users. Merge day is coming soon. If we can't get a fix in time, how about silencing the assert for now? The volume on this crash is really high and I can't imagine that it would be replaced by an equal number of weird plugin crashes.
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to John Schoenick [:johns] from comment #30) > This test is wrong: > > > p1.callOnDestroy(onDestroy); > > setTimeout(checkPluginAlreadyDestroyed, 0); > > d1.removeChild(p1); > > The setTimeout will now fire before the CheckPluginStop event triggered from > the third line. I don't think this test means to be checking that the plugin > will stop before other events, so those two lines can likely be swapped to > fix this. I've tried that with no luck, unfortunately. checkPluginAlreadyDestroyed is still executing before the plugin is destroyed.
Assignee | ||
Comment 36•9 years ago
|
||
OK, I think I've got a fix for those tests. It looks like the first executeSoon arrives before the plugin stop event, but a second one arrives after. Running a try push ATM, but I think that if checkPluginAlreadyDestroyed executeSoons itself a second time, the tests pass.
Assignee | ||
Comment 37•9 years ago
|
||
I'll put this part up first so we can get that reviewed. I'll upload the tests as soon as I've verified that they look OK on try.
Attachment #8602898 -
Flags: review?(bobbyholley)
Comment 38•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously Review of attachment 8602898 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for seeing this through! r=bholley
Attachment #8602898 -
Flags: review?(bobbyholley) → review+
Assignee | ||
Comment 39•9 years ago
|
||
This patch just updates tests to take into account that there is now a slight delay in the execution of the plugin stop event. We re-run the check a few times until the plugin stop event has executed. This is a little fragile but I'm not optimistic that there is any alternative.
Updated•9 years ago
|
Attachment #8603204 -
Flags: review?(jmathies) → review+
Comment 40•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e7219efa371e https://hg.mozilla.org/integration/mozilla-inbound/rev/afbc1f7623e7
Comment 41•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/e7219efa371e https://hg.mozilla.org/mozilla-central/rev/afbc1f7623e7
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment 42•9 years ago
|
||
This signature is now gone from desktop nightlies. The crashes continue on Fennec; I've opened bug 1163814.
Comment 43•9 years ago
|
||
Can we nominate this for Beta/release/esr38 please?
status-firefox-esr38:
--- → affected
Flags: needinfo?(aklotz)
Assignee | ||
Comment 44•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: This simple patch could help alleviate numerous plugin crashes which is something that would be nice to fix in ESR. User impact if declined: Weird plugin crashes Fix Landed on Version: Landed on 40, already requesting uplift to 39 and 38 release Risk to taking this patch (and alternatives if risky): Low, trivial patch String or UUID changes made by this patch: None See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info. Approval Request Comment [Feature/regressing bug #]: NPAPI Plugins [User impact if declined]: Weird plugin crashes [Describe test coverage new/current, TreeHerder]: Covered by existing tests plus test modifications in part 2 of this patch [Risks and why]: Low, trivial patch, already on aurora [String/UUID change made/needed]: None
Flags: needinfo?(aklotz)
Attachment #8602898 -
Flags: approval-mozilla-release?
Attachment #8602898 -
Flags: approval-mozilla-esr38?
Attachment #8602898 -
Flags: approval-mozilla-beta?
Comment 45•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously Top crash, taking it.
Attachment #8602898 -
Flags: approval-mozilla-release?
Attachment #8602898 -
Flags: approval-mozilla-release+
Attachment #8602898 -
Flags: approval-mozilla-esr38?
Attachment #8602898 -
Flags: approval-mozilla-esr38+
Attachment #8602898 -
Flags: approval-mozilla-beta?
Attachment #8602898 -
Flags: approval-mozilla-beta+
Comment 46•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/6abba49b6c14 https://hg.mozilla.org/releases/mozilla-beta/rev/0a02d104de2a
Flags: in-testsuite+
Comment 47•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-release/rev/c163f5453215 https://hg.mozilla.org/releases/mozilla-release/rev/aa884d29e93c
Comment 48•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-esr38/rev/e46129b3427d https://hg.mozilla.org/releases/mozilla-esr38/rev/0b2ca932ae87
Comment 49•9 years ago
|
||
Backed out from Beta due to test_instance_re-parent.html failures. Interestingly, release/esr38 are OK. Aaron is investigating, but we decided to backout for now to avoid getting in the way of other release work off the branch. https://treeherder.mozilla.org/logviewer.html#?job_id=372247&repo=mozilla-beta https://hg.mozilla.org/releases/mozilla-beta/rev/31d67ade8354
Flags: needinfo?(aklotz)
Comment 50•9 years ago
|
||
Setting qe-verify- for this fix since it's already covered by automated tests.
Flags: qe-verify-
Comment 51•9 years ago
|
||
Any news here? :)
Comment 52•9 years ago
|
||
I missed this for 39 - it wasn't tracked. Tracking it now!
tracking-firefox39:
--- → +
Comment 53•9 years ago
|
||
The immediate symptom of this particular crash can't be happening on 39, since the assert was added in 40 by bug 1157488. The underlying problem is still present on 39, and may be causing random crashes out in the long tail. Uplifting would be good if we can, but I think we could survive without if we had to.
Comment 54•9 years ago
|
||
OK. sounds like I can wontfix this for 39. If that's not correct and you all have a fix coming, please let me know and re-set the flag to "affected". It seems odd to have this fixed in 38 and 40 but not in 39.
Updated•9 years ago
|
Comment 55•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously Removing the approval flag to get it off the needs-uplift radar.
Attachment #8602898 -
Flags: approval-mozilla-beta+
Assignee | ||
Comment 56•9 years ago
|
||
I've uplifted bug 1172627 to beta. That should take care of test_instance_re-parent.html.
Flags: needinfo?(aklotz) → needinfo?(ryanvm)
Comment 57•9 years ago
|
||
Try push of these patches re-applied to mozilla-beta tip: https://treeherder.mozilla.org/#/jobs?repo=try&revision=408ac3ec45eb
Comment 58•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously Try likes it. Re-nominating for beta.
Flags: needinfo?(ryanvm)
Attachment #8602898 -
Flags: approval-mozilla-beta?
Comment 59•9 years ago
|
||
Comment on attachment 8602898 [details] [diff] [review] Part 1: Run the event fully asynchronously OK, let's uplift this to beta.
Attachment #8602898 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•9 years ago
|
Comment 60•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/e688b0b0dbd4 https://hg.mozilla.org/releases/mozilla-beta/rev/764dff9361a3
You need to log in
before you can comment on or make changes to this bug.
Description
•