Closed Bug 1397214 Opened 7 years ago Closed 7 years ago

Crash in gfxPlatform::InitAcceleration

Categories

(Core :: Graphics, defect, P1)

57 Branch
Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 + fixed

People

(Reporter: calixte, Assigned: aosmond)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-00e0635f-0699-4956-9824-fb39e0170906.
=============================================================

There are 1465 crashes in nightly 57 (coming from 34 installations!) with buildid  20170905220108.
There are 259 crashes in beta 56 (from 26 installations) and 359 in release 55 with the same signature.

:nical, could you investigate please ?
Flags: needinfo?(nical.bugzilla)
We've more than 2000 crashes for ~50 users: http://bit.ly/2eF9EGh
:mattwoodrow, could you investigate please ?
Flags: needinfo?(nical.bugzilla) → needinfo?(matt.woodrow)
This looks like a shutdown issue.

Why are we attempting to initialise the GPU stuff while we're shutting down?
Keywords: regression
Yep, looks like we're trying to initialize gfxPlatform during shutdown, and we crash because the Preferences API has already been shutdown.

This is unlikely to be a gfx issue, we need to figure out what changed to cause this initialization to happen at the wrong time.
Flags: needinfo?(matt.woodrow)
an affected user posted at https://www.reddit.com/r/firefox/comments/6yi5e6/tabs_keep_crashing_nightly/ in case we'd need some troubleshooting or more information from them
Crash Signature: [@ gfxPlatform::InitAcceleration] → [@ gfxPlatform::InitAcceleration] [@ gfxPlatform::InitAcceleration | gfxWindowsPlatform::InitAcceleration]
The reddit user linked in comment 5 indicates that a fresh profile does no good (in case there was any doubt) and lists several crash reports.
For information, the crash signature 'gfxPlatform::InitAcceleration' is now obsolete since gfxPlatform::Init has been added to prefix skiplist in Socorro [1].

[1] https://github.com/mozilla-services/socorro/commit/5d132e24bedd9abb2ef911ad588913bbde9a702f#diff-60e3eca13218b9b56380ed0a26fd92cc
:njn the only code in that pushlog that remotely touch what appears in this crash is bug 1392884.

I don't think it has anything to do at all.. but I figured you may want to look into this bug anyway
Flags: needinfo?(n.nethercote)
(In reply to Jean-Yves Avenard [:jya] from comment #8)
> :njn the only code in that pushlog that remotely touch what appears in this
> crash is bug 1392884.
> 
> I don't think it has anything to do at all.. but I figured you may want to
> look into this bug anyway

You mean the pushlog in comment 3? I don't see bug 1392884 there, and nsIAtomService changes seem unlikely to affect this.

I do see bug 1395828 in that pushlog, which is mine, but that does very minor changes to HTML parsing and also seems unlikely to be related.
Flags: needinfo?(n.nethercote)
I don't understand these crashes. Supposedly the crashing line is this one:

> Preferences::GetBool("media.windows-media-foundation.use-dxva", true) &&

The crash is an EXCEPTION_ACCESS_VIOLATION_READ at address 0x0, i.e. a null pointer read. I don't see how this line could trigger a null pointer read, given that it involves a static method call, a string literal, and a bool literal.
i) jya was referring to the pushlog of the nightly just before [1].
ii) since the condition in the "if" is on several lines, maybe we should take all the condition rather than just the part at line 2346, and so the problem could be to have a null pointer for gfxInfo.
iii) this signature is ranked #2 in top-crashers for content process.


[1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1401e3eec44df87963d3af329ef8a4183ab0483f&tochange=3ecda4678c49ca255c38b1697142b9118cdd27e7
Keywords: topcrash
I'm going to call this 57 only because that's where this is spiking, the volume for these signatures is very low on earlier branches (and earlier nightlies) so something changed in the last few days on central.
Could be related to the shutdown crashes Andrew is looking at...
Flags: needinfo?(aosmond)
Whiteboard: [gfx-noted]
This is the #1 non-ShutDownKill Windows topcrash in Nightly 20170908100218, with 772 occurrences, which is a lot for Nightly.
For me it happen 100% of the time on my Dell Laptop E7440 but not on my workstation (faster machine).
Also if I disable 'multi-process' it works.
Priority: -- → P1
See Also: → 1389021
[Tracking Requested - why for this release]: Nightly top crash.

Still the #1 non-ShutDownKill Windows top crash, in the 9-13 Nightly, with 742 occurrences, which as Nick said, is a lot.
Assignee: nobody → aosmond
Minh, could you run a quick experiment?  Leaving multi-process on, then in about:config:
1. Right click anywhere, choose New -> Boolean
2. For preference name, enter browser.pagethumbnails.capturing_disabled
3. Set the value to true
4. Shutdown Firefox

If you now restart Firefox, do you still have a crash on shutdown?
Flags: needinfo?(tranle)
When I have the multi-process on all my tabs a crashing here one of the most recent:
https://crash-stats.mozilla.com/report/index/bp-fddd60e0-dd46-4db7-8af1-095d90170914

I have tried with browser.pagethumbnails.capturing_disabled=true and it did not help.
Flags: needinfo?(tranle)
From the stack trace it seems that the content process has finished running its message loop <https://hg.mozilla.org/mozilla-central/annotate/dd6b788f1497/toolkit/xre/nsEmbedFunctions.cpp#l709> and then after that we start to process our pending events as part of that which is when we run ContentChild::RecvSetXPCOMProcessAttributes() which tries to initialize things way too late into shutdown.

It seems like a fix to this bug is to make ContentChild observe NS_XPCOM_SHUTDOWN_OBSERVER_ID, and set a boolean based on that, and check that boolean in ContentChild::RecvSetXPCOMProcessAttributes() and bail out early if that is true to avoid calling into gfxPlatform::Init() that late.  Note that ContentChild already has an mShuttingDown, but that is set in RecvShutdown() for cases where the ContentChild is being shut down by the parent process, and this doesn't seem to be one of those cases, since in that case IPDL message ordering would ensure SetXPCOMProcessAttributes would be delivered before Shutdown.
tracking as top crash
I did not find any smoking guns in the push log -- there are a few potentially related changes, but none of them appear to do anything that would obviously have the impact of the crash stack.

Unfortunately simply listening on the observer service does not appear to be possible since the ContentChild is created before the observer service, and the next IPDL method called is RecvSetXPCOMProcessAttributes. I used ClearOnShutdown as a workaround to this as all we need is a flag to check in RecvSetXPCOMProcessAttributes.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a53000d42a20fae93614d6d0fff460e6dc326ad
Flags: needinfo?(aosmond)
Comment on attachment 8908698 [details] [diff] [review]
Prevent ContentChild::RecvSetXPCOMProcessAttributes from running after shutdown., v1

Not sure if I initialized sShutdownObserver in the best place, let me know if you have other ideas :).
Attachment #8908698 - Flags: review?(ehsan)
Comment on attachment 8908698 [details] [diff] [review]
Prevent ContentChild::RecvSetXPCOMProcessAttributes from running after shutdown., v1

Review of attachment 8908698 [details] [diff] [review]:
-----------------------------------------------------------------

Nice trick!

I think we should land this and watch the crash reports and see if the crash goes away, since honestly my theory could end up being wrong...  If it proved to be ineffective then we can back it out and think of something else.

r=me for now.  Thanks a lot.

::: dom/ipc/ContentChild.cpp
@@ +519,5 @@
> +class ContentChild::ShutdownObserver final
> +{
> +public:
> +  ShutdownObserver()
> +  { }

Nit:

  ShutdownObserver() = default;

Or just omit it!

@@ +526,2 @@
>  ContentChild* ContentChild::sSingleton;
> +StaticAutoPtr<ContentChild::ShutdownObserver> ContentChild::sShutdownObserver;

"Observer" has a meaning associated with the observer service for the better or worse, how about calling this ShutdownCanary and sShutdownCanary?
Attachment #8908698 - Flags: review?(ehsan) → review+
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/47901a63dd2b
Prevent ContentChild::RecvSetXPCOMProcessAttributes from running after shutdown. r=ehsan
https://hg.mozilla.org/mozilla-central/rev/47901a63dd2b
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Depends on: 1400623
Depends on: 1400637
See Also: → 1399847
You need to log in before you can comment on or make changes to this bug.