Closed Bug 1116540 Opened 8 years ago Closed 6 years ago

startup crash in mozalloc_abort(char const* const) | NS_DebugBreak | gfxPlatform::Init()

Categories

(Core :: Graphics, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox45 --- affected
firefox46 --- affected
firefox47 --- wontfix
firefox48 --- ?
firefox49 --- fixed

People

(Reporter: wsmwk, Assigned: dvander)

Details

(Keywords: crash, Whiteboard: [startupcrash][tbird crash][gfx-noted])

Crash Data

Attachments

(2 files)

#30 crash for Thunderbird 31.3.0

This bug was filed from the Socorro interface and is 
report bp-5eb2a2cd-f5fd-4f48-b428-002e22141230.
=============================================================
 0 	mozalloc.dll	mozalloc_abort(char const* const)	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll	NS_DebugBreak	xpcom/base/nsDebugImpl.cpp
2 	xul.dll	gfxPlatform::Init()	gfx/thebes/gfxPlatform.cpp
3 	xul.dll	gfxPlatform::GetPlatform()	gfx/thebes/gfxPlatform.cpp
4 	xul.dll	ShouldUseImageSurfaces	image/src/imgFrame.cpp
5 	xul.dll	imgFrame::Init(int, int, int, int, gfxImageFormat, unsigned char)	image/src/imgFrame.cpp
6 	xul.dll	mozilla::image::RasterImage::EnsureFrame(unsigned int, int, int, int, int, gfxImageFormat, unsigned char, unsigned char**, unsigned int*, unsigned int**, unsigned int*, imgFrame**)	image/src/RasterImage.cpp
7 	xul.dll	mozilla::image::RasterImage::EnsureFrame(unsigned int, int, int, int, int, gfxImageFormat, unsigned char**, unsigned int*, imgFrame**)	image/src/RasterImage.cpp
8 	xul.dll	mozilla::image::Decoder::AllocateFrame()	image/src/Decoder.cpp
9 	xul.dll	mozilla::image::RasterImage::InitDecoder(bool)	image/src/RasterImage.cpp
10 	xul.dll	mozilla::image::RasterImage::SyncDecode()	image/src/RasterImage.cpp
11 	xul.dll	mozilla::image::RasterImage::RequestDecodeIfNeeded(tag_nsresult, mozilla::image::RasterImage::eShutdownIntent, bool, bool)	image/src/RasterImage.cpp 

another example bp-7f034b1b-411a-4d8c-b66b-d55422141226
P.S. All are on Windows 7 and Windows 8.
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | gfxPlatform::Init()] → [@ mozalloc_abort(char const* const) | NS_DebugBreak | gfxPlatform::Init()] [@ mozalloc_abort | NS_DebugBreak | gfxPlatform::Init]
In at least some of these, we're hitting the path that is gone as of 44 - trying to use D2D 1.0, rather than D2D 1.1, or when that is not available, dropping out of acceleration.  In this case, we appear to crash because we want to do D2D, but we can't find d2d1.dll.
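
A minimal sketch of the difference between the two selection paths described above. The enum, struct and function names are hypothetical and are not Gecko's actual types. It only illustrates why a fallback to D2D 1.0 can be chosen on a machine where d2d1.dll never loaded, while a "D2D 1.1 or no acceleration" rule cannot.

```cpp
#include <cassert>

// Hypothetical names for illustration; these are not Gecko's actual types.
enum class ContentBackend { Direct2D1_1, Direct2D1_0, Software };

struct D2DSupport {
  bool wantsD2D;       // prefs and blocklist allow Direct2D
  bool d2d1DllLoaded;  // d2d1.dll was found and loaded
  bool hasD2D1_1;      // a D2D 1.1 device could be created
};

// Older-style selection (assumed shape): falls back to D2D 1.0 without
// confirming that the DLL was ever loaded.
inline ContentBackend ChooseLegacy(const D2DSupport& s) {
  if (!s.wantsD2D) return ContentBackend::Software;
  return s.hasD2D1_1 ? ContentBackend::Direct2D1_1
                     : ContentBackend::Direct2D1_0;
}

// Newer-style selection (assumed shape): D2D 1.1, or drop out of acceleration.
inline ContentBackend ChooseCurrent(const D2DSupport& s) {
  if (s.wantsD2D && s.d2d1DllLoaded && s.hasD2D1_1) {
    return ContentBackend::Direct2D1_1;
  }
  return ContentBackend::Software;
}

// Usage: a machine that wants D2D but has no usable d2d1.dll.
inline void DemoMissingDll() {
  const D2DSupport s{true, false, false};
  assert(ChooseLegacy(s) == ContentBackend::Direct2D1_0);  // cannot work later
  assert(ChooseCurrent(s) == ContentBackend::Software);
}
```

In the legacy shape, the failure only surfaces later, when a draw target is requested from a backend that has no library behind it.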
I see this as one of the crashes. I don't have STR yet. However, here are the conditions under which I get this crash:

On Win10 64-bit + Nightly 32-bit e10s DISABLED, when I view a Flash Video with VLC, with Firefox in the background, the firefox.exe size in task manager starts ballooning and I sometimes get an OOM and Firefox crashes. This is repeatable but the crashes are not the same. Some of the crashes are:
https://crash-stats.mozilla.com/report/index/bp-ecf85e54-a5b0-4ae1-ba0b-e0baf2160318
https://crash-stats.mozilla.com/report/index/bp-ecd37cb0-3cd9-497b-bd1d-309f42160319 
3e5803d3-978f-47e7-a33f-5d11fa521176 (not submitted to crash-stats?)
https://crash-stats.mozilla.com/report/index/bp-eb34b0c4-61ff-4948-b487-3d33a2160319

On Win10 64-bit + Nightly 32-bit e10s ENABLED, when I view a Flash Video with VLC, with Firefox in the background, the plugin_container.exe size in task manager starts ballooning and I sometimes get an OOM and the PC crashes.
For Thunderbird this crash no longer seems relevant. There is only one recent report, which was against Thunderbird 31 (no longer supported AFAIK).

For Firefox this crash remains an issue.
* Firefox 45 has 151 crashes reported over the last week.
* Firefox 46 has 21 crashes reported over the last week.
* Firefox 47 has 1 crash reported over the last week.
* Firefox 48 has 0 crashes reported over the last week.

Top 4 GPUs representing 53% of the crashes:
21.607% Intel Atom Processor Z36xxx/Z37xxx Series Graphics & Display (0x0f31)
> 92% of these are on Windows 8.1 using a 10.18.10.* driver
12.742% Intel Cherryview HD Graphics (0x22b1)
> 98% of these are on Windows 10 using a 10.18.15.* driver
9.972% AMD Radeon HD 4250 (0x9715)
> 100% of these are on Windows 7 using the 8.850.0.0 driver
8.864% Intel Haswell-ULT Integrated Graphics Controller (0x0a16)
> 70% of these are on Windows 10 using a 10.18.15.* driver
> 30% of these are on Windows 10 using a 20.19.15.* driver
7.202% Intel 3rd Gen Core processor Graphics Controller (0x0166)
> 40% Windows 10 on a 10.18.10.* driver
> 40% Windows 8.* on a 10.18.10.* driver

Top 5 graphics critical errors representing 65% of the crashes
> 45.56% |[0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1,1) Code: 0x8899000c format 0|[1][GFX1]: Failed to create DrawTarget, Type: 7 Size: Size(1,1)
> 15.83% |[0][GFX1]: No valid D2D factory available.|[1][GFX1]: Failed to create DrawTarget, Type: 1 Size: Size(1,1)
>  8.83% |[0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1,1) Code: 0x8899000c|[1][GFX1]: Failed to create DrawTarget, Type: 7 Size: Size(1,1)
>  2.78% |[0][GFX1]: [D2D1] Failed to create gfx factory's D2D1 device, code: 0x80004002|[1][GFX1]: Failed to create DrawTarget, Type: 7 Size: Size(1,1)
>  1.04% |[0][GFX1-]: Failed to create the graphics startup lockfile.|[1][GFX1]: No valid D2D factory available.|[2][GFX1]: Failed to create DrawTarget, Type: 1 Size: Size(1,1)

System memory usage seems to indicate this could be triggered by OOM in some, but not all cases.
> 14.404% are using >90% system memory
> 36.010% are using >80% system memory
> 52.076% are using >70% system memory
> 58.724% are using >60% system memory
> 38.504% are using <60% system memory

Hopefully this data helps move the bug forward, but at present volume this doesn't even break into the top-300 stability issues.
Whiteboard: [startupcrash][tbird crash] → [startupcrash][tbird crash][gfx-noted]
(In reply to Milan Sreckovic [:milan] from comment #3)
> In at least some of these, we're hitting the path that is gone as of 44 -
> trying to use D2D 1.0, rather than D2D 1.1, or when that is not available,
> dropping out of acceleration.  In this case, we appear to crash because we
> want to do D2D, but we can't find d2d1.dll.

Three of these occurred in Nightly 20160506052823:

https://crash-stats.mozilla.com/report/index/3c818753-9cfa-422d-a0bb-3ee5d2160507
https://crash-stats.mozilla.com/report/index/a3198f99-5ef7-4545-8cb1-341f92160506
https://crash-stats.mozilla.com/report/index/e8eee40e-97a8-4f6c-9a85-ce5742160507

All of them, and all of the older ones I looked at, were hitting this line:

  NS_RUNTIMEABORT("Could not initialize mScreenReferenceDrawTarget");
David, in the first crash listed in comment 7, we are failing to make a D2D 1.1 draw target, but we have "disable D2D" set, as well as "acceleration disabled".
Flags: needinfo?(dvander)
(In reply to Milan Sreckovic [:milan] from comment #8)
> David, in the first crash listed in comment 7, we are failing to make a D2D
> 1.1 draw target, but we have "disable D2D" set, as well as "acceleration
> disabled".

Even worse: the compositor is listed as D3D11? Looking...
Flags: needinfo?(dvander)
The first and third crashes in comment #7 look like they might be the same user. I pulled that rev locally and set the same prefs, and I don't get D3D11 or D2D1. I also grabbed that nightly and couldn't reproduce this.

Unless the user had layers.acceleration.force-enabled set, this should be impossible. But that pref wasn't recorded in Telemetry.

So I'm very confused.

That aside, it might be possible (though I don't see how, yet) that reading the gfxConfig status for DIRECT2D in UpdateBackendPrefs is causing trouble. If for some reason the status is "Enabled" but the Factory isn't initialized, we'll use the wrong draw target. I'll do an instrumentation patch to see if this is happening.
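
The kind of consistency check such an instrumentation patch could perform can be sketched as follows. This is an assumption about the shape of the check, not the contents of the actual patch. `D2DState` and `CheckD2DState` are hypothetical names, and real code would feed the message to a release assert rather than return a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of graphics state; not Gecko's real data structures.
struct D2DState {
  bool configSaysEnabled;   // feature status recorded in the configuration
  bool factoryInitialized;  // whether a D2D factory actually exists
};

// Returns an empty string when the state is consistent, otherwise a
// distinctive message that would be easy to search for in crash reports.
inline std::string CheckD2DState(const D2DState& s) {
  if (s.configSaysEnabled && !s.factoryInitialized) {
    return "D2D enabled in config but factory is not initialized";
  }
  return std::string();
}

// Usage: the suspected bad combination is flagged; a consistent one is not.
inline void DemoCheck() {
  const D2DState suspect{true, false};
  const D2DState consistent{true, true};
  assert(!CheckD2DState(suspect).empty());
  assert(CheckD2DState(consistent).empty());
}
```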
Comment on attachment 8750401 [details] [diff] [review]
bug1116540-instrumentation.patch

Review of attachment 8750401 [details] [diff] [review]:
-----------------------------------------------------------------

Do these show up the same as MOZ_CRASH in the crash reports, so that we can search for them easily?
Attachment #8750401 - Flags: review?(milan) → review+
It looks like they do, with MOZ_RELEASE_ASSERT instead of MOZ_CRASH in the annotation.
https://hg.mozilla.org/mozilla-central/rev/648d9d0fd2f8
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #6)
> For Thunderbird this crash no longer seems relevant. There is only one
> recent report, which was against Thunderbird 31 (no longer supported AFAIK).

Definitely true at the time you commented. But just recently there has been a bunch of daily channel crashes, making it a topcrash for 49.0a1 starting with build 20160501, all on Windows 10 and Windows 8.
https://crash-stats.mozilla.com/search/?signature=%3Dmozalloc_abort+%7C+NS_DebugBreak+%7C+gfxPlatform%3A%3AInit&product=Thunderbird&_facets=signature&_facets=email&_columns=date&_columns=version&_columns=build_id&_columns=platform&_columns=email&_columns=user_comments#crash-reports

After several days I'll report back what happens with newer builds.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | gfxPlatform::Init()] [@ mozalloc_abort | NS_DebugBreak | gfxPlatform::Init] → [@ mozalloc_abort(char const* const) | NS_DebugBreak | gfxPlatform::Init()] [@ mozalloc_abort | NS_DebugBreak | gfxPlatform::Init] [@ gfxWindowsPlatform::UpdateBackendPrefs]
Some of the crashes had the warning that bug 1272114 dealt with, but we have still seen these asserts/crashes since then (e.g., https://crash-stats.mozilla.com/report/index/1b384e49-230a-42d0-90d7-e1f492160516)
I'm super confused -- indeed we are setting Direct2D to enabled but not giving it a device. On one hand, we're not supposed to use gfxConfig as an indicator for whether a device exists. It's strictly for testing whether or not a feature should be turned off or on.

On the other hand, we do keep the D2D status in lock step with the device, so this is very odd. I think we should just fix the crash but I'd like to add one more bit of instrumentation.
Assignee: nobody → dvander
Status: REOPENED → ASSIGNED
Flags: needinfo?(dvander)
Attached patch: patch
Turns out all the crash reports have "gfx.direct2d.force-enabled" set to true, which you can only tell from the encoded pref string in AppNotes. The bug is that, if layers.acceleration.disabled is true, but Direct2D is force-enabled, we don't early return in InitializeD2DConfig so we end up forcing it back on.

That's more of a weirdness than an actual bug, but this patch fixes it anyway. The actual bug (UpdateRenderMode not checking that a D2D device exists) is fixed as well. I also added an early return so D3D9 won't get enabled if HW_COMPOSITING is disabled.
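
A rough sketch of the two changes described above, under stated assumptions. The struct and function names are hypothetical stand-ins for the logic in InitializeD2DConfig and UpdateRenderMode, and the `d2dBlocked` field is invented to represent any other reason Direct2D might be off. This is not the landed patch.

```cpp
#include <cassert>

// Hypothetical pref snapshot; field names are illustrative only.
struct Prefs {
  bool accelerationDisabled;  // layers.acceleration.disabled
  bool d2dForceEnabled;       // gfx.direct2d.force-enabled
  bool d2dBlocked;            // assumed: any other reason D2D is off
};

// Before (assumed shape): force-enable is applied last and wins,
// even when acceleration is disabled.
inline bool D2DEnabledBeforeFix(const Prefs& p) {
  bool enabled = !p.accelerationDisabled && !p.d2dBlocked;
  if (p.d2dForceEnabled) enabled = true;
  return enabled;
}

// After (assumed shape): early return when acceleration is disabled.
inline bool D2DEnabledAfterFix(const Prefs& p) {
  if (p.accelerationDisabled) return false;
  if (p.d2dForceEnabled) return true;
  return !p.d2dBlocked;
}

enum class RenderMode { Direct2D, Software };

// The render mode only uses Direct2D when a device actually exists,
// regardless of what the configuration says.
inline RenderMode PickRenderMode(bool d2dEnabled, bool hasD2DDevice) {
  return (d2dEnabled && hasD2DDevice) ? RenderMode::Direct2D
                                      : RenderMode::Software;
}

// Usage: the pref combination found in the crash reports.
inline void DemoCrashPrefs() {
  const Prefs p{true, true, false};
  assert(D2DEnabledBeforeFix(p));   // forced back on
  assert(!D2DEnabledAfterFix(p));   // stays off
  assert(PickRenderMode(true, false) == RenderMode::Software);
}
```

The device check is the more important half: even if the config were wrong for some other reason, the render mode would fall back to software instead of asking for a draw target that cannot be created.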
Attachment #8753228 - Flags: review?(milan)
Attachment #8753228 - Flags: review?(milan) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/68271d9a1c4b08e1539a278012fcb423e4b45e17 for a rather fun result: according to https://treeherder.mozilla.org/logviewer.html#?job_id=28067581&repo=mozilla-inbound that disabled acceleration on our WinXP slaves, but apparently the only failures which result from doing so are that one test and two video reftests, https://treeherder.mozilla.org/logviewer.html#?job_id=28065442&repo=mozilla-inbound
Really stupid typo. Re-pushing with fix.
https://hg.mozilla.org/mozilla-central/rev/cb356a5f82cd
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Hi David, since this is a startup crash, should we consider uplifting to Aurora and/or Beta? I looked at the crash signatures and from a week's worth of data I could not find any occurrences on Nightly49. So we don't have confirmation whether this works or not.
Flags: needinfo?(dvander)
(In reply to Ritu Kothari (:ritu) from comment #26)
> Hi David, since this is a startup crash, should we consider uplifting to
> Aurora and/or Beta? I looked at the crash signatures and from a week's worth
> of data I could not find any occurrences on Nightly49. So we don't have
> confirmation whether this works or not.

The bug that I fixed is specific to Firefox 49 - so no way to uplift. I do see crashes with this signature on Firefox 47, but the reports suggest it's a totally different bug. We would have to land new instrumentation on beta or aurora. Is that something we want to do?
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #27)
> (In reply to Ritu Kothari (:ritu) from comment #26)
> > Hi David, since this is a startup crash, should we consider uplifting to
> > Aurora and/or Beta? I looked at the crash signatures and from a week's worth
> > of data I could not find any occurrences on Nightly49. So we don't have
> > confirmation whether this works or not.
> 
> The bug that I fixed is specific to Firefox 49 - so no way to uplift. I do
> see crashes with this signature on Firefox 47, but the reports suggest it's
> a totally different bug. We would have to land new instrumentation on beta
> or aurora. Is that something we want to do?

If it's not a simple rebasing of the patch to uplift to Beta, it might be too late for Beta47. Also, this doesn't seem like a high-volume crash on Beta or release (46.0.1), so wontfix might be OK.