Firefox 32 startup crash in _VEC_memzero | _VEC_memzero

RESOLVED FIXED in Firefox 32

Status

Product: Core
Component: Graphics
Priority: --
Severity: critical
Status: RESOLVED FIXED
Reported: 3 years ago
Modified: 3 years ago

People

(Reporter: Robert Kaiser, Assigned: bjacob)

Tracking

Version: Trunk
Target Milestone: mozilla35
Hardware: x86
OS: Windows NT
Keywords: crash
Points: ---

Firefox Tracking Flags

(firefox32+ verified, firefox33 fixed, firefox34 fixed, firefox35 fixed, firefox-esr31 unaffected)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-5e80cac5-a6e4-4d8d-9c98-91d852140902.
=============================================================

In early Firefox 32 stats, we have a startup crash in "_VEC_memzero | _VEC_memzero" at #2 in the topcrash list, at ~40% of the rate of the leading OOM|small signature.

See https://crash-stats.mozilla.com/report/list?signature=_VEC_memzero+|+_VEC_memzero&product=Firefox&process_type=browser&version=Firefox%3A32.0

Is this a resurrection of bug 988549 or something else?

The crash reasons seem to all be EXCEPTION_ACCESS_VIOLATION_WRITE and the address patterns sound familiar, possibly from that older bug.
(Reporter)

Comment 1

3 years ago
[Tracking Requested - why for this release]:
#2 top crash in early 32 data.

David, is this related to bug 1062452 and bug 1063052 potentially?

Benoit, is this a similar issue to bug 988549?
tracking-firefox32: --- → ?
Flags: needinfo?(dmajor)
Flags: needinfo?(bjacob)
(Assignee)

Comment 2

3 years ago
No idea, the stack really doesn't tell us much, and _VEC_memzero is very unspecific: it just means we're crashing while zeroing some buffer. Perhaps the best way to compare it to past gfx bugs would be to see how it correlates with AdapterVendorID / AdapterDeviceID.
Flags: needinfo?(bjacob)

Comment 3

3 years ago
Yes, it may be related to bug 1062452; these are mostly switchable Intel+ATI systems:

  _VEC_memzero | _VEC_memzero|EXCEPTION_ACCESS_VIOLATION_WRITE (267 crashes)
     92% (246/267) vs.   8% (3812/46772) atiuxpag.dll
     98% (261/267) vs.  16% (7352/46772) d3d10.dll
     98% (261/267) vs.  16% (7352/46772) d3d10core.dll
    100% (267/267) vs.  23% (10944/46772) igd10umd32.dll
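
Each row of the table above compares two rates: how often a module is loaded in crashes with this signature, and how often it is loaded in a larger comparison set of crashes. The following is a minimal sketch of that arithmetic. The struct and function names are hypothetical (they are not part of Socorro), and reading the second fraction as "all crashes in the comparison set" is an assumption about the report format.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for one row of a module-correlation report.
struct ModuleCorrelation {
  long inSignature;     // crashes with this signature that loaded the module
  long signatureTotal;  // all crashes with this signature
  long inBaseline;      // crashes in the comparison set that loaded the module
  long baselineTotal;   // all crashes in the comparison set
};

// Rounds count/total to a whole percentage, as the report displays it.
long PercentOf(long count, long total) {
  assert(total > 0);
  return std::lround(100.0 * static_cast<double>(count) /
                     static_cast<double>(total));
}

// How many times more common the module is in this signature than in the
// comparison set. Requires a non-zero baseline rate.
double Overrepresentation(const ModuleCorrelation& row) {
  assert(row.signatureTotal > 0 && row.baselineTotal > 0 && row.inBaseline > 0);
  const double signatureRate =
      static_cast<double>(row.inSignature) / static_cast<double>(row.signatureTotal);
  const double baselineRate =
      static_cast<double>(row.inBaseline) / static_cast<double>(row.baselineTotal);
  return signatureRate / baselineRate;
}
```

For the atiuxpag.dll row, `PercentOf(246, 267)` gives 92 and `PercentOf(3812, 46772)` gives 8, so that module is roughly 11 times more common in this signature than in the comparison set.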
Flags: needinfo?(dmajor)
status-firefox32: --- → affected
tracking-firefox32: ? → +

Comment 4

3 years ago
Something that caught my eye is that this is 100% Win7 RTM (not SP1). That means these machines have a known crashy driver version -- see bug 988549 comment 34.

    100% (879/880) vs.  23% (12843/55579) igd10umd32.dll
          8% (69/880) vs.   0% (121/55579) 8.15.10.2125
         92% (808/880) vs.   2% (1017/55579) 8.15.10.2141

Benoit, didn't we get the blacklisting for these versions sorted out?
Flags: needinfo?(bjacob)

Comment 5

3 years ago
Also: this is 97% 0x8086/0x0046 "Intel Graphics Media Accelerator HD". A spot-check shows D2D+ on all.
(Assignee)

Comment 6

3 years ago
(In reply to David Major [:dmajor] from comment #4)
> Something that caught my eye is that this is 100% Win7 RTM (not SP1). That
> means these machines have a known crashy driver version -- see bug 988549
> comment 34.
> 
>     100% (879/880) vs.  23% (12843/55579) igd10umd32.dll
>           8% (69/880) vs.   0% (121/55579) 8.15.10.2125
>          92% (808/880) vs.   2% (1017/55579) 8.15.10.2141
> 
> Benoit, didn't we get the blacklisting for these versions sorted out?


(In reply to David Major [:dmajor] from comment #5)
> Also: this is 97% 0x8086/0x0046 "Intel Graphics Media Accelerator HD". A
> spot-check shows D2D+ on all.



Indeed, on the release channel, versions < 8.15.10.2202 should be blacklisted:

http://hg.mozilla.org/releases/mozilla-release/file/tip/widget/windows/GfxInfo.cpp#l936

as device 0x0046 here falls under the "4500HD" category:

http://hg.mozilla.org/releases/mozilla-release/file/tip/widget/xpwidgets/GfxDriverInfo.cpp#l144
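
The version cutoff described above (block anything older than 8.15.10.2202) comes down to a four-part version comparison. The following is a minimal sketch of such a check, not the actual GfxDriverInfo code: `DriverVersion`, `ParseDriverVersion` and `ShouldBlockIntelDriver` are hypothetical names, and treating an unparseable version as blocked is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using DriverVersion = std::array<uint32_t, 4>;

// Hypothetical parser for "a.b.c.d" driver versions. Returns false unless the
// text is exactly four dot-separated decimal numbers. No overflow checking.
bool ParseDriverVersion(const std::string& text, DriverVersion& out) {
  DriverVersion parts = {{0, 0, 0, 0}};
  std::size_t index = 0;
  bool haveDigit = false;
  for (char c : text) {
    assert(index < parts.size());
    if (c >= '0' && c <= '9') {
      parts[index] = parts[index] * 10 + static_cast<uint32_t>(c - '0');
      haveDigit = true;
    } else if (c == '.' && haveDigit && index + 1 < parts.size()) {
      ++index;
      haveDigit = false;
    } else {
      return false;
    }
  }
  if (index + 1 != parts.size() || !haveDigit) {
    return false;
  }
  out = parts;
  return true;
}

// True if the driver should be blocked: older than 8.15.10.2202 (the cutoff
// quoted in this comment), or unparseable (assumption: fail closed).
bool ShouldBlockIntelDriver(const std::string& versionText) {
  const DriverVersion firstAllowed = {{8, 15, 10, 2202}};
  DriverVersion version;
  if (!ParseDriverVersion(versionText, version)) {
    return true;
  }
  // std::array compares lexicographically, which matches version ordering.
  return version < firstAllowed;
}
```

With this sketch, both DLL versions seen in the crash data (8.15.10.2125 and 8.15.10.2141) come out as blocked, which is why D2D being enabled on these machines is surprising.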

So it is mysterious why these would have D2D+. Do these machines have a second GPU (as reported in App Notes)? Why does the system in comment 0 have an AMD GPU?
Flags: needinfo?(bjacob)

Comment 7

3 years ago
> So it is mysterious why these would have D2D+. Do these machines have a
> second GPU (as reported in App Notes) ? Why does the system in comment 0
> have an AMD GPU?

Comment 3 shows most of these having an ATI module loaded. I don't see dual GPU in App Notes, though. I do see "DriverVersionMismatch" on all of them:

https://crash-stats.mozilla.com/search/?version=32.0&signature=%3D_VEC_memzero+|+_VEC_memzero&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes
(Assignee)

Comment 8

3 years ago
(In reply to David Major [:dmajor] from comment #7)
> I do see "DriverVersionMismatch" on all of them:

Dang! Excellent find. We used to blacklist on that condition. The second patch on bug 984417 changed that to only report the mismatch in AppNotes, which is what you saw there, without blacklisting.

As a hot fix for Firefox 32, we can easily revert to the old blacklisting behavior. That is, just back out https://hg.mozilla.org/mozilla-central/rev/35ff4bfb198f .

On mozilla-central and IMHO aurora at least, we should go for something smarter than that.

The issue at hand here is that the driver version is given to us in two different places: 1) from the Windows registry, as we report in AppNotes, and 2) from the DLL, as you examined in comment 4.

Our blacklisting logic, which also writes these AppNotes, uses only the value from the Windows registry.

We've long known that the value from the Windows registry is sometimes wrong, i.e. different from the value in the DLL. That's what we call "DriverVersionMismatch".

We used to blacklist Direct2D whenever a DriverVersionMismatch happened, which is why we didn't crash. Going back to that behavior is the easy fix for a 32 chemspill. But going forward, we really should stop using this unreliable value from the registry, since we have the value from the DLL anyway.
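
The condition described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in GfxInfo.cpp: the struct, its field names and both functions are hypothetical, and comparing the two versions as plain strings is a simplification.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of the two places a driver version can come from.
struct DriverVersionSources {
  std::string fromRegistry;  // the value the blacklisting logic consults
  std::string fromDll;       // the version of the driver DLL itself
};

// A "DriverVersionMismatch": both sources are known and they disagree.
bool HasDriverVersionMismatch(const DriverVersionSources& v) {
  return !v.fromRegistry.empty() && !v.fromDll.empty() &&
         v.fromRegistry != v.fromDll;
}

// The behavior the backout restores: block Direct2D when the registry version
// is blacklisted OR when the two sources disagree.
bool ShouldBlockDirect2D(const DriverVersionSources& v,
                         bool registryVersionIsBlacklisted) {
  return registryVersionIsBlacklisted || HasDriverVersionMismatch(v);
}
```

As a usage example, take a machine whose DLL is 8.15.10.2141 but whose registry reports a newer, non-blacklisted version (the value "8.15.10.2302" used here is made up for illustration): `ShouldBlockDirect2D({"8.15.10.2302", "8.15.10.2141"}, false)` returns true, whereas a check based on the registry value alone would have let Direct2D through.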

This is nontrivial, because there are other things that we don't know how to do without the Windows registry. For example, how to get the GPU device id.

Well, we know how to do that: it's what we've been doing on desktop Linux, where we start a separate process that queries this information by creating an actual device/context. But we've never prioritized doing the right thing in this area.
(Assignee)

Comment 9

3 years ago
Created attachment 8486886
backout-unblacklisting-DriverVersionMismatch

See the comment above. The mechanism that we sadly have to revert to is a mess, but it is made necessary by the Windows registry containing unreliable information. I will file a follow-up bug to stop depending on the Windows registry for blacklisting, but that is a nontrivial engineering project. The present backout patch is the only reasonable thing to do for the aurora/beta/release channels.
Attachment #8486886 - Flags: review?(bas)

Comment 10

3 years ago
Filed bug 1065212 about no longer relying on the Windows registry to get this information.

Comment 11

3 years ago
Comment on attachment 8486886
backout-unblacklisting-DriverVersionMismatch

Review of attachment 8486886:
-----------------------------------------------------------------

::: widget/windows/GfxInfo.cpp
@@ +1029,5 @@
> +
> +    if (mHasDriverVersionMismatch) {
> +      if (aFeature == nsIGfxInfo::FEATURE_DIRECT3D_10_LAYERS ||
> +          aFeature == nsIGfxInfo::FEATURE_DIRECT3D_10_1_LAYERS ||
> +          aFeature == nsIGfxInfo::FEATURE_DIRECT2D)

Note we're not blocking D3D_11, which we maybe should do. But let's try this first and see if we run into trouble.
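
To make the scope of the reviewed hunk concrete, here is a minimal standalone sketch of the same decision, with the open D3D11 question exposed as a parameter. The `Feature` enum is a hypothetical stand-in for the `nsIGfxInfo` feature constants, and `IsBlockedByVersionMismatch` is an illustrative name, not a function in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the nsIGfxInfo feature constants in the hunk.
enum class Feature {
  Direct2D,
  Direct3D10Layers,
  Direct3D10_1Layers,
  Direct3D11Layers,
  Other,  // any feature the hunk does not mention
};

// Mirrors the reviewed condition: on a driver version mismatch, block D2D and
// the D3D10 / D3D10.1 layers. Whether D3D11 layers are blocked too is left to
// the caller, since the review leaves that open.
bool IsBlockedByVersionMismatch(Feature feature, bool hasDriverVersionMismatch,
                                bool alsoBlockD3D11) {
  if (!hasDriverVersionMismatch) {
    return false;
  }
  switch (feature) {
    case Feature::Direct2D:
    case Feature::Direct3D10Layers:
    case Feature::Direct3D10_1Layers:
      return true;
    case Feature::Direct3D11Layers:
      return alsoBlockD3D11;
    default:
      return false;
  }
}
```

Passing `alsoBlockD3D11 = false` corresponds to the patch as reviewed; passing `true` corresponds to the variant discussed for channels that have D3D11.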
Attachment #8486886 - Flags: review?(bas) → review+

Comment 12

3 years ago
Comment on attachment 8486886
backout-unblacklisting-DriverVersionMismatch

Note: OK, for channels that have DIRECT3D_11, I will add it and consider that the r+ extends to it.

Approval Request Comment
[Feature/regressing bug #]: bug 984417
[User impact if declined]: lots of crashes - enough to be a chemspill driver
[Describe test coverage new/current, TBPL]: none, in fact none of our Windows test slaves uses Intel graphics, afaik.
[Risks and why]: very, very low risk: just backing out a small, simple patch.
[String/UUID change made/needed]: none
Attachment #8486886 - Flags: approval-mozilla-release?
Attachment #8486886 - Flags: approval-mozilla-beta?
Attachment #8486886 - Flags: approval-mozilla-aurora?

Comment 13

3 years ago
Benoit - Great that you have a simple patch to fix this bug and that you know the long term solution as well (bug 1065212).

I don't see many crashes with this signature on Aurora (30), Nightly (4), or Beta (0). It may be worth landing on Aurora to see if the crash volume drops to zero but we will have to wait a few days for results. The point of waiting is to confirm to the best of our ability that this fix really does address the issue before pushing out 32.0.1.

Comment 14

3 years ago
Let's ignore the specific signature for the moment, since that can change a lot across channels.

Here's a list of the crashes which have DriverVersionMismatch in them in FF32:

https://crash-stats.mozilla.com/search/?version=32.0&app_notes=DriverVersionMismatch&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes

Here's the same list in recent nightly:

https://crash-stats.mozilla.com/search/?release_channel=nightly&app_notes=DriverVersionMismatch&build_id=%3E20140901000000&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes

Ignoring bug 1062612 which is unrelated, we could check whether this patch improves OOM | Small or the abort rates on trunk. But that data will take several days to be reliable, and I think we'd be better off just doing 32.0.1 with this, and updating existing 32 users first to see if this fixes the regression.

Comment 15

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/5cc6fa50f4b6
(Reporter)

Comment 16

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #14)
> Let's ignore the specific signature for the moment, since that can change a
> lot across channels.

Well, the signature in this bug makes sense because we know from other blocklisting issues (see bug 988549) that older Intel drivers crash with this signature when we enable D2D.

> Here's the same list in recent nightly:

Most of those are actually crashes that we know have different causes and just happen to people with driver version mismatches as well.

That said, if this has an impact on OOM|small, I'd love to see that. :)

Comment 17

3 years ago
Comment on attachment 8486886
backout-unblacklisting-DriverVersionMismatch

This is the driver for the 32.0.1 desktop release. Approving the backout for aurora, beta, and release.
Attachment #8486886 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8486886 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8486886 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 18

3 years ago
https://hg.mozilla.org/releases/mozilla-release/rev/227d1d0bf16b

Queued for Aurora/Beta as well.
Assignee: nobody → bjacob
status-firefox32: affected → fixed
status-firefox33: --- → affected
status-firefox34: --- → affected
status-firefox35: --- → fixed
status-firefox-esr31: --- → unaffected
https://hg.mozilla.org/releases/mozilla-aurora/rev/89d9e35fb154
https://hg.mozilla.org/releases/mozilla-beta/rev/d0885f177e37
status-firefox33: affected → fixed
status-firefox34: affected → fixed
https://hg.mozilla.org/mozilla-central/rev/5cc6fa50f4b6
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35

Updated

3 years ago
Blocks: 1062452
(Reporter)

Updated

3 years ago
Duplicate of this bug: 1062452
(Reporter)

Comment 22

3 years ago
I can verify that this is fixed in 32.0.1, judging by the crash data from over the weekend.
status-firefox32: fixed → verified