Closed Bug 1145143 Opened 5 years ago Closed 4 years ago

Firefox 38a2 reliably crashes itself and the NVIDIA driver on my laptop

Categories

(Core :: Graphics: Layers, defect, critical)

37 Branch
x86_64
Windows 8.1
defect
Not set
critical

Tracking


RESOLVED FIXED
mozilla41
Tracking Status
firefox36 --- unaffected
firefox37 + wontfix
firefox38 + wontfix
firefox39 + fixed
firefox40 + fixed
firefox41 + wontfix
firefox-esr38 - affected

People

(Reporter: Rincebrain, Assigned: bas.schouten)

Details

(Keywords: crash, regression, topcrash-win, Whiteboard: [gfx-noted])

Attachments

(7 files, 3 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36

Steps to reproduce:

> configure Firefox to launch with the NVIDIA GPU using the NVIDIA Control Panel
> launch Firefox 38a2



Actual results:

NVIDIA driver and Firefox both crash (resulting in a system deadlock if Firefox doesn't crash fast enough, sometimes...)


Expected results:

The NVIDIA driver not crashing and Firefox not inducing such a crash.
Additional notes: this does not occur on Firefox 36.0.1. (I'll try to do some bisecting using available builds, because I do not have a Firefox build environment on this system, to put it mildly.) It is also distinct from bug 1125466, in that it still occurs with a renamed Firefox.exe.

NVIDIA driver 341.01, laptop is a Thinkpad W550s, Intel HD 5500 and Quadro K620m GPUs.
e10s-testing set to false does not prevent these crashes, either.
We need more info:

1) Type about:support in the location bar and paste the section "graphics".

2) Type about:crashes in the location bar and paste some crash IDs (bp-...).

3) Does Firefox crash with HWA disabled?
https://support.mozilla.org/en-US/kb/forum-response-disable-hardware-acceleration
Flags: needinfo?(Rincebrain)
I can only do the requested things with the Intel GPU set, but that said...

1)
Adapter Description	Intel(R) HD Graphics 5500
Adapter Description (GPU #2)	NVIDIA Quadro K620M
Adapter Drivers	igdumdim64 igd10iumd64 igd10iumd64 igdumdim32 igd10iumd32 igd10iumd32
Adapter Drivers (GPU #2)	nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Adapter RAM	Unknown
Adapter RAM (GPU #2)	2048
Device ID	0x1616
Device ID (GPU #2)	0x137a
Direct2D Enabled	true
DirectWrite Enabled	true (6.3.9600.17415)
Driver Date	11-18-2014
Driver Date (GPU #2)	10-15-2014
Driver Version	10.18.14.4029
Driver Version (GPU #2)	9.18.13.4101
GPU #2 Active	false
GPU Accelerated Windows	1/1 Direct3D 11 (OMTC)
Subsys ID	222517aa
Subsys ID (GPU #2)	222517aa
Vendor ID	0x8086
Vendor ID (GPU #2)	0x10de
WebGL Renderer	Google Inc. -- ANGLE (Intel(R) HD Graphics 5500 Direct3D11 vs_5_0 ps_5_0)
windowLayerManagerRemote	true
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
AzureSkiaAccelerated	0

2)
bp-b265e5bd-baa1-4abc-971c-d3e272150320
	3/20/2015	11:46 AM
bp-a44ae28a-cc49-4b8a-b033-ade6e2150319
	3/19/2015	6:09 AM
bp-e27df5c2-273f-42b0-aee6-54f062150319
	3/19/2015	6:04 AM

3)
It seems not, no.
Flags: needinfo?(Rincebrain)
Probably a dupe of bug 1116812.
Severity: normal → critical
Crash Signature: [@ mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget() ]
Component: Untriaged → Graphics: Layers
Keywords: crash
Product: Firefox → Core
Version: Firefox 38 → 36 Branch
Version: 36 Branch → 38 Branch
Probably related, but not a duplicate per se, I think?

That looks like an umbrella bug for "we crash if the Windows driver resets", while this is a specific behavior change inducing a Windows driver crash. (Not yet known what the behavior change _is_, but ...)
If you could bisect this using mozregression that would be valuable.
In case I deadlock, current bisect status:

 3:21.53 LOG: MainThread Bisector INFO Narrowed nightly regression window from [2014-12-05, 2014-12-12] (7 days) to [2014-12-05, 2014-12-09] (4 days) (~2 steps left)
At around this build:

 4:43.31 LOG: MainThread mozversion INFO application_repository: https://hg.mozilla.org/mozilla-central
 4:43.31 LOG: MainThread mozversion INFO application_vendor: Mozilla
 4:43.31 LOG: MainThread mozversion INFO application_version: 37.0a1
 4:43.31 LOG: MainThread mozversion INFO platform_buildid: 20141208030202
 4:43.32 LOG: MainThread mozversion INFO platform_changeset: 035a951fc24a
 4:43.32 LOG: MainThread mozversion INFO platform_repository: https://hg.mozilla.org/mozilla-central
 4:43.32 LOG: MainThread mozversion INFO platform_version: 37.0a1

rendering stopped working entirely, but it does not result in a crash.

It's now going to inbound builds because it's gone as far as it can on nightlies, apparently.
Disabling e10s in each build and restarting allows you to see if this crashes.

12:43.91 LOG: MainThread Bisector INFO Narrowed inbound regression window from [8878e674, a4b4cd74] (39 revisions) to [8878e674, 47c97b8b] (20 revisions) (~4 steps left)
14:57.62 LOG: MainThread Bisector INFO Narrowed inbound regression window from [8878e674, 24ba8274] (4 revisions) to [2a61df4e, 24ba8274] (2 revisions) (~1 steps left)
14:57.62 LOG: MainThread Bisector INFO Oh noes, no (more) inbound revisions :(
14:57.62 LOG: MainThread Bisector INFO Last good revision: 2a61df4eaa2d
14:57.62 LOG: MainThread Bisector INFO First bad revision: 24ba8274ed60
14:57.62 LOG: MainThread Bisector INFO Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2a61df4eaa2d&tochange=24ba8274ed60
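The mozregression runs above narrow the window by bisection: each step tests the midpoint build and keeps whichever half still spans the good/bad boundary. A minimal sketch of that search, assuming an ordered build list whose first entry is known good and whose last is known bad. The `isBad` callback is a hypothetical stand-in for "install this build, launch it on the NVIDIA GPU, and see whether it crashes"; it is not mozregression's API.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct BisectResult {
  std::string lastGood;
  std::string firstBad;
};

// Hypothetical helper: builds.front() must be good, builds.back() must be bad.
BisectResult bisect(const std::vector<std::string>& builds,
                    const std::function<bool(const std::string&)>& isBad) {
  if (builds.size() < 2) {
    throw std::invalid_argument("need a known-good and a known-bad build");
  }
  std::size_t good = 0;                 // invariant: builds[good] is good
  std::size_t bad = builds.size() - 1;  // invariant: builds[bad] is bad
  while (bad - good > 1) {
    const std::size_t mid = good + (bad - good) / 2;
    if (isBad(builds[mid])) {
      bad = mid;   // regression is at or before mid
    } else {
      good = mid;  // regression is after mid
    }
  }
  return {builds[good], builds[bad]};
}
```

Each test halves the window, so n builds need about log2(n) manual runs, which is roughly in line with the log reporting "~4 steps left" for a 20-revision window.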
Great. It makes sense that this could be caused by bug 1102499. Thanks for doing that.
Blocks: 1102499
Keywords: regression
Version: 38 Branch → 37 Branch
There's a trybuild coming here which should printf some useful information about what's going on. If this still crashes your video driver and we can somehow catch the printf output, it might help us determine the cause:

https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/bschouten@mozilla.com-2433d8f2aeaa
Log from running said trybuild attached.
(In reply to Rich from comment #14)
> Created attachment 8581039 [details]
> output from a crash run of the trybuild
> 
> Log from running said trybuild attached.

Thanks a lot, this is very interesting!

Could you paste your about:support, when running with the NVidia GPU, on a build that doesn't crash?
Application Basics
------------------

Name: Firefox
Version: 36.0.3
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
Multiprocess Windows: 0/1

Crash Reports for the Last 3 Days
---------------------------------

Report ID: bp-9d113de2-2dc8-4215-9ab4-cf6952150320
Submitted: 23 hours ago

Report ID: bp-b265e5bd-baa1-4abc-971c-d3e272150320
Submitted: 1 day ago

Report ID: bp-a44ae28a-cc49-4b8a-b033-ade6e2150319
Submitted: 2 days ago

Report ID: bp-e27df5c2-273f-42b0-aee6-54f062150319
Submitted: 2 days ago

All Crash Reports (including 1 pending crash in the given time range)

Extensions
----------

Graphics
--------

Adapter Description: Intel(R) HD Graphics 5500
Adapter Description (GPU #2): NVIDIA Quadro K620M
Adapter Drivers: igdumdim64 igd10iumd64 igd10iumd64 igdumdim32 igd10iumd32 igd10iumd32
Adapter Drivers (GPU #2): nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Adapter RAM: Unknown
Adapter RAM (GPU #2): 2048
Device ID: 0x1616
Device ID (GPU #2): 0x137a
Direct2D Enabled: true
DirectWrite Enabled: true (6.3.9600.17415)
Driver Date: 11-18-2014
Driver Date (GPU #2): 10-15-2014
Driver Version: 10.18.14.4029
Driver Version (GPU #2): 9.18.13.4101
GPU #2 Active: false
GPU Accelerated Windows: 1/1 Direct3D 11 (OMTC)
Subsys ID: 222517aa
Subsys ID (GPU #2): 222517aa
Vendor ID: 0x8086
Vendor ID (GPU #2): 0x10de
WebGL Renderer: Google Inc. -- ANGLE (NVIDIA Quadro K620M Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote: true
AzureCanvasBackend: direct2d
AzureContentBackend: direct2d
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0

Important Modified Preferences
------------------------------

accessibility.typeaheadfind.flashBar: 0
browser.cache.disk.capacity: 358400
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size.use_old_max: false
browser.cache.frecency_experiment: 1
browser.places.smartBookmarksVersion: 7
browser.sessionstore.upgradeBackup.latestBuildID: 20150319201009
browser.startup.homepage_override.buildID: 20150319201009
browser.startup.homepage_override.mstone: 36.0.3
dom.mozApps.used: true
extensions.lastAppVersion: 36.0.3
gfx.direct3d.last_used_feature_level_idx: 0
media.gmp-gmpopenh264.lastUpdate: 1425927689
media.gmp-gmpopenh264.version: 1.3
media.gmp-manager.lastCheck: 1426873510
network.cookie.prefsMigrated: true
places.database.lastMaintenance: 1426873510
places.history.expiration.transient_current_max_pages: 104858
plugin.disable_full_page_plugin_for_types: application/pdf
plugin.importedState: true
privacy.sanitize.migrateFx3Prefs: true
storage.vacuum.last.index: 1
storage.vacuum.last.places.sqlite: 1428756992

Important Locked Preferences
----------------------------

JavaScript
----------

Incremental GC: true

Accessibility
-------------

Activated: false
Prevent Accessibility: 0

Library Versions
----------------

NSPR
Expected minimum version: 4.10.7
Version in use: 4.10.7

NSS
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSMIME
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSSL
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSUTIL
Expected minimum version: 3.17.4
Version in use: 3.17.4

Experimental Features
---------------------
This is mysterious: you're getting D3D11 acceleration both before and after the WARP patch. I have no idea yet how there could be a difference; I'm going to try to make you another build!
Attached file errlog
Output from the second trybuild.

(It crashed.)
OK, that eliminates one theory on the cause. I'll try to come up with something else tomorrow.
Whiteboard: [gfx-noted]
Log from that trybuild (it also crashed)
Updating the crash volume since Lawrence and I were discussing this bug today.  

This is currently the #2 topcrash in 37.0b6 with 3509 crashes; that's 3.53% of crashes in beta6 in the last week. 

I notice we wontfixed bug 1116812, the other bug with this crash signature.
Keywords: topcrash
Ah, sorry, we are at 37.0b7. In beta7, it's 7.43% of crashes with 3121 crashes in the last week. #1 topcrash.
(In reply to Liz Henry (:lizzard) from comment #24)
> Ah, sorry, we are at 37.0b7. In beta7, it's 7.43% of crashes with 3121
> crashes in the last week. #1 topcrash.

The problem described in this bug is not going to be related to any significant portion of those crashes.
Keywords: topcrash
http://www.nvidia.com/download/driverResults.aspx/83304/en-us

Could you try updating to the latest Nvidia Quadro 347.88 WHQL driver? To rule out driver related bugs.
Not ATM; none of the non-OEM versions of the NVIDIA driver support the chip in this laptop. (I tried beta and release versions from the 341.xx and 347.xx lines; none of them do, and the Quadro K620M isn't even listed on their downloads options, so...)
(In reply to NVD from comment #26)
> http://www.nvidia.com/download/driverResults.aspx/83304/en-us
> 
> Could you try updating to the latest Nvidia Quadro 347.88 WHQL driver? To
> rule out driver related bugs.

He's got the Quadro K620M, not the K620 (which does appear to be supported by 347.88). In a Lenovo, I am guessing? I verified that geforce.com doesn't list the K620M as supported, only the K610M. Bummer.
Correct, Lenovo ThinkPad W550s.
(In reply to Rich from comment #29)
> Correct, Lenovo ThinkPad W550s.

Ok, Lenovo is sneaky but has the newer drivers (345.20) here: http://support.lenovo.com/ae/en/downloads/ds102045

Give those a shot and retry.
That's for Windows 7, I'm on Windows 8.
(In reply to Rich from comment #31)
> That's for Windows 7, I'm on Windows 8.

Indeed, I should have caught that. Sadly, they are not yet updated for Win 8, but since the Win 7 ones are from early March, I would surmise the Win 8 ones are not far behind.
Are all the crashes with D2D 1.1, and is comment 16 suggesting that when it didn't crash it was D2D 1?
Flags: needinfo?(bas)
I don't see where you're getting D2D 1.1 versus 1 from my output?
Yes, but the difference between those two reports is also that the Intel GPU is in use in comment 4, and the NVIDIA GPU is in use in comment 16. So more than just the D2D version changes...
Right - the few reports that I looked at were D2D 1.1, and I understood that the crashes were with the Nvidia GPU, so I jumped to the conclusion that the crashes were the Nvidia GPU + D2D 1.1 combination. Perhaps a wrong conclusion, but either we have crashes with the Intel GPU or somehow the Nvidia GPU gives us D2D 1.1 and D2D 1 in different cases.

Bas, this also has a lot of DXGI_ERROR_UNSUPPORTED errors in the log, but it doesn't seem to be WARP related - red herring or...?
(In reply to Milan Sreckovic [:milan] from comment #37)
> Right - the few reports that I looked at were D2D 1.1, and I understood that
> the crashes were with Nvidia GPU, so I jumped to conclusion that the crashes
> were Nvidia GPU + D2D 1.1 combination.  Perhaps a wrong conclusion, but
> either we have crashes with Intel GPU or somehow Nvidia GPU gives us D2D 1.1
> and D2D 1 in different cases.
> 
> Bas, this also has a lot of DXGI_ERROR_UNSUPPORTED errors in the log, but it
> doesn't seem to be WARP related - red herring or...?

I suspect that's all just a little bit of a red herring. WARP doesn't even seem to be used here. I don't see any way in which WARP could be related at all (D2D 1.1 vs 1.0 definitely doesn't seem to be related either). This is a very hard driver crash right on the first load. I've tried changing the order and timing of things in my experimental builds, because that's the only thing my WARP patch could've changed, but it doesn't seem to help. It's extremely strange, but sadly a use case so uncommon that it probably doesn't need to be prioritized.

If we get our hands on one of these machines we should see if we can figure out what triggers the device reset though.
Flags: needinfo?(bas)
We kind of do have our hands on one of these machines, they just happen to be Rich's hands and he's been extremely helpful so far so we should keep taking advantage :)

I know WARP isn't used; it's more the pattern of repeatedly losing and recreating the device (D2DERR_RECREATE_TARGET shows up a lot; we get deep into double or triple digits of these errors before crashing). Is there an explanation similar to the Win 7 + WARP problem with the keyed mutex that we may somehow be hitting on Windows 8? (We also have an "Attempted to borrow a draw target without locking" in there.)

Interesting thing would be to run these tests with the Intel GPU (when we don't crash) and see if we do get any of these messages - they would show up in about:support, but only with Firefox 38 or later.

Anyway, random thoughts, I have the advantage of not knowing the details of how things work and thus not making assumptions as to what should work but maybe doesn't :)
Every recent version (including the latest stable one) reliably crashes when I reinstall or update the NVIDIA drivers. This started happening about a year ago.
No longer happens with the NVIDIA 350.05 hotfix drivers for me.

I'm marking it RESOLVED WORKSFORME, but please feel free to change it if some other status is more appropriate.

(For anyone having trouble finding the 350.05 beta driver - http://nvidia.custhelp.com/app/answers/detail/a_id/3647 )
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Milan - Looks like this requires a driver update. Should we blocklist the affected driver or otherwise try to fix the issue?
Flags: needinfo?(milan)
Probably, but, this is blocklisting on the second GPU, and we only added the capability to do that in 37, so I want to be careful with the patch.  It would be GPU2 for:

Adapter Description (GPU #2)	NVIDIA Quadro K620M
Device ID (GPU #2)	0x137a
Driver Date (GPU #2)	10-15-2014
Driver Version (GPU #2)	9.18.13.4101
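Blocklisting this configuration means matching the second adapter's vendor and device IDs and comparing driver versions numerically (a plain string comparison would sort "9.18.13.10000" before "9.18.13.4101"). A minimal sketch, assuming a hypothetical entry format (Firefox's real blocklist has its own schema) and using 9.18.13.4101 only as an illustrative upper bound, since the thread never pinned down the full affected range. The `nvidiaMarketingVersion` helper relies on the common NVIDIA convention that the last digit of the third field plus the fourth field encode the release number, which is consistent with 9.18.13.4101 being reported as 341.01 in this bug.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

using DriverVersion = std::array<std::uint32_t, 4>;

// Parses "a.b.c.d" into numeric fields so that versions compare numerically.
DriverVersion parseDriverVersion(const std::string& s) {
  DriverVersion parts{0, 0, 0, 0};
  std::istringstream in(s);
  std::string field;
  for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
    parts[i] = static_cast<std::uint32_t>(std::stoul(field));
  }
  return parts;
}

// Hypothetical blocklist entry for the *second* GPU (not Firefox's real format).
struct SecondGpuBlockEntry {
  std::uint32_t vendorId;
  std::uint32_t deviceId;
  std::string maxAffectedDriver;  // inclusive upper bound
};

bool isSecondGpuBlocked(const SecondGpuBlockEntry& entry, std::uint32_t vendorId,
                        std::uint32_t deviceId, const std::string& driver) {
  // std::array compares lexicographically, field by field.
  return vendorId == entry.vendorId && deviceId == entry.deviceId &&
         parseDriverVersion(driver) <= parseDriverVersion(entry.maxAffectedDriver);
}

// "9.18.13.4101" -> 3 * 10000 + 4101 = 34101 -> "341.01".
std::string nvidiaMarketingVersion(const std::string& driver) {
  const DriverVersion v = parseDriverVersion(driver);
  const std::uint32_t release = (v[2] % 10) * 10000 + v[3];
  std::ostringstream out;
  out << release / 100 << '.' << std::setw(2) << std::setfill('0') << release % 100;
  return out.str();
}
```

Matching on GPU #2 alone cannot tell whether the discrete GPU is actually the one in use, so, as noted later in this thread, such an entry would also block sessions running on the integrated Intel GPU.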
(In reply to Milan Sreckovic [:milan] from comment #45)
> Probably, but, this is blocklisting on the second GPU, and we only added the
> capability to do that in 37, so I want to be careful with the patch.  It
> would be GPU2 for:
> 
> Adapter Description (GPU #2)	NVIDIA Quadro K620M
> Device ID (GPU #2)	0x137a
> Driver Date (GPU #2)	10-15-2014
> Driver Version (GPU #2)	9.18.13.4101

Users with the Quadro K620M have been waiting on Lenovo (and possibly others) to update the driver and it appears they finally have (3-2-2015 - 9.18.13.4520): http://support.lenovo.com/uu/en/downloads/ds102045
(In reply to Milan Sreckovic [:milan] from comment #45)
> Probably, but, this is blocklisting on the second GPU, and we only added the
> capability to do that in 37, so I want to be careful with the patch.  It
> would be GPU2 for:
> 
> Adapter Description (GPU #2)	NVIDIA Quadro K620M
> Device ID (GPU #2)	0x137a
> Driver Date (GPU #2)	10-15-2014
> Driver Version (GPU #2)	9.18.13.4101

Apologies for the churn. It has not been updated for 8.1.
http://www.nvidia.com/download/driverResults.aspx/84049/en-us

Quadro 350.12 WHQL driver for Quadro cards.
(In reply to NVD from comment #48)
> http://www.nvidia.com/download/driverResults.aspx/84049/en-us
> 
> Quadro 350.12 WHQL driver for Quadro cards.

Looks like K620M is still not supported on that driver.
It doesn't list the K620M as supported, but I assure you, it's in the list of devices in the driver installer, and works well for me.
(In reply to Rich from comment #50)
> It doesn't list the K620M as supported, but I assure you, it's in the list
> of devices in the driver installer, and works well for me.

Thanks for reporting this.
Trying to find out if we can detect that the discrete graphics is being used.  Without that, if we were to blocklist this configuration, it would also block the cases where the integrated graphics is being used, which would be undesirable.
Reopening -- Looks like this works for Rich now because he updated his driver, but we still need to address the issue with the driver that failed.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Noting that some Nvidia driver versions disallow Firefox on the discrete card and don't let the user force it. So to hit this crash, you have to be on a driver in the range that allows it, but older than the latest one that fixed it. It may not be easy to determine this range and have us trust it; on a similar system, I saw 327.68 always lock Firefox into integrated graphics and 347.88 allow forcing discrete (as, clearly, does 341.01, which Rich was using when things were crashing), and I'm not sure whether 350.05 disallows the switch or fixes the crash.

Rich, what does about:support look like with 350.05? In particular, the WebGL Renderer string should say either Intel or Nvidia.
Google Inc. -- ANGLE (NVIDIA Quadro K620M Direct3D11 vs_5_0 ps_5_0)
And in bug 1143806, the user with 350.12 still crashes, so this is not as simple.  But at least it's clear that 350.* does allow discrete graphics.
There doesn't appear to be a run-time app way to ignore the forcing to discrete if set at the system level.
See Also: → 1143806
Too late for 38 but we will accept a patch for 39. Thanks
http://www.geforce.com/whats-new/articles/geforce-352-86-whql-driver-released

"Windows Vista/Windows 7/Windows 8/Windows 8.1 Fixed Issues
[Firefox] Firefox Nightly crashes when started on the NVIDIA GPU on an Optimus system. [1609030]"

Release notes says this is fixed now.
(In reply to NVD from comment #58)
> http://www.geforce.com/whats-new/articles/geforce-352-86-whql-driver-released
> 
> "Windows Vista/Windows 7/Windows 8/Windows 8.1 Fixed Issues
> [Firefox] Firefox Nightly crashes when started on the NVIDIA GPU on an
> Optimus
> system. [1609030]"
> 
> Release notes says this is fixed now.


It's bug 1154703.
This one https://crash-stats.mozilla.com/report/index/47537469-de67-49b9-8be4-786702150518 crashes after 6 seconds, and we don't think it's TDR related; just an invalid API call caused by mSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)backBuf.StartAssignment());

One "obvious" thing is that we don't check the return value of VerifyBufferSize() inside of UpdateRenderTarget().  I imagine the invalid api call could be coming from mismatch in size?  Should we report the failed VerifyBufferSize() and be noisy about why it fails?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #60)
> This one
> https://crash-stats.mozilla.com/report/index/47537469-de67-49b9-8be4-
> 786702150518 crashes after 6 seconds, and we don't think it's TDR related;
> just an invalid API call caused by mSwapChain->GetBuffer(0,
> __uuidof(ID3D11Texture2D), (void**)backBuf.StartAssignment());
> 
> One "obvious" thing is that we don't check the return value of
> VerifyBufferSize() inside of UpdateRenderTarget().  I imagine the invalid
> api call could be coming from mismatch in size?  Should we report the failed
> VerifyBufferSize() and be noisy about why it fails?

A mismatch between which sizes do you mean?
Flags: needinfo?(bas) → needinfo?(milan)
Widget's bounds and swap chain's buffer.  I don't know if that's "bad" or not and if it could lead to this crash though.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #60)
> One "obvious" thing is that we don't check the return value of
> VerifyBufferSize() inside of UpdateRenderTarget().  I imagine the invalid
> api call could be coming from mismatch in size?  Should we report the failed
> VerifyBufferSize() and be noisy about why it fails?

Yes, we should check the return value, and log the error in gfxCriticalError. It'd be even cleaner to have VerifyBufferSize return a boolean and use this in BeginFrame instead of having the latter check for mDefaultRT to decide whether things went sideways.
Flags: needinfo?(nical.bugzilla)
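The suggestion above (have VerifyBufferSize report failure, log it via gfxCriticalError, and let BeginFrame act on the result instead of checking mDefaultRT) can be sketched outside Gecko. Everything here is a hypothetical stand-in: `HResult` mimics an HRESULT (negative means failure), the injected `resize` callback plays the role of the swap chain's buffer resize, and `logCriticalError` replaces gfxCriticalError. The message includes the requested size, since the size is what later comments in this bug want to see.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

using HResult = long;  // stand-in for HRESULT: negative values mean failure

struct IntSize {
  int width;
  int height;
};

std::string g_lastCriticalError;  // kept so callers can inspect the last message

// Hypothetical replacement for gfxCriticalError.
void logCriticalError(const std::string& message) {
  g_lastCriticalError = message;
  std::cerr << "[critical] " << message << '\n';
}

class CompositorSketch {
 public:
  explicit CompositorSketch(std::function<HResult(IntSize)> resize)
      : resize_(std::move(resize)) {}

  // Returns false and logs why, instead of silently leaving no render target.
  bool VerifyBufferSize(IntSize size) {
    if (size.width == buffer_.width && size.height == buffer_.height) {
      return true;  // buffer already matches
    }
    const HResult hr = resize_(size);
    if (hr < 0) {
      std::ostringstream msg;
      msg << "VerifyBufferSize failed: hr=" << hr << " size=" << size.width
          << "x" << size.height;
      logCriticalError(msg.str());
      return false;
    }
    buffer_ = size;
    return true;
  }

  // BeginFrame uses the boolean result directly to decide whether to draw.
  bool BeginFrame(IntSize widgetSize) {
    if (!VerifyBufferSize(widgetSize)) {
      return false;  // skip this frame rather than drawing to a bad target
    }
    // ... update the render target and composite here ...
    return true;
  }

 private:
  std::function<HResult(IntSize)> resize_;
  IntSize buffer_{0, 0};
};
```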
(In reply to Milan Sreckovic [:milan] from comment #62)
> Widget's bounds and swap chain's buffer.  I don't know if that's "bad" or
> not and if it could lead to this crash though.

No, this would simply cause Windows to scale the surface to the widget bounds.

Of course 0 size surfaces are illegal though.
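The two points above suggest a simple rule: a size mismatch with the widget bounds is harmless (Windows scales the surface), but a zero-sized buffer must never reach the swap chain. A minimal sketch of that decision, with hypothetical `IntSize` and `BufferAction` types (not Gecko's). Skipping the frame for a (0,0) size is one possible answer to the "handle it or crash" question raised later in the thread, not necessarily what Gecko does.

```cpp
struct IntSize {
  int width;
  int height;
};

enum class BufferAction { SkipFrame, KeepBuffer, ResizeBuffer };

// Decides what to do with the swap chain's back buffer for a given widget size.
BufferAction chooseBufferAction(IntSize current, IntSize widget) {
  if (widget.width <= 0 || widget.height <= 0) {
    return BufferAction::SkipFrame;  // 0-size surfaces are illegal
  }
  if (widget.width == current.width && widget.height == current.height) {
    return BufferAction::KeepBuffer;  // nothing to change
  }
  // A mismatch would be legal (Windows scales), but resizing keeps it 1:1.
  return BufferAction::ResizeBuffer;
}
```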
This shouldn't break anything, and may give us more information.
Attachment #8608253 - Flags: review?(nical.bugzilla)
Attachment #8608253 - Flags: review?(nical.bugzilla) → review+
(In reply to Milan Sreckovic [:milan] from comment #65)
> Created attachment 8608253 [details] [diff] [review]
> Check if VerifyBufferSize failed. r=nical
> 
> This shouldn't break anything, and may give us more information.

Let's not close when this lands. I highly doubt it will make a difference for the actual crash, even if it could indeed be informative.
Keywords: leave-open
milan, is the crash from the try run https://treeherder.mozilla.org/logviewer.html#?job_id=7808296&repo=try related to this checkin? It seems this was on all the bc1 tests.
Flags: needinfo?(milan)
Keywords: checkin-needed
Attachment #8610152 - Flags: checkin?
Best news ever, that try failure - sorry for not checking for the completion before requesting checkin.

Bas, this assertion means that we are calling UpdateRenderTarget with mSize at (0,0), which could lead to (driver specific) badness - comment 64.  Which comes from widget having client bounds 0?  Is this something we should just handle, or do we still want to crash?
Flags: needinfo?(milan) → needinfo?(bas)
Milan, since this is tracking for 39, is the fix stable enough to request an uplift to Beta/Aurora?
Flags: needinfo?(milan)
Duplicate of this bug: 1172811
Comment on attachment 8613639 [details] [diff] [review]
Check if VerifyBufferSize failed. Carry r=nical

Approval Request Comment
This patch won't hurt either 39 or 40, and we will get additional information if something bad happens instead, but do we actually have any of these crashes on 39 and 40?
Flags: needinfo?(milan)
Flags: needinfo?(bas)
Attachment #8613639 - Flags: approval-mozilla-beta?
Attachment #8613639 - Flags: approval-mozilla-aurora?
Comment on attachment 8613639 [details] [diff] [review]
Check if VerifyBufferSize failed. Carry r=nical

Approved for uplift to aurora and beta, to help diagnose a top crash in beta.
Attachment #8613639 - Flags: approval-mozilla-beta?
Attachment #8613639 - Flags: approval-mozilla-beta+
Attachment #8613639 - Flags: approval-mozilla-aurora?
Attachment #8613639 - Flags: approval-mozilla-aurora+
(In reply to Liz Henry (:lizzard) from comment #77)
> There are a lot of crashes with this signature on 39 beta 1, 2, and 3.  40
> and 41 are also affected. 

The signature is the generic thing we get with TDRs and can have many reasons. But every specific case of it we can fix is a good thing - esp. if we don't end up crashing somewhere else instead.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #79)
> The signature is the generic thing we get with TDRs and can have many
> reasons. But every specific case of it we can fix is a good thing - esp. if
> we don't end up crashing somewhere else instead.
This should be relatively easy to fix, as it has a simple repro:
https://bugzilla.mozilla.org/show_bug.cgi?id=1172811
(In reply to SkyLined from comment #80)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #79)
> > The signature is the generic thing we get with TDRs and can have many
> > reasons. But every specific case of it we can fix is a good thing - esp. if
> > we don't end up crashing somewhere else instead.
> This should be relatively easy to fix, as it has a simple repro:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1172811

This isn't a TDR, it's a separate bug which isn't related to this one although the signature is the same.
Can this bug be resolved now? Also, should we consider this for esr38 as well?
Flags: needinfo?(milan)
Target Milestone: --- → mozilla41
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #83)
> Can this bug be resolved now? Also, should we consider this for esr38 as
> well?

Nope, it cannot. Although this patch will improve the situation somewhat, I don't believe it completely fixes it. But perhaps I'm wrong.
(In reply to Bas Schouten (:bas.schouten) from comment #81)
> This isn't a TDR, it's a separate bug which isn't related to this one
> although the signature is the same.

My point was more about "the overall volume of this signature doesn't really say if the specific case in this bug is worth tracking as the overall volume of this signature is probably still dominated by TDRs". But I might be wrong even about that.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #86)
> (In reply to Bas Schouten (:bas.schouten) from comment #81)
> > This isn't a TDR, it's a separate bug which isn't related to this one
> > although the signature is the same.
> 
> My point was more about "the overall volume of this signature doesn't really
> say if the specific case in this bug is worth tracking as the overall volume
> of this signature is probably still dominated by TDRs". But I might be wrong
> even about that.

I think you're right :-). I don't think this patch fixes the problem this bug was reported for, let's put it that way.
I wouldn't bother with 38esr unless we get overwhelming reduction in crashes (which I can't see happening.)
Flags: needinfo?(milan)
In this one https://crash-stats.mozilla.com/report/index/d0e39c73-89fe-4a24-8f71-a57d12150615

We hit an invalid-arg error, fail to verify the buffer size, then hit the invalid-call error and MOZ_CRASH. This patch will at least show us the size.
Attachment #8624401 - Flags: review?(bas)
Nightly only, let's see if we can get more data when the bad things happen.
Attachment #8624401 - Attachment is obsolete: true
Attachment #8624409 - Flags: review?(bas)
Attachment #8624409 - Flags: review?(bas) → review+
Comment on attachment 8613639 [details] [diff] [review]
Check if VerifyBufferSize failed. Carry r=nical

Just remember what was checked in as this is a leave-open bug for now.
Attachment #8613639 - Flags: checkin+
The reviewed and not checked in patch is non-asserting messages, so only did a build try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fee8d777a4bb
Keywords: checkin-needed
Attachment #8624409 - Flags: checkin? → checkin+
Just a note that this is currently the #7 crash in Firefox 39 looking at the last week of data.
Keywords: topcrash-win
Is there more to be done here?
Flags: needinfo?(bas)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #96)
> Is there more to be done here?

Well, it's still around; I don't know whether the bug reporter's problem has gotten better.

This crash is way down on 40 (16th) but it's still high on 39 (5th right now). Not sure there's much concrete to be done right now though.
Flags: needinfo?(bas)
My problem got worked around, in a sense, by the NVIDIA driver update they released.

I can try reinstalling the older driver and retest on a nightly snapshot if it would be of use.
In regards to Comment 88, are you ok with me untracking for ESR 38? I assume that there has been no status change since then.
Flags: needinfo?(bas)
(In reply to Kate Glazko from comment #99)
> In regards to Comment 88, are you ok with me untracking for ESR 38? I assume
> that there has been no status change since then.

I don't know how serious the issue is and if we should keep a close eye on it. But I don't think something concrete will get done on it so I'm also not sure how useful tracking it would be.
Flags: needinfo?(bas)
FWIW, here are the current stats.

Firefox 39: #4 @ 1.30% (6964 crashes last week)
Firefox 40: #4 @ 1.21% (1734 crashes last week)
Firefox 41: #39 @ 0.33% (31 crashes last week)
Firefox 42: #30 @ 0.26% (22 crashes last week)

Crashes by GPU Vendor:
* 8997 crashes with Intel-based GPUs (78.51%)
* 1312 crashes with nVidia-based GPUs (11.45%)
* 1092 crashes with AMD-based GPUs (9.53%)
Note: Intel numbers might be highest due to hybrid GPU systems. I don't know of a good way to separate these out.

Crashes by GPU device:
* 0x8086 :: 0x0102: 1469 crashes (12.758 %) [2nd Gen Core Processor Integrated Graphics]
* 0x8086 :: 0x0046: 1215 crashes (10.552 %) [Core Processor Integrated Graphics]
* 0x8086 :: 0x0116: 1200 crashes (10.422 %) [2nd Gen Core Processor Integrated Graphics]

Based on volume I think this should continue to be tracked. I'm not sure what the best way forward is though. We could try to track down dual GPU systems with the top cards but this is probably just a shot in the dark. I'm willing to help any way I can, any advice on what to do next here?
QA Whiteboard: topcrash-nvidia
QA Whiteboard: topcrash-nvidia → topcrash-intel
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #101)

FWIW, I've created some JSON files that allow grouping GPUs by generation and chipset. That should help us group these things better: https://github.com/jrmuizel/gpu-db
(In reply to Jeff Muizelaar [:jrmuizel] from comment #102)

Thanks Jeff. I correlated your db to the top 20 cards (all Intel, accounts for 73% of the crashes).

The top generations of cards:
Gen 6    33.43%
Gen 4    15.19%
Gen 5    13.50%
Gen 7     8.41%
Gen 3     2.86%

The top family of cards:
Sandybridge   33.43%  [Gen 6]
GMA4500       14.02%  [Gen 4]
Ironlake      13.50%  [Gen 5]
Ivybridge      5.39%  [Gen 7]
Haswell        2.22%  [Gen 7]
GMA3100        1.90%  [Gen 3]
GMA3500        1.17%  [Gen 4]
GMA950         0.96%  [Gen 3]
Baytrail       0.81%  [Gen 7]
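A rough sketch of the kind of grouping described above: map each device ID to a generation and sum the crash counts per generation. The `DEVICE_INFO` dictionary and `group_by_generation` helper are hypothetical and written only for illustration; the real gpu-db JSON files may be laid out differently. The counts are the top three Intel devices from the stats earlier in this bug.

```python
from collections import defaultdict

# Hypothetical device-ID -> (generation, family) mapping for illustration only;
# the actual gpu-db JSON files may be organized differently.
DEVICE_INFO = {
    0x0102: ("Gen 6", "Sandybridge"),
    0x0116: ("Gen 6", "Sandybridge"),
    0x0046: ("Gen 5", "Ironlake"),
}

# Weekly crash counts for the top three Intel device IDs from the stats above.
CRASHES = {0x0102: 1469, 0x0046: 1215, 0x0116: 1200}


def group_by_generation(crashes, device_info):
    """Sum crash counts per GPU generation; unmapped devices go under 'unknown'."""
    totals = defaultdict(int)
    for device_id, count in crashes.items():
        generation = device_info.get(device_id, ("unknown", "unknown"))[0]
        totals[generation] += count
    return dict(totals)
```

For these three devices this gives 2669 crashes for Gen 6 (0x0102 plus 0x0116) and 1215 for Gen 5 (0x0046). Grouping by family instead would only mean reading index 1 of the tuple.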
I'm not sure if I should post this here or file another bug. Please advise.

Recently I have had black areas increasingly covering up the browser (mostly the content area, I believe). This escalates until the browser becomes unresponsive and eventually crashes.
Here are two crash reports of such crashes:
bp-cfce7fcf-f66e-470f-9515-dc0dd2150802
bp-fe100153-a38c-47e6-a49b-b1aa42150813

The last time, once the issue had become disruptive, I took a memory report; then it escalated further and I took another one.
I will attach those.

Also, once things had gotten bad, I tried to load a YouTube video, and it prompted me to load Flash (I have click-to-play enabled). I then tried to load an MP4 directly (Big Buck Bunny http://www.w3schools.com/html/mov_bbb.mp4 ) and received the error "Video cannot be played because the file is corrupt." (which it's not).

If this fits here or should be filed as a new bug please let me know.

Here's the graphics section of my about:support
======
Graphics
Adapter Description	Intel(R) HD Graphics Family
Adapter Drivers	igdumd64 igd10umd64 igd10umd64 igdumdx32 igd10umd32 igd10umd32
Adapter RAM	Unknown
Asynchronous Pan/Zoom	none
Device ID	0x0116
Direct2D Enabled	true
DirectWrite Enabled	true (6.2.9200.17292)
Driver Date	6-10-2011
Driver Version	8.15.10.2418
GPU #2 Active	false
GPU Accelerated Windows	12/12 Direct3D 11 (OMTC)
Subsys ID	049a1028
Supports Hardware H264 Decoding	false
Vendor ID	0x8086
WebGL Renderer	Google Inc. -- ANGLE (Intel(R) HD Graphics Family Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote	true
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
AzureSkiaAccelerated	0
(In reply to Caspy7 from comment #104)
> Driver Date	6-10-2011
> Driver Version	8.15.10.2418

That's a pretty old driver. Seeing as one of the crashing modules was your video driver, could you try updating to https://downloadcenter.intel.com/download/24970/Intel-HD-Graphics-Driver-for-Windows-7-8-8-1-32bit, which is from June 2015, and see if it improves?
NI tag for the driver update.
Flags: needinfo?(caspy77)
I attempted to install the linked drivers and was given the message, "This computer does not meet the minimum requirements for installing the software."
So I went into the device manager and checked for updated drivers for my display adapter "Intel(R) HD Graphics Family" and it said, "The best driver for your device is already installed" and "Windows has determined the driver software for your device is up to date"

:-/
Flags: needinfo?(caspy77)
(In reply to Caspy7 from comment #110)

Odd. Device ID 0x0116 clearly says it is an HD 3000 series. What does it say under Device Manager > Display adapters and Processors? Can you use CPU-Z and tell me what CPU model it reports?
Intel(R) HD Graphics Family, Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz

It looks like CPU-Z reports: Intel Core i5 2430M
http://i.imgur.com/mSRd0KA.png

Is this what you're looking for?
(In reply to Caspy7 from comment #104)
> I'm not sure if I should post this here or file another bug. Please advise.
> 
> I have recently had an increasing occurrence of blackness covering up the
> browser (mostly content I believe), this escalates until the browser begins
> becoming unresponsive until it crashes.
> Here are two crash reports of such crashes:
> bp-cfce7fcf-f66e-470f-9515-dc0dd2150802
> bp-fe100153-a38c-47e6-a49b-b1aa42150813
> 
> This last time, once the issue had gotten disruptive, I took a memory report
> and then it escalated further and I took another report. 
> I will attach those.
> 

These memory reports show that we're running out of memory: vsize-max-contiguous is down to 2.00 MB in the second one.

Given that, I'm assuming this is an OOM problem, though it's interesting that it causes D3D11 to return an error instead of Firefox just crashing somewhere.
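A minimal sketch of checking a saved memory report for this symptom, assuming the gzipped JSON format that about:memory's save feature writes (a top-level `reports` list whose entries carry `process`, `path` and `amount` fields). The helper names and the 16 MB threshold are hypothetical and chosen only for illustration; Firefox does not use this threshold.

```python
import gzip
import json

# Illustrative threshold only (hypothetical); not a value Firefox itself uses.
LOW_CONTIGUOUS_BYTES = 16 * 1024 * 1024


def load_report(path):
    """Load an about:memory report saved as gzipped JSON (e.g. memory-report.json.gz)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def max_contiguous_by_process(report):
    """Return {process: vsize-max-contiguous in bytes} from a parsed memory report."""
    result = {}
    for entry in report.get("reports", []):
        if entry.get("path") == "vsize-max-contiguous":
            result[entry.get("process", "")] = entry.get("amount", 0)
    return result


def likely_address_space_exhaustion(report, threshold=LOW_CONTIGUOUS_BYTES):
    """True if any process's largest free contiguous block is below the threshold."""
    return any(amount < threshold
               for amount in max_contiguous_by_process(report).values())
```

A report like the second one here, with only 2.00 MB of contiguous address space left, would be flagged; a report without any `vsize-max-contiguous` entry is not.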
The IGP is HD Graphics 3000 for this CPU. And the latest driver is https://downloadcenter.intel.com/download/24970/Intel-HD-Graphics-Driver-for-Windows-7-8-8-1-32bit
As I said, I attempted to install that and got the error "This computer does not meet the minimum requirements for installing the software."
http://i.imgur.com/MVpMtCf.png

I just rebooted and tried again and got the same error.

Several weeks ago I reinstalled the OS on this computer and did all the important updates, but I'm uncertain how many of the non-important updates I did.  They can be a headache to sort through and see what unnecessary stuff is getting put on your computer. I'm doing all those updates now and will attempt to install the drivers again after that's finished.
Flags: needinfo?(caspy77)
https://downloadcenter.intel.com/download/24971/Intel-HD-Graphics-Driver-for-Windows-7-8-64-bit

Loic linked the wrong driver: the 32-bit driver instead of the 64-bit one. Also, Intel's generic driver installer is well known for refusing to install on many notebooks because of its annoying checks.
Ah. Yes. Having the proper installer made quite the difference. :)
I successfully installed the drivers, but there's not much more I can report now other than pasting in the Graphics section of about:support below.  I did not have a way to reliably reproduce this.
Though I'll note that APZ and H264 Decoding are both unavailable (why is that?).

I'm also flummoxed that I can have Windows check for updates and it says I have the most recent drivers - when I obviously didn't.

=======
Graphics
Adapter Description	Intel(R) HD Graphics 3000
Adapter Drivers	igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Adapter RAM	Unknown
Asynchronous Pan/Zoom	none
Device ID	0x0116
Direct2D Enabled	true
DirectWrite Enabled	true (6.2.9200.17461)
Driver Date	5-26-2015
Driver Version	9.17.10.4229
GPU #2 Active	false
GPU Accelerated Windows	13/13 Direct3D 11 (OMTC)
Subsys ID	049a1028
Supports Hardware H264 Decoding	false
Vendor ID	0x8086
WebGL Renderer	Google Inc. -- ANGLE (Intel(R) HD Graphics 3000 Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote	true
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
AzureSkiaAccelerated	0
Flags: needinfo?(caspy77)
(In reply to Caspy7 from comment #117)
> I'm also flummoxed that I can have Windows check for updates and it says I
> have the most recent drivers - when I obviously didn't.

To this point specifically, as far as I know Windows Update only checks for WHQL drivers (i.e., drivers that have been signed by Microsoft). WHQL drivers may sometimes lag behind what is available directly from the OEM. On my Windows systems I tend to trust Windows Update only for updates to Windows itself, relying on the OEM for driver updates (typically through their software or via their website).
Crash Signature: [@ mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget() ] → [@ mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget() ] [@ …
This bug may have outlived its usefulness, with all sorts of different things collecting in it, including some fixes (e.g., 39 and 40 are marked as fixed.)  What's the status?  Is there something actionable left here or should we get more work done in other bugs?  We keep coming back to it because it's tracked, but then walk away because it's unfocused.
(In reply to Milan Sreckovic [:milan] from comment #119)

I agree with this conclusion. I don't see anything useful coming out of this bug at this point.
Milan, if it looks like there's something we should add to the blocklist, we could file a new bug for it. If not, I agree, let's at least untrack it.
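For anyone filing such a follow-up, a simplified sketch of what a driver-version blocklist check does: match the adapter's vendor and device IDs, then compare the four-part Windows driver version numerically. The `BlockRule` type, its fields, and the example threshold are hypothetical and only for illustration; this is not Firefox's actual blocklist schema or code.

```python
from typing import FrozenSet, NamedTuple, Tuple


class BlockRule(NamedTuple):
    """Hypothetical, simplified blocklist rule; not Firefox's real schema."""
    vendor_id: int
    device_ids: FrozenSet[int]
    min_driver_version: str  # drivers strictly older than this are blocked


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a Windows driver version like '8.15.10.2418' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_blocked(rule: BlockRule, vendor_id: int, device_id: int,
               driver_version: str) -> bool:
    """True if the adapter matches the rule and its driver is older than allowed."""
    if vendor_id != rule.vendor_id or device_id not in rule.device_ids:
        return False
    return (parse_driver_version(driver_version)
            < parse_driver_version(rule.min_driver_version))
```

With a hypothetical rule for 0x8086/0x0116 and a minimum of 9.17.10.4229, the 8.15.10.2418 driver reported earlier in this bug would be flagged and the updated 9.17.10.4229 driver would not. Comparing integer tuples rather than strings avoids treating "10" as smaller than "9".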
This is #9 (::HandleError signature) in the 41.0b crash-stats. Based on the bug activity, I do not think we will be able to fix this for 41.
The signatures here remain topcrashes, but they are being investigated elsewhere. I propose that we close this bug report based on the previous comments. However, this bug is tagged as leave-open. Can we just close this bug report?
Closing this bug since there were no objections to comment 123. Please reopen if there's something more to be done here that's not being taken care of in another bug report.
Status: REOPENED → RESOLVED
Closed: 5 years ago → 4 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open