Closed Bug 629265 Opened 14 years ago Closed 14 years ago

Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration & flash plugin [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] [@ @0x0 | zero@0x5fdd7 ]

Product/Component: Core :: Graphics
Type: defect
Platform: x86_64, Linux
Priority: Not set
Severity: critical
Status: RESOLVED FIXED
Reporter: dholbert, Assigned: bjacob
Keywords: crash, crashreportid
Whiteboard: [fixed in bug 634366]
Attachments: 4 files

STEPS TO REPRODUCE:
 1. Enable layers acceleration (layers.acceleration.force-enabled = true)
 2. Visit http://johnnystimson.bandcamp.com
 3. If you haven't crashed yet, reload.

ACTUAL RESULTS: Crash when page finishes loading [@ zero@0x5fddf ].  

Occasionally it takes a few reloads in Step 3, but I've been able to reproduce in less than 10 seconds each time I've tried.

Crash reports:
bp-1e1083a9-ce5b-4bd5-93ac-30d272110126
bp-e94449e8-e05a-4d83-a7ac-c68612110126
bp-7ebe808a-b5cc-489f-9c28-b9b212110126

(Some of my crashreports mention noscript & clicking a 'play' button, in the crashreport comments. I've found that neither of those factors actually matters; hence they're not in the STR here.)
Keywords: crashreportid
Summary: Crash [@ zero@0x5fddf ] when viewing johnnystimson.bandcamp.com → Crash [@ zero@0x5fddf ] when viewing johnnystimson.bandcamp.com w/ layers acceleration
This is on:
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b11pre) Gecko/20110126 Firefox/4.0b11pre
Ubuntu 10.10 x86_64
GeForce 9800 GT, w/ nvidia driver version 260.19.06 (from nvidia-settings)
Getting this too, also on Linux64.

bp-c433b4cf-81c2-49e4-9da8-3d06e2110206
bp-4a399fb6-3e88-4527-ab74-18c8b2110206

The URL in comment 0 will crash, as will navigating files on github.com.
Getting this other very similar signature [@ zero@0x5fdd7 ]  also in nvidia driver, assuming it's the same crash.
Summary: Crash [@ zero@0x5fddf ] when viewing johnnystimson.bandcamp.com w/ layers acceleration → Crash when viewing johnnystimson.bandcamp.com w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ]
Can't reproduce the crash here. NVIDIA 195.36.31 on debian sid 64bit.
Summary: Crash when viewing johnnystimson.bandcamp.com w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] → Crash in NVIDIA driver on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ]
Severity: normal → critical
(In reply to comment #0)
> STEPS TO REPRODUCE:
>  2. Visit http://johnnystimson.bandcamp.com

FWIW, I hit this crash at the bandcamp front page, too: http://bandcamp.com/
What driver version?
Oh, sorry, 260.19. ok.
(yup - system settings still match Comment 1, w/ today's m-c nightly)
Here are 2 crash reports from loading http://bandcamp.com/ , FWIW:
bp-0c41bd61-1c86-47d9-a8a3-ec1d72110209 <-- after 4 reloads
bp-5160ec28-01e4-45a1-91dd-483b42110209 <-- on 1st load, @ Firefox startup
This required me to shuffle things a bit so that we blacklist before doing any nontrivial OpenGL calls.
Attachment #511204 - Flags: review?(vladimir)
New crash signature [@ zero@0x41667 ] appears in beta 11 crash reports. The stack trace is useless, but all these crash reports have libnvidia-glcore.so.260.19.36 in the Modules tab.
Summary: Crash in NVIDIA driver on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] → Crash in NVIDIA driver on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ]
Summary: Crash in NVIDIA driver on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] → Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ]
Blocks: 601079
Summary: Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] → Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ]
Attachment #511204 - Flags: approval2.0+
http://hg.mozilla.org/mozilla-central/rev/b39a1fff95fb
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Backed out due to test failures

http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=80211f053c46
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Kev, can you please report this to NVIDIA? We are forced to blacklist their newest 260.19 driver version on linux because of many crashes documented here. The Chromium team is having issues too with this driver version:

https://www.khronos.org/webgl/public-mailing-list/archives/1102/msg00113.html
Whiteboard: need-to-report-to-nvidia
Are you all running the LATEST Nvidia drivers? 260.19.06 is a REALLY OLD beta driver, released in Sept 2010 according to this post:

http://www.nvnews.net/vbulletin/showthread.php?p=2318734

By contrast, 260.19.36 was released on January 21, 2011:

http://www.nvidia.com/object/linux-display-amd64-260.19.36-driver.html
"Release Date: 2011.01.21"

Please don't blocklist wrong driver versions just because some people don't update their drivers to the LATEST version from 2011.
As said above, we also have many crash reports with 260.19.36. (see e.g. comment 10).

If we want to blacklist fewer setups, two things we can do:
 * only blacklist in 64bit code (apparently no crashes in 32bit)
 * investigate whether crashes happen only on certain devices. Maybe we should add the glGetString(GL_RENDERER) string to crash report annotations, to compensate for the lack of GfxInfo on linux.
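To make the annotation idea concrete, here is a minimal sketch of extracting the driver version from the GL strings and formatting a crash-report note. It assumes the NVIDIA Linux driver reports GL_VERSION in the form "<GL version> NVIDIA <driver version>" (for example "3.3.0 NVIDIA 260.19.36"); the function names and the note layout are hypothetical and are not the actual Firefox code.

```python
import re
from typing import Optional, Tuple

# Assumption: GL_VERSION looks like "3.3.0 NVIDIA 260.19.36" on this driver.
_NVIDIA_RE = re.compile(r"NVIDIA (\d+)\.(\d+)(?:\.(\d+))?")


def parse_nvidia_driver_version(gl_version: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for an NVIDIA GL_VERSION string, else None."""
    match = _NVIDIA_RE.search(gl_version)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing third component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def crash_annotation(gl_vendor: str, gl_renderer: str, gl_version: str) -> str:
    """Build a one-line note (hypothetical layout) from the three GL strings."""
    return (
        f"GL_VENDOR: {gl_vendor} | "
        f"GL_RENDERER: {gl_renderer} | "
        f"GL_VERSION: {gl_version}"
    )
```

With the renderer string in the annotation, crash reports could be grouped by device as well as by driver version, which is what the second bullet asks for.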
OOPS, sorry about that. I apologize, I was looking at the various crash reports which mentioned 260.19.06 and somehow I missed your post.

The latest Nvidia beta driver is 270.18. Can someone please test it and, if it also crashes, report that to Nvidia so they can fix it for the next official non-beta driver release?

http://www.nvidia.com/object/linux-display-amd64-270.18-driver.html
Blocks: 633460
Two changes here:

 * we now blacklist sub-sub-versions, as among the 260.19.x series, only .06 and .12 have a clear reason to be blacklisted. The .36 crashes are still shrouded in mystery, and went away by themselves on February 10.

 * added 195.36.24 because of bug 633460.
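As a rough illustration of what matching on sub-sub-versions means, the sketch below compares full three-component versions against an explicit set. The version numbers come from the two bullets above; the function names and the data structure are hypothetical and do not reflect the attached patch.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Versions named above: 260.19.06 and 260.19.12 (this bug), 195.36.24 (bug 633460).
# 260.19.36 is deliberately left out.
BLACKLISTED_VERSIONS = frozenset({(260, 19, 6), (260, 19, 12), (195, 36, 24)})


def parse_version(text: str) -> Version:
    """Parse "260.19.06" into (260, 19, 6); missing components become 0."""
    parts = [int(piece) for piece in text.split(".")]
    if len(parts) > 3:
        raise ValueError(f"unexpected driver version: {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def is_blacklisted(version_text: str) -> bool:
    """True only when the exact major.minor.patch triple is listed."""
    return parse_version(version_text) in BLACKLISTED_VERSIONS
```

Matching the whole triple, rather than the "260.19" prefix, is what lets 260.19.36 stay enabled while .06 and .12 are blocked.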
Attachment #512028 - Flags: review?(vladimir)
> NVIDIA 270.18 is a major cause of crashes:
There are only 2 users with this problem.
(In reply to comment #21)
> > NVIDIA 270.18 is a major cause of crashes:
> There are only 2 users with this problem.

Interestingly, 2 users are enough to promote this very high on the top crashers list in linux nightlies.

That might mean that we should only look at betas for data, since nightlies on linux don't have enough users to be representative.
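The effect described here can be sketched by ranking signatures on raw crash count while also tracking how many distinct users hit each one. The per-report user identifier is an assumption made for illustration (it is not something this thread says crash-stats exposes), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_signatures(reports: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Rank crash signatures by crash count.

    `reports` yields (signature, user_id) pairs; user_id is a hypothetical field.
    Returns (signature, crash_count, distinct_users) tuples, most crashes first.
    """
    crash_counts = defaultdict(int)
    users = defaultdict(set)
    for signature, user_id in reports:
        crash_counts[signature] += 1
        users[signature].add(user_id)
    rows = [(sig, crash_counts[sig], len(users[sig])) for sig in crash_counts]
    # Sort by descending crash count, then by signature for a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

On a small population, a signature with many crashes from only two users can outrank one that affects more people, so the distinct-user column is worth reading alongside the rank.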
On Linux, there are 10,000 users in 4.0b11 or 4.0b12pre (1,700,000 users in 3.6.13).
Do you know how it breaks down between b11 and b12pre?
> Do you know how it breaks down between b11 and b12pre?
For the last week, with a zero signature, there have been no crashes in 4.0b11 and 110 crashes in 4.0b12pre.
So the NVIDIA driver crash could be due either to a code change in GL code or to a driver bug in 260.19.x (x<=36).
But as there have been 2 crashes at libnvidia-glcore.so.260.19.06@0xe2b3a5 in 4.0b11, the second hypothesis seems to be the right one.
This is strange: signature zero@0x41667 (driver 260.19.36) does not appear at all in builds newer than February 9:

https://crash-stats.mozilla.com/report/list?product=Firefox&branch=2.0&version=Firefox%3A4.0b12pre&platform=linux&query_search=signature&query_type=exact&query=&date=02%2F14%2F2011%2008%3A48%3A05&range_value=1&range_unit=weeks&hang_type=crash&process_type=browser&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=zero%400x41667

Likewise, signature zero@0x5fddf  (driver 260.19.12) disappeared after February 9 builds, except for one single occurrence in the February 10 build:

https://crash-stats.mozilla.com/report/list?product=Firefox&branch=2.0&version=Firefox%3A4.0b12pre&platform=linux&query_search=signature&query_type=exact&query=&date=02%2F14%2F2011%2008%3A48%3A05&range_value=1&range_unit=weeks&hang_type=crash&process_type=browser&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=zero%400x5fddf

Now the weird thing is that my change blacklisting version 260.19 only appeared in the February 12 nightly build.

So the conclusion is that these crashes, at least in 260.19.12 and 260.19.36,  *WENT AWAY BY THEMSELVES*.

Consequently I would like to un-blacklist this version so 1) we don't uselessly blacklist, if that was just a bug in our code that got fixed, and 2) so we get crash data if it comes back.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
> Now the weird thing is that my change blacklisting version 260.19 only appeared
> in the February 12 nightly build.

Even February 13, actually.
Bad news -- I just updated to NVIDIA driver 260.19.36 and the latest nightly, and I was still able to reproduce this. (after defining MOZ_GLX_IGNORE_BLACKLIST=1 to override the earlier blacklisting)

bp-a3e31699-0913-4dda-8302-3b9702110214
bp-d6a6ba35-fde6-4f04-8e9c-e315c2110214

Also -- when the crash happens, I see this in my terminal:
> ###!!! [Child][SyncChannel] Error: Channel error: cannot send/recv
Not sure if that's related at all.  (probably just a child process getting confused when the parent dies?)

So, I think this contradicts bjacob's assumption in Comments 26 - 28.
Here's a chat log with Daniel. The conclusion is that, at least in his case, this crash seems purely related to accelerated layers and not WebGL. Since heavy WebGL demos and the WebGL conformance test suite pass without crash, there's no point in blacklisting WebGL. See at the end of this chat log my conclusions on what we should do.


[15:35] <bjacob> does the crash happen only when you define MOZ_GLX_IGNORE_BLACKLIST=1 ?
[15:36] <bjacob> next question: what revision is your build from?
[15:37] <dholbert> 2nd question: Built from http://hg.mozilla.org/mozilla-central/rev/26421a3b68f3
[15:38] <dholbert> yup, seems like no crash w/out that env variable
[15:38] <bjacob> ok
[15:38] <dholbert> (I have layers.force-accel on, too, but I assume it has no effect w/o that variable, since my driver is blacklisted)
[15:38] <bjacob> so the blacklisting is actually, for sure, what makes the crash go away for you?
[15:39] <bjacob> indeed
[15:39] <bjacob> now, next question
[15:39] <dholbert> Disabling layer accel is actually what for sure makes the crash go away, from my experience
[15:39] <bjacob> can you remove layers.force-accel
[15:39] <bjacob> ah
[15:39] <bjacob> next question
[15:39] <dholbert> and blacklisting seems to be a way of achieving that
[15:39] <bjacob> without layers accel, if you go to some heavy webgl app
[15:39] <bjacob> does it crash?
[15:39] <bjacob> with MOZ_GLX_IGNORE_BLACKLIST
[15:40] <dholbert> Not in the past - I've gone all the way through Flight of the Navigator multiple times
[15:40] <dholbert> (before this was blacklisted)
[15:40] <bjacob> can you try now?
[15:40] <dholbert> yup
[15:40] <bjacob> http://webglsamples.googlecode.com/hg/aquarium/aquarium.html
[15:41] <bjacob> and most importantly, the test suite: https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/webgl-conformance-tests.html
[15:41] <dholbert> aquarium is running fine
[15:41] <dholbert> (disabled layers accel, enabled env variable)
[15:41] <bjacob> ok (confirm it's the settings i want)
[15:41] <bjacob> confirming
[15:41] <dholbert> Trying that test suite
[15:42] <dholbert> no crash so far... (nearly halfway through)
[15:42] <bjacob> let it finish
[15:43] <dholbert> done
[15:43] <dholbert> no crash
[15:45] <bjacob> ok cool
[15:46] <bjacob> so my guess is that the crash only happens in layers accel
[15:46] <bjacob> which is a non-default config on linux
[15:46] <dholbert> Yup, that's been my experience s far
[15:46] <bjacob> so
[15:46] <dholbert> *so far
[15:46] <bjacob> what we should do is
[15:46] <bjacob> 1) remove the blacklisting, at least from there
[15:46] <bjacob> 2) consider moving it to layers code
[15:47] <bjacob> 3) add crash-report annotations whenever people either force-enable layers, or define MOZ_GLX_IGNORE_BLACKLIST
[15:47] <dholbert> That all sounds good to me!
[15:47] <bjacob> 4) whenever someone reports this on bugzilla, ask if they force-enabled layers
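Point 3 of the chat log could look roughly like the sketch below, which collects a note for each force-enable switch that is active. The pref name and the environment variable are the ones mentioned in this bug; the function name, the note strings, and the separator are hypothetical.

```python
from typing import List, Mapping


def build_app_notes(prefs: Mapping[str, bool], env: Mapping[str, str]) -> str:
    """Return a crash-report note listing force-enabled graphics switches."""
    notes: List[str] = []
    # Pref from comment 0: layers.acceleration.force-enabled = true
    if prefs.get("layers.acceleration.force-enabled", False):
        notes.append("layers.acceleration.force-enabled")
    # Environment override used in this bug to bypass the driver blacklist.
    if env.get("MOZ_GLX_IGNORE_BLACKLIST"):
        notes.append("MOZ_GLX_IGNORE_BLACKLIST")
    return " | ".join(notes)
```

In a real process the second argument would be something like `os.environ`; an empty result means neither switch was set, so the crash came from a default configuration.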
Attachment #512173 - Flags: review? → review?(joe)
Depends on: 627464
Attachment #512173 - Flags: review?(joe) → review+
Attachment #512173 - Flags: approval2.0+
Removed code blocking 260.19:

http://hg.mozilla.org/mozilla-central/rev/c58b7b8bc5ef

Instead, working in bug 627464 to add crash-report app-notes about force-enabled features so we can start to make sense of these crashes.
(In reply to comment #31)
> Removed code blocking 260.19:
> 
> http://hg.mozilla.org/mozilla-central/rev/c58b7b8bc5ef
> 
> Instead, working in bug 627464 to add crash-report app-notes about
> force-enabled features so we can start to make sense of these crashes.

Would it make more sense to simply ignore, on Linux, the preferences that enable features we are not ready to test there? If we are not interested in the results of such testing yet, and setting the preferences is likely to just lead to crashes we have no interest in investigating, why give users this option?
(adding a signature for what looks like another instance of this -- bp-6b279ab4-dc0d-40fe-8a61-3f35c2110215 -- at http://www.frequency.com/video/judge-rules-cops-cannot-make-you-stop/2640551 )
Summary: Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] → Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] [@ @0x0 | zero@0x5fdd7 ]
When I try to reproduce this (with original URL) in my debug build, I get tons of copies of this assertion on page-reload:
> ###!!! ASSERTION: Failed to make GL context current!: 'succeeded', file gfx/thebes/GLContextProviderGLX.cpp, line 353
with some copies of this warning interspersed:
> WARNING: Failed to create GLXContext!: file gfx/thebes/GLContextProviderGLX.cpp, line 307
Thanks, this is very interesting, because that means that at that point we know that something is going wrong and maybe we should do something about it to avoid crashing later.
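One way to "do something about it" is to treat either failure as a signal to fall back instead of continuing with a broken context. The sketch below only shows that control flow; the two callables stand in for context creation and the make-current call, and all names are hypothetical rather than taken from GLContextProviderGLX.cpp.

```python
from typing import Callable, Optional


def pick_layers_backend(
    create_context: Callable[[], Optional[object]],
    make_current: Callable[[object], bool],
) -> str:
    """Return "accelerated" only if the GL context is created and made current."""
    context = create_context()
    if context is None:
        # Corresponds to the "Failed to create GLXContext!" warning.
        return "basic"
    if not make_current(context):
        # Corresponds to the "Failed to make GL context current!" assertion.
        return "basic"
    return "accelerated"
```

The point is that both failures are already detectable at the call site, so the decision to use the non-accelerated path can be made before any further GL calls are issued.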
Summary: Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] [@ @0x0 | zero@0x5fdd7 ] → Crash in NVIDIA driver 260.19 on linux 64bit w/ layers acceleration & flash plugin [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] [@ @0x0 | zero@0x5fdd7 ]
Here's a gdb backtrace of the first instance of the assertion-failure from Comment 34.
http://www.nvnews.net/vbulletin/showthread.php?p=2392215

The latest Nvidia Linux driver is 270.26 beta, can someone test this driver and see if it still crashes?
I'm now running 270.18 Beta, the current latest version available via the "official" driver download page at http://www.nvidia.com/Download/Find.aspx .  It still crashes for me.

However, I just tried a debug build from before & after mattwoodrow's checkin for bug 634366, and I've confirmed that his checkin fixes both the crash and the assertion-failures here.  (Assertion-failures were fixed by removing the "CreateForNativePixmapSurface" call which is at level 6 in the backtrace from Comment 37.)

(Since comment 34, I (and I think bjacob) have been pretty sure this isn't an NVIDIA driver bug after all.)

Marking this as FIXED by bug 634366's checkin.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Depends on: 634366
Resolution: --- → FIXED
Whiteboard: need-to-report-to-nvidia
David: In case that was in response to Comment 39 -- note that the checkin that fixed this in comment 39 has only *just* landed, so it won't show up in nightly builds until tomorrow morning.
(In reply to comment #40)
> Still crashes.
> 
> https://crash-stats.mozilla.com/report/index/e95d8f51-da03-4551-9362-32ef32110217

The "still crashes" is in reply to Comment 38, not 39. I tested it using today's nightly and I don't know whether it has the 634366 checkin.
(In reply to comment #41)
> David: In case that was in response to Comment 39 -- note that the checkin that
> fixed this in comment 39 has only *just* landed, so it won't show up in nightly
> builds until tomorrow morning.

Okay, I'll test it then, and I'll be more careful about mid-air collisions.
Sounds great, thanks! (and thanks for the data on the new nvidia driver!)
Whiteboard: [fixed in bug 634366]
Assignee: nobody → bjacob
Crash Signature: [@ zero@0x5fddf ] [@ zero@0x5fdd7 ] [@ zero@0x41667 ] [@ zero@0x4166f ] [@ @0x0 | zero@0x5fdd7 ]