Closed Bug 1513225 Opened 5 years ago Closed 5 years ago

Crash in libnvidia-glcore.so.415.22@0xdab7e0

Categories

(External Software Affecting Firefox :: Other, defect, P3)

Unspecified
Linux
defect

Tracking

(firefox65 fixed, firefox66 fixed, firefox67 fixed)

RESOLVED FIXED
Tracking Status
firefox65 --- fixed
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: jan, Unassigned)

Details

(Keywords: crash, nightly-community)

Crash Data

This bug was filed from the Socorro interface and is
report bp-686ba713-862d-4c34-a220-e44e40181211.
=============================================================

Top 10 frames of crashing thread:

0 libnvidia-glcore.so.415.22 libnvidia-glcore.so.415.22@0xdab7e0 
1 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0x530de 
2 firefox-bin Allocator<MozJemallocBase>::malloc memory/build/malloc_decls.h:37
3 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0x80a9a 
4 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0xaef15 
5 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0xb2d74 
6 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0x88d26 
7 libGLX_nvidia.so.415.22 libGLX_nvidia.so.415.22@0x85f77 
8 firefox-bin arena_t::MallocSmall memory/build/mozjemalloc.cpp:2809
9 firefox-bin arena_dalloc memory/build/mozjemalloc.cpp:3232

=============================================================
I don't see any Wayland relevance here, it was running on X11 display.
No longer blocks: wayland
Priority: -- → P3
Crash Signature: [@ libnvidia-glcore.so.415.22@0xdab7e0] [@ libnvidia-glcore.so.415.18@0xdab640 ] → [@ libnvidia-glcore.so.415.22@0xdab7e0] [@ libnvidia-glcore.so.415.18@0xdab640 ] [@ libnvidia-glcore.so.415.23@0xdab7e0]
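For anyone grouping these reports by driver version, the signature field above can be split mechanically. This is a minimal sketch; the regular expression and the `parse_signatures` helper are illustrative assumptions, not part of Socorro or Bugzilla.

```python
import re

# One signature entry looks like "[@ libnvidia-glcore.so.415.22@0xdab7e0]".
# The pattern below is an assumption based on the entries in this bug.
SIGNATURE_RE = re.compile(
    r"\[@\s*(?P<module>\S+?)\.so\.(?P<version>[\d.]+)@(?P<offset>0x[0-9a-fA-F]+)\s*\]"
)

def parse_signatures(field):
    """Return (module, driver_version, offset) tuples found in a signature field."""
    return [
        (m["module"] + ".so", m["version"], int(m["offset"], 16))
        for m in SIGNATURE_RE.finditer(field)
    ]

field = (
    "[@ libnvidia-glcore.so.415.22@0xdab7e0] "
    "[@ libnvidia-glcore.so.415.18@0xdab640 ] "
    "[@ libnvidia-glcore.so.415.23@0xdab7e0]"
)
signatures = parse_signatures(field)
for module, version, offset in signatures:
    print(f"{module} driver={version} offset={offset:#x}")
```

The three entries share a module and differ only in driver version and offset, which is why each driver release shows up as a separate signature.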
I filed a bug on nVidia's tracker since this seems to be a problem on their side.
Does anyone know the steps required to reproduce the problem?
Crash Signature: [@ libnvidia-glcore.so.415.22@0xdab7e0] [@ libnvidia-glcore.so.415.18@0xdab640 ] [@ libnvidia-glcore.so.415.23@0xdab7e0] → [@ libnvidia-glcore.so.415.22@0xdab7e0] [@ libnvidia-glcore.so.415.18@0xdab640 ] [@ libnvidia-glcore.so.415.23@0xdab7e0] [@ libnvidia-glcore.so.415.18@0xded3de ] [@ libnvidia-glcore.so.415.22@0xfb6197 ] [@ libnvidia-glcore.so.415.18@0xfb5f17 ] [@ li…
Is it expected that the addresses point at the middle of instructions?
If I look at libGLX_nvidia.so.415.22@0x530de, 0x530de is in the middle of an instruction:

   530d9:       ff 92 38 03 00 00       callq  *0x338(%rdx)
   530df:       31 ff                   xor    %edi,%edi

Is that a known quirk of Mozilla's crash report system, potentially?
(The stack is still valid and the crash likely happens on that call, but it's strange not to be pointing at the xor).
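A quick arithmetic check of the disassembly excerpt above shows where 0x530de sits relative to the call. The addresses and bytes are copied from that excerpt; the variable names are only for illustration.

```python
# Addresses and bytes copied from the disassembly excerpt above.
call_start = 0x530D9
call_bytes = bytes.fromhex("ff9238030000")  # callq *0x338(%rdx)
next_insn = 0x530DF                          # xor %edi,%edi
reported = 0x530DE                           # address shown in the crash report

# The return address pushed by the call is the address of the next instruction.
return_address = call_start + len(call_bytes)
assert return_address == next_insn

# Zero-based index of the reported address inside the call instruction.
offset_into_call = reported - call_start
print(f"return address: {return_address:#x}")
print(f"reported address is byte index {offset_into_call} of a {len(call_bytes)}-byte call")
print(f"return address - reported = {return_address - reported}")
```

So the reported address is the last byte of the 6-byte call, exactly one byte before the return address.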
(In reply to Arthur Huillet from comment #5)
> Is it expected that the addresses point at the middle of instructions?
> If I look at libGLX_nvidia.so.415.22@0x530de, 0x530de is in the middle of an
> instruction:

That address is in the second frame of the crashed stack and was obtained via stack scanning, so it's not 100% reliable. In these crashes only the first frame is obtained from the context; all the remaining frames are found via a heuristic, because no frame pointer is available and we have no way of unwinding the stack reliably.
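To illustrate why scanned frames are only a best guess, here is a toy sketch of the general idea behind stack scanning: treat every stack word that lands inside a loaded module as a candidate return address. This is not the crash processor's actual implementation; the `Module` type, the load addresses, and the stack contents are all hypothetical.

```python
from typing import NamedTuple

class Module(NamedTuple):
    name: str
    start: int  # first address of the module's mapping (hypothetical)
    end: int    # one past the last address

def scan_stack(stack_words, modules):
    """Return (module name, offset) for every stack word that points into a module."""
    frames = []
    for word in stack_words:
        for mod in modules:
            if mod.start <= word < mod.end:
                frames.append((mod.name, word - mod.start))
                break
    return frames

# Made-up load addresses and stack contents, for illustration only.
modules = [
    Module("libGLX_nvidia.so.415.22", 0x7F0000000000, 0x7F0000100000),
    Module("firefox-bin", 0x560000000000, 0x560000200000),
]
stack_words = [
    0x7F00000530DF,  # plausible return address into the driver
    0x000000000010,  # plain data, skipped
    0x560000012345,  # could be a return address or a stale pointer
    0x7F0000080A9A,
]
frames = scan_stack(stack_words, modules)
for name, offset in frames:
    print(f"{name}@{offset:#x}")
```

The scan cannot tell a real return address from a stale pointer left behind by an earlier call, which is one way unrelated frames can end up interleaved in a scanned stack.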
The first frame is also off by one byte. I assume something is doing -1 across the board in order to point at the "right" instruction. 
Anyway, we're trying to get a reproduction of the problem, but so far we do not have one. If anyone has more information, for example a nvidia-bug-report.log.gz file, that would be useful. I believe it may be related to Xinerama being in use.
Crash Signature: libnvidia-glcore.so.415.22.01@0xdabfc0 ] → libnvidia-glcore.so.415.22.01@0xdabfc0 ] [@ libnvidia-glcore.so.415.23@0xfb6247 ]
I tried to reproduce the issue by opening multiple Firefox browsers with the configuration below, but had no luck. I had enabled Xinerama as well.

Ubuntu 16.04.5 LTS +  4.15.0-43-generic  +  GeForce GTX 1080  +  Driver 415.23 

Could someone please provide repro steps and an NVIDIA bug report for further analysis?
Hi, I am still waiting for repro steps so that we can reproduce the issue internally, and for an NVIDIA bug report for further analysis.
I've sifted through the comments but there's no clear pattern as to when this is happening. Some users are getting the crash on startup, some are getting it in tabs that they're not even looking at while others mention the crash happened right after the computer woke up from standby. The crashing URLs are all over the place too, but facebook.com comes up more often than not. The stacks are all the same but they're a bit weird. The topmost interesting frame is here:

https://hg.mozilla.org/mozilla-central/annotate/6e96c7ec0d1187c1b488dd4ba645df9cfd68ec16/gfx/gl/GLXLibrary.h#l128

Then we have a few frames in nVidia's userspace driver:

Ø 10 	libGLX.so.0.0.0 	libGLX.so.0.0.0@0x4c7b
Ø 9 	libGLX_nvidia.so.415.18 	libGLX_nvidia.so.415.18@0x77605 	
Ø 8 	libGLX_nvidia.so.415.18 	libGLX_nvidia.so.415.18@0x54d3d 	

Then we go back into our code for an allocation, since we redirect all allocations to our allocator:

https://hg.mozilla.org/mozilla-central/annotate/6e96c7ec0d1187c1b488dd4ba645df9cfd68ec16/memory/build/mozjemalloc.cpp#l2988

Then we go back into nVidia's code which is odd because we're (apparently) within a memset():


Ø 3 	libGLX_nvidia.so.415.18 	libGLX_nvidia.so.415.18@0x85f77 	
Ø 2 	libGLX_nvidia.so.415.18 	libGLX_nvidia.so.415.18@0x80a9a 	
Ø 1 	libGLX_nvidia.so.415.18 	libGLX_nvidia.so.415.18@0x530de 	
Ø 0 	libnvidia-glcore.so.415.18 	libnvidia-glcore.so.415.18@0xdab640
We have been unable to reproduce the problem internally.

The stack makes only limited sense (as pointed out in comment #10), but I have been able to decode the NVIDIA-related symbols to something that suggests the problem might be a duplicate of NVIDIA bug 200469111, which appeared in r415+ driver releases on Xinerama configurations. But even with a Xinerama setup we have so far not obtained a reproduction of the problem.
We have a fix for that bug, so given the lack of reproduction of the issue at hand and the likely similarity with that other issue, the best course of action is to revert to r410 for the time being, and our next releases (including in the r415 branch) should carry the fix. I expect the bug will then disappear.

There haven't been any more crashes with driver versions past 415.x, so I guess this must have been fixed in newer driver versions.

We never managed to reproduce this problem internally, so we haven't fixed it on purpose. But we did fix a different crash between 415.23 and 415.27 that could conceivably also apply here.

(In reply to Arthur Huillet from comment #13)
> We never managed to reproduce this problem internally, so we haven't fixed
> it on purpose. But we did fix a different crash between 415.23 and 415.27
> that could conceivably also apply here.

It sounds like it must have been that; I could find no crashes on version 415.27. Closing.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Since the fix ultimately came from an upstream driver update, moving this to a component which reflects that.

Component: Graphics → Other
Product: Core → External Software Affecting Firefox
Version: Trunk → unspecified