crash [@WebGLContext::ValidateProgram] in NVIDIA driver on Mac

RESOLVED FIXED in mozilla9

Status


--
critical
RESOLVED FIXED
8 years ago
2 years ago

People

(Reporter: posidron, Assigned: bjacob)

Tracking

(Blocks: 1 bug, {crash, testcase})

Trunk
mozilla9
x86_64
Mac OS X
crash, testcase
Points:
---

Firefox Tracking Flags

(blocking2.0 final+, status1.9.2 unaffected, status1.9.1 unaffected)

Details

(Whiteboard: [sg:critical], crash signature)

Attachments

(8 attachments)

(Reporter)

Description

8 years ago
Created attachment 472472 [details]
testcase

The testcase is a bit large. Firefox crashes in a different function each time, but glReadPixels() is always called beforehand.
(Reporter)

Comment 1

8 years ago
Created attachment 472473 [details]
callstack
(Reporter)

Updated

8 years ago
Summary: WebGL gl_xxx crash [@glReadPixels] → WebGL gl_xxx crash [@glReadPixels/glrCompExecuteKernel]
(Assignee)

Comment 2

8 years ago
I can't reproduce the crash here (Linux x86-64, NVIDIA proprietary driver 195.36).

This page doesn't display or do anything here; all I can see is WebGL errors:

WebGL: framebufferRenderbuffer: renderbuffer: deleted object passed as argument
WebGL: texParameter: no texture is bound to this target
WebGL: GetFramebufferAttachmentParameter: pname: invalid enum value 0x8cd3
WebGL: GetFramebufferAttachmentParameter: pname: invalid enum value 0x8cd3
(Reporter)

Comment 3

8 years ago
I haven't tested it on Linux. On Mac OS X it's reproducible.
(Assignee)

Comment 4

8 years ago
The assembly shows it's crashing while trying to write into a buffer, which suggests the allocated buffer isn't big enough for what is being written to it. That's strange, because the code seems safe in that respect:

From GLContext.cpp:

void
GLContext::ReadPixelsIntoImageSurface(GLint aX, GLint aY,
                                      GLsizei aWidth, GLsizei aHeight,
                                      gfxImageSurface *aDest)
{
    MakeCurrent();

    if (aDest->Format() != gfxASurface::ImageFormatARGB32 &&
        aDest->Format() != gfxASurface::ImageFormatRGB24)
    {
        NS_WARNING("ReadPixelsIntoImageSurface called with invalid image format");
        return;
    }

    if (aDest->Width() != aWidth ||
        aDest->Height() != aHeight ||
        aDest->Stride() != aWidth * 4)
    {
        NS_WARNING("ReadPixelsIntoImageSurface called with wrong size or stride surface");
        return;
    }

    GLint currentPackAlignment = 0;
    fGetIntegerv(LOCAL_GL_PACK_ALIGNMENT, &currentPackAlignment);
    fPixelStorei(LOCAL_GL_PACK_ALIGNMENT, 4);

    [SNIP]

    fReadPixels(0, 0, aWidth, aHeight,
                format, datatype,
                aDest->Data());

    [SNIP]
}

Seems like a question for graphics people, not WebGL-specific. Reassigning...
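
For reference, the size reasoning above can be sketched as a standalone check. This is only an illustration, not code from GLContext.cpp: the helper names are hypothetical, and it assumes GL_PACK_ROW_LENGTH and the GL_PACK_SKIP_* parameters are left at 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; none of these names exist in GLContext.cpp.

// Bytes in one packed row, rounded up to GL_PACK_ALIGNMENT (1, 2, 4 or 8).
inline std::size_t PackedRowBytes(std::size_t width, std::size_t bytesPerPixel,
                                  std::size_t packAlignment)
{
    const std::size_t raw = width * bytesPerPixel;
    return (raw + packAlignment - 1) / packAlignment * packAlignment;
}

// Upper bound on the bytes glReadPixels writes for a width x height read,
// assuming GL_PACK_ROW_LENGTH and the GL_PACK_SKIP_* parameters are all 0.
inline std::size_t ReadPixelsBytes(std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel,
                                   std::size_t packAlignment)
{
    return PackedRowBytes(width, bytesPerPixel, packAlignment) * height;
}

// True if a surface with `stride` bytes per row and `height` rows has enough
// room for the read. This checks capacity only, not that row layouts match.
inline bool SurfaceCanHoldRead(std::size_t stride, std::size_t width,
                               std::size_t height, std::size_t bytesPerPixel,
                               std::size_t packAlignment)
{
    return stride * height >=
           ReadPixelsBytes(width, height, bytesPerPixel, packAlignment);
}
```

With a stride of aWidth * 4, a pack alignment of 4, and (assuming the snipped code picks one) a 4-byte-per-pixel format, both sides of the comparison are equal, which is why the function looks safe on paper.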
(Assignee)

Updated

8 years ago
Component: Canvas: WebGL → Graphics
QA Contact: canvas.webgl → thebes
(Assignee)

Updated

8 years ago
Summary: WebGL gl_xxx crash [@glReadPixels/glrCompExecuteKernel] → crash [@GLContext::ReadPixelsIntoImageSurface]
(Assignee)

Comment 5

8 years ago
Are you running with OpenGL-accelerated layers enabled? As in layers.accelerate_all or MOZ_ACCELERATED? That could explain things. This is considered "not yet ready".
(Reporter)

Comment 6

8 years ago
No, default build, default config except webgl.enabled_for_all_sites;true
(Assignee)

Comment 7

8 years ago
OK. I tried with OSMesa in valgrind, no memory error reported.

What is your graphics card and driver?
Putting this back to webgl, since that's where it belongs because it's seen via webgl.
Component: Graphics → Canvas: WebGL
QA Contact: thebes → canvas.webgl
(Reporter)

Comment 9

8 years ago
Model Identifier:	MacBookPro6,2

  Chipset Model:	NVIDIA GeForce GT 330M
  Type:	GPU
  Bus:	PCIe
  PCIe Lane Width:	x16
  VRAM (Total):	512 MB
  Vendor:	NVIDIA (0x10de)
  Device ID:	0x0a29
  Revision ID:	0x00a2
  ROM Revision:	3540
  gMux Version:	1.9.21


  Chipset Model:	Intel HD Graphics
  Type:	GPU
  Bus:	Built-In
  VRAM (Total):	288 MB
  Vendor:	Intel (0x8086)
  Device ID:	0x0046
  Revision ID:	0x0018
  gMux Version:	1.9.21
Component: Canvas: WebGL → Graphics
Seems like there was a mid-air collision and the component change was overwritten.
Component: Graphics → Canvas: WebGL
(Reporter)

Comment 11

8 years ago
Created attachment 472490 [details]
testcase-reduced

Was able to reduce the testcase to a minimum.
(Reporter)

Comment 12

8 years ago
It may also crash at the following location:

#0  0x00000001225eac8a in gleUpdateFragmentFallbackProgram ()
#1  0x00000001225db674 in gleUpdateDeferredState ()
#2  0x00000001225dc949 in gleDoSelectiveDispatchNoErrorCore ()
#3  0x000000012251ea7d in glFlush_Exec ()
#4  0x0000000102b3aa0c in mozilla::layers::BasicCanvasLayer::Updated (this=0x1220ee360, aRect=@0x7fff5fbf9970) at /Users/cdiehl/Mozilla/trunk/gfx/layers/basic/BasicLayers.cpp:719
#5  0x0000000100d136ef in mozilla::WebGLContext::GetCanvasLayer (this=0x121d31e00, aOldLayer=0x0, aManager=0x12245ebb0) at /Users/cdiehl/Mozilla/trunk/content/canvas/src/WebGLContext.cpp:554
#6  0x0000000100e1be2b in nsHTMLCanvasElement::GetCanvasLayer (this=0x121d2af20, aOldLayer=0x0, aManager=0x12245ebb0) at /Users/cdiehl/Mozilla/trunk/content/html/content/src/nsHTMLCanvasElement.cpp:545
(Reporter)

Comment 13

8 years ago
Created attachment 473087 [details]
cwlog-gldUnbindPipelineProgram - exploitable:no
(Reporter)

Comment 14

8 years ago
Created attachment 473089 [details]
cwlog-glrCompExecuteKernel - exploitable:yes

Updated

8 years ago
Group: core-security
Whiteboard: [sg:critical]
The two stack traces say it's crashing when BasicCanvasLayer::Updated(nsIntRect const&) calls glFlush(). A glFlush() crash inside of the driver means it's just crashing as the result of earlier calls. So the stack traces are actually not useful.

One thing that would be useful would be to make a "synchronous GL" mode where we'd call glFinish() after every GL call. Would make GL stack traces actually useful...

If I make you a patch doing that will you build and try it?
(Reporter)

Comment 16

8 years ago
Sure
Created attachment 473169 [details]
test case with finish() and window.dump() calls

Good news / bad news time.

Good news: no need to rebuild Firefox. Here's a new version of your test case that calls gl.finish() after every WebGL call. I had forgotten, but it turns out that WebGL does have a finish() function! This new testcase also prints debug output to the terminal, so make sure to launch Minefield from a terminal to see it. When you get a crash, paste the terminal output here; that should tell us which WebGL function crashed.

Bad news: this only helps debug into WebGL. If the crash is not in WebGL, for example if it is in the OpenGL layers code, this won't help. What you could do is disable OpenGL-accelerated layers (preferences layers.accelerate_xxx and environment variable MOZ_ACCELERATED) and see if the crash persists.
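
For reference, the tracing pattern used by this testcase (log, call, finish, log) can be sketched in C++ roughly as follows. This is only an illustration under assumptions: SyncTracer and its members are hypothetical names, and the finish callback stands in for glFinish() or gl.finish().

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tracer: wraps each GL call so that a driver crash is
// attributed to the call that actually caused it.
class SyncTracer {
public:
    // `finish` stands in for glFinish(): it must block until the driver
    // has executed every command issued so far.
    explicit SyncTracer(std::function<void()> finish)
        : mFinish(std::move(finish)) {}

    template <typename Fn, typename... Args>
    void Call(const std::string& name, Fn&& fn, Args&&... args) {
        Log("before " + name);
        std::forward<Fn>(fn)(std::forward<Args>(args)...);
        mFinish();  // force deferred driver work to run now
        Log("after " + name);
    }

    const std::vector<std::string>& Lines() const { return mLines; }

private:
    void Log(const std::string& line) {
        // std::endl flushes, so the line is visible even if the next call crashes.
        std::cerr << line << std::endl;
        mLines.push_back(line);
    }

    std::function<void()> mFinish;
    std::vector<std::string> mLines;
};
```

If the last line printed is "before X" with no matching "after X", then X (or the finish right after it) is where the driver crashed.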
Ah, I had forgotten about comment 6. So ignore the part of my previous comment about accelerated layers.
(Reporter)

Comment 19

8 years ago
--- WebGL context created: 0x11cc2b000
before deleteProgram
after deleteProgram
begin add
before createShader
before shaderSource
before compileShader
before getShaderParameter
before attachShader
end add
begin add
before createShader
before shaderSource
before compileShader
before getShaderParameter
before attachShader
end add
before linkProgram
before getProgramParameter
before useProgram
end initWebGL
before deleteProgram
after deleteProgram
before createProgram
after createProgram
before validateProgram

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x000000020001e903 in gldUnbindPipelineProgram ()
OK, great! So validateProgram() is crashing. Can you confirm that the crash goes away if you remove the validateProgram() call?

The funny thing is that its code looks simple and fine:

NS_IMETHODIMP
WebGLContext::ValidateProgram(nsIWebGLProgram *pobj)
{
    WebGLuint progname;
    if (!GetGLName<WebGLProgram>("validateProgram", pobj, &progname))
        return NS_OK;

    MakeContextCurrent();

    gl->fValidateProgram(progname);

    return NS_OK;
}

When it crashes, can you attach a debugger and print the values of pobj and of progname?
Summary: crash [@GLContext::ReadPixelsIntoImageSurface] → crash [@WebGLContext::ValidateProgram]
Anyway... it's almost certainly a driver bug making it crash in glValidateProgram (which we call as gl->fValidateProgram).

Perhaps we should just let WebGLContext::ValidateProgram be a no-operation when this driver is used...
(Reporter)

Comment 22

8 years ago
Yes, doesn't crash when validateProgram is removed.

gdb $ p pobj
$14 = (nsIWebGLProgram *) 0x12617b2d0
gdb $ p progname
$15 = 4
Keywords: crash, testcase
Let's block on figuring out what's going on here - we can always unblock on it later.
Assignee: nobody → bjacob
blocking2.0: --- → final+
OK, we do know what's going on here: NVIDIA driver crash in glValidateProgram(), not our fault.

Solutions, in order of decreasing quality and increasing speed of implementation:

 * NVIDIA fixes their driver. We could convert this testcase into a C program using OpenGL, if that might help.

 * We use ANGLE to emulate glValidateProgram. Not realistic until certain ANGLE bugs are fixed.

 * We just avoid calling glValidateProgram, which means that we make webgl.validateProgram be a dummy no-op function, on NVIDIA. After all this function is not needed for rendering.
Wait, it's not NVIDIA; I guess it's Apple who writes their own NVIDIA driver, right?
Christoph: 2 questions:

   * are you using Apple's driver for your NVIDIA card? (Apple makes their own NVIDIA driver, right?)

   * can you reproduce with the WebKit nightly builds? Download from
         http://nightly.webkit.org/
     and type this into a terminal:
         defaults write com.apple.Safari WebKitWebGLEnabled -bool YES
(Reporter)

Comment 27

8 years ago
Yes, I am using the default provided drivers by Apple 
- http://www.nvidia.de/page/macintosh.html
I am not able to reproduce it against WebKit (r69183)
Created attachment 484032 [details] [diff] [review]
work around the crash

This patch works around the crash by implementing WebGLContext::ValidateProgram as a no-op on Mac/NVIDIA, and printing an informative message.
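
For reference, the shape of such a workaround can be sketched as below. This is not the attached patch: GLEnv, ShouldSkipValidateProgram and the callback are hypothetical names, and matching the vendor by substring is an assumption made for the sketch.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical description of the GL environment (not a Mozilla type).
struct GLEnv {
    bool isMacOSX;
    std::string vendor;  // e.g. the GL_VENDOR string
};

// Assumption for this sketch: the affected driver is identified by running
// on Mac OS X with "NVIDIA" somewhere in the vendor string.
inline bool ShouldSkipValidateProgram(const GLEnv& env) {
    return env.isMacOSX && env.vendor.find("NVIDIA") != std::string::npos;
}

// Returns true if the driver was actually asked to validate the program.
// `driverValidate` stands in for gl->fValidateProgram.
inline bool ValidateProgram(const GLEnv& env, unsigned progname,
                            const std::function<void(unsigned)>& driverValidate) {
    if (ShouldSkipValidateProgram(env)) {
        std::cerr << "validateProgram: skipped to work around a driver crash"
                  << std::endl;
        return false;
    }
    driverValidate(progname);
    return true;
}
```

Rendering does not depend on validateProgram, so skipping the driver call only loses diagnostic information on the affected configuration.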
Attachment #484032 - Flags: review?(vladimir)
(In reply to comment #27)
> Yes, I am using the default provided drivers by Apple 
> - http://www.nvidia.de/page/macintosh.html
> I am not able to reproduce it against WebKit (r69183)

It's interesting that you can't reproduce with WebKit, as they are definitely making the glValidateProgram call. I wonder what difference between our GL setup and theirs causes us to trigger this crash. In any case it's a driver bug, so all we can do is report it and work around it until it's fixed.
Vlad: ping, can you review the patch here?
Kev, do you know how to contact Apple about this?

This is an NVIDIA driver crash, but as far as I understand, Apple is the author of the NVIDIA driver on Mac.
(Assignee)

Updated

8 years ago
Summary: crash [@WebGLContext::ValidateProgram] → crash [@WebGLContext::ValidateProgram] in NVIDIA driver on Mac
Comment on attachment 484032 [details] [diff] [review]
work around the crash

Do we know that this is NVIDIA-only, or is it in the common OSX driver layer?  Note that NVIDIA does not write the OSX drivers, Apple does. r+'ing this, but this might need to just be #ifdef XP_MACOSX without the vendor check.
Attachment #484032 - Flags: review?(vladimir) → review+
In the stack traces attached to this bug, we are in GeforceGLDriver, called from GLEngine.

I don't know whether that's conclusive, and I can't find this crash on crash-stats.

I can find some Mac GL crashes on crash-stats, and most of them are NVIDIA, but that might just reflect what Mac computers have. Their stacks are probably meaningless as was the first stack here before we started calling glFinish().

Christoph: if you can still reproduce, can you please go to about:crashes and give us a link to your crash?
Benoit: Are these the types of crashes you are referring to in comment 33? http://tinyurl.com/36ac89l I just noticed them in crash-stats today.
No, they're a different crash: they are crashing on GL initialization, while the present bug is about a crash that occurs while running GL commands long after initialization.

These 4 crash reports are in all likelihood the same user crashing 4 times. The reason it's happening now could easily be that until bug 604395 landed, some users (no data on this, actually) got no GL at all, and now they do.

Right now the best thing to do about these 4 crashes would be to report to Apple; unless you get a hold of the person for whom it's crashing (then send him/her to me!)
http://hg.mozilla.org/mozilla-central/rev/0518ede13821
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Group: core-security
status1.9.1: --- → unaffected
status1.9.2: --- → unaffected
Christoph, out of curiosity, what Mac OS X version was this on? 10.5 or 10.6 or both?
(Reporter)

Comment 38

8 years ago
10.5
Can you tell me your precise 10.5.x version?
(Reporter)

Comment 40

8 years ago
Apple released an upgrade to 10.6 a few weeks back.

ProductName:	Mac OS X
ProductVersion:	10.6.6
BuildVersion:	10J56
(Reporter)

Comment 41

8 years ago
Sorry. I tested this on 10.6.4 as far as I remember.
Forwarded to Apple: bug 9129482
(Reporter)

Updated

8 years ago
Blocks: 658170
Crash Signature: [@WebGLContext::ValidateProgram]
Created attachment 557327 [details] [diff] [review]
Patch v1.0, copyTexImage2D.html fix

The earlier patch for this bug introduced a new bug: on Macs with non-NVIDIA GPUs, ValidateProgram() was disabled, but getProgramParameter() wasn't changed accordingly. Validation was therefore never performed, and the status was always non-success even if the program was valid. This patch fixes that.
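
For reference, the invariant behind this fix can be sketched as follows: the validate path and the status query must consult the same skip condition. This is not the attached patch. FakeProgram and FakeContext are hypothetical stand-ins, and reporting success when validation is skipped is an assumption about the intended behavior.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Mozilla classes.
struct FakeProgram {
    bool driverValidateStatus = false;  // what the driver would report
};

class FakeContext {
public:
    explicit FakeContext(bool skipValidation)
        : mSkipValidation(skipValidation) {}

    // `driverResult` models the outcome glValidateProgram would record.
    void ValidateProgram(FakeProgram& prog, bool driverResult) const {
        if (mSkipValidation) {
            return;  // workaround active: never call into the driver
        }
        prog.driverValidateStatus = driverResult;
    }

    // Models getProgramParameter(prog, VALIDATE_STATUS).
    bool GetValidateStatus(const FakeProgram& prog) const {
        if (mSkipValidation) {
            return true;  // same condition as above, so the answer is consistent
        }
        return prog.driverValidateStatus;
    }

private:
    bool mSkipValidation;  // single source of truth for both code paths
};
```

Keeping one flag for both paths avoids the mismatch described above, where one function skipped validation on a configuration the other did not special-case.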
Attachment #557327 - Flags: review?(bjacob)
(Assignee)

Updated

7 years ago
Attachment #557327 - Flags: review?(bjacob) → review+

Comment 45

7 years ago
http://hg.mozilla.org/mozilla-central/rev/651f1df3e9d7
Target Milestone: --- → mozilla9

Comment 46

2 years ago
The workaround that was added in this issue is about 5 years old now. Marked down bug 1284425 to discuss whether the workaround is still relevant on recent OS X versions.