Closed Bug 814716 Opened 8 years ago Closed 8 years ago

WebGL conformance test locks up GPU under Linux with NVIDIA graphics using nouveau drivers

Categories

(Core :: Graphics, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox17 --- wontfix
firefox18 + wontfix
firefox19 + wontfix
firefox20 + fixed
firefox-esr10 --- unaffected
firefox-esr17 18+ fixed
b2g18 --- unaffected

People

(Reporter: wgianopoulos, Assigned: bjacob)

References

()

Details

(Keywords: sec-moderate, Whiteboard: [adv-main20-][adv-esr17-][qa-])

Attachments

(2 files, 4 obsolete files)

The textures/texture-size-limit.html test fo the WebGL conformance test locks up NVIDIA GPU to the point where the Xorg server can not be restarted if using the nouveau drivers.  I can not duplicate this issue with ADM/ATI graphics.





  Graphics

        Adapter Description
        nouveau -- Gallium 0.4 on NVC1

        Device ID
        Gallium 0.4 on NVC1

        Driver Version
        2.1 Mesa 8.0.4

        GPU Accelerated Windows
        0/1 Basic

        Vendor ID
        nouveau

        WebGL Renderer
        nouveau -- Gallium 0.4 on NVC1

        AzureCanvasBackend
        cairo

        AzureContentBackend
        none

        AzureFallbackCanvasBackend
        none
Summary: WebGL conformance test lock up GPU under Linux with NVIDIA graphics using nouveau drivers → WebGL conformance test locks up GPU under Linux with NVIDIA graphics using nouveau drivers
I should mention this is using 32-bit Firefox nightly on a 64-bit system running 64-bit fedora 18 with the latest kernel and xorg and mesa drivers.
(In reply to Bill Gianopoulos [:WG9s] from comment #0)
> The textures/texture-size-limit.html test fo the WebGL conformance test
> locks up NVIDIA GPU to the point where the Xorg server can not be restarted
> if using the nouveau drivers.  I can not duplicate this issue with ADM/ATI
                                                                     ^^^
                                                                     AMD
Severity: normal → critical
Version: unspecified → Trunk
I have decided this should probably be security sensitive because it discloses how a malicious website deficiency to me).
Flags: needinfo?
Flags: needinfo?
(In reply to Bill Gianopoulos [:WG9s] from comment #3)
> I have decided this should probably be security sensitive because it
> discloses how a malicious website deficiency to me).

Was trying to say before auto-correct got involved.  Disclosed how a malicious website could crash the X server impacting other applications besides Firefox and get the OS in a loop trying to restart the x server such that the only obvious way to fix would be a hard boot which could potentially render the OS non-bootable.
That would qualify as a denial-of-service and following https://wiki.mozilla.org/Security_Severity_Ratings it seems like this is sec-moderate.

It just seems that this driver is too optimistic as to the max texture size that it supports, so a work-around would be to artificially limit the texture size.  Please go to webglreport.sf.net and paste the results (I want the 2 MAX_TEXTURE_SIZE values) here.
waht I see for textures is this:

Max Texture Size: 	16384
Max Cube Map Texture Size: 	16384
Max Combined Texture Image Units: 	32
Max Anisotropy: 	16
This makes me think either it is too optimistic, or something that should be defined as a uint16 is defined as an int16,
Or possibly vise-versa.
Does this help?
You can use these try builds to test:
  https://tbpl.mozilla.org/?tree=Try&rev=fede02ff2e83
Attachment #684781 - Flags: review?(jgilbert)
This is a patch that we want anyways, regardless of this bug. The issue is that when we tweak max sizes, we don't adjust accordingly the MAX_ values we return in GetParameter.
Attachment #684782 - Flags: review?(jgilbert)
It is better with these patches.  The nouveau driver still hangs, but instead of the Xorg process crashing in a loop, it is still running and I can get the system back running by killing the Xorg process.

I will try to find time later today to try different values for the Texture size maximum to see if that helps.  If I figure out good values I will report back here.
Thanks! To do this faster, rather than editing and recompiling Gecko everytime, you'll be better off editing the testcase. It must be getting the max texture size with a gl.getParameter(gl.MAX_TEXTURE_SIZE) somewhere. Replace that by constants like 2048, etc.
NO I already fond the correct patch I am just now doing a new build form a clean clone fromthe tip to make sure it works.
(In reply to Benoit Jacob [:bjacob] from comment #9)
> Created attachment 684781 [details] [diff] [review]
> limit texture size on nouveau to 4k (instead of 16k claimed by driver)
> 
> Does this help?
> You can use these try builds to test:
>   https://tbpl.mozilla.org/?tree=Try&rev=fede02ff2e83

Well seems that With my new patch we complete the conformance suite without crashing.
This patch is both necessary and sufficient to fix the issue in this bug.  It also fixes and issue with a comment semi-related to this issues that has a typo in the bug number.
Attachment #684864 - Flags: review?(jgilbert)
Attachment #684781 - Flags: review?(jgilbert)
Oh I and i do not understand shy I am no longer able to mark previous patches as being obsolete.  What is that about????
The checkbox for that is under 'edit details'
Attachment #684864 - Attachment is patch: true
Comment on attachment 684864 [details] [diff] [review]
limit texture size on nouveau to 2k (instead of 16k claimed by driver) for MaxCubeMapTextureSize

Review of attachment 684864 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.cpp
@@ +523,5 @@
> +#ifdef MOZ_X11
> +        if (mWorkAroundDriverBugs &&
> +            mVendor == VendorNouveau) {
> +            // see bug 814716. Clamp MaxCubeMapTextureSize at 2K for Nouveau.
> +            mMaxCubeMapTextureSize = NS_MIN(mMaxCubeMapTextureSize, 2048);

Oh, so it is enough to limit the size of cube map textures? Good.
Yes but it seems that because I am not part of the security group, I am not able to mark  a previous patch as being obsolete.   I guess that actually makes sense.
Attachment #684781 - Attachment is obsolete: true
Oh I also notice this bug is not assigned to anyone.  Should it be assigned to you or to me.  Seems assigned to no one is not really productive toward getting it resolved.
It doesn't matter, since the patches are already written. We just need to get this reviewed and landed. Let's assign to you since we'll have to choose one for landing.
Assignee: nobody → bill
Well might be better to assign to you since I have no write access to the tree.
That said, i also have can reproduce the issue so if review comments end up making the fix no longer resolving the issue I guess I am in the best position to identify that type of issue.  So I guess it makes most sense for this to be assigned to me.
found one more comment with typo in the bug number.
Attachment #684864 - Attachment is obsolete: true
Attachment #684864 - Flags: review?(jgilbert)
Attachment #684884 - Flags: review?(jgilbert)
Attachment #684884 - Attachment is patch: true
Oh I should mention that my daily builds at http://www.wg9s.com/mozilla/firefox/ include these patches.
Comment on attachment 684782 [details] [diff] [review]
correct the way that we tweak max sizes

Review of attachment 684782 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContextGL.cpp
@@ +1990,5 @@
>                  return JS::NullValue();
>              }
>  
> +        case LOCAL_GL_MAX_TEXTURE_SIZE:
> +            return JS::Int32Value(mGLMaxTextureSize);

Oops, looks like the min-capability-mode patch missed these. Good catch.

@@ -3373,5 @@
>      if (width < 0 || height < 0)
>          return ErrorInvalidValue("renderbufferStorage: width and height must be >= 0");
>  
> -    if (!mBoundRenderbuffer || !mBoundRenderbuffer->GLName())
> -        return ErrorInvalidOperation("renderbufferStorage called on renderbuffer 0");

Why was this removed?

::: content/canvas/src/WebGLContextValidate.cpp
@@ +910,5 @@
>          mGLMaxVertexTextureImageUnits = MINVALUE_GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS;
>      } else {
> +        mGLMaxTextureSize = gl->GetMaxTextureSize();
> +        mGLMaxCubeMapTextureSize = gl->GetMaxCubeMapTextureSize();
> +        mGLMaxRenderbufferSize = gl->GetMaxRenderbufferSize();

This is not necessary, since this is what fGetIntegerv should be doing anyways.

Also, add a line for mGLMaxRenderbufferSize to the MinCapabilityMode branch above.

::: gfx/gl/GLContext.h
@@ +1690,5 @@
>                                   const char *extension);
>  
> +    GLint GetMaxTextureSize() const { return mMaxTextureSize; }
> +    GLint GetMaxCubeMapTextureSize() const { return mMaxCubeMapTextureSize; }
> +    GLint GetMaxRenderbufferSize() const { return mMaxRenderbufferSize; }

These aren't really necessary, and I don't know as we use them enough to warrant having functions for them. The 'proper' way to ask these questions through GetInteger should suffice.

If you have a good reason to add these, they should assert that the values they get back are the same as the values one would get through GetInteger.
Attachment #684782 - Flags: review?(jgilbert) → review-
Comment on attachment 684884 [details] [diff] [review]
limit texture size on nouveau to 2k (instead of 16k claimed by driver) for MaxCubeMapTextureSize

Review of attachment 684884 [details] [diff] [review]:
-----------------------------------------------------------------

Close, but the security fix should be more minimal.

::: gfx/gl/GLContext.cpp
@@ +512,5 @@
>  
>  #ifdef XP_MACOSX
>          if (mWorkAroundDriverBugs &&
>              mVendor == VendorIntel) {
> +            // see bug 737182 for 2D textures, bug 684882 for cube map textures.

Thanks for fixing this.

@@ +517,4 @@
>              mMaxTextureSize        = NS_MIN(mMaxTextureSize,        4096);
>              mMaxCubeMapTextureSize = NS_MIN(mMaxCubeMapTextureSize, 512);
>              // for good measure, we align renderbuffers on what we do for 2D textures
>              mMaxRenderbufferSize   = NS_MIN(mMaxRenderbufferSize,   4096);

[1]

@@ +523,5 @@
> +#ifdef MOZ_X11
> +        if (mWorkAroundDriverBugs &&
> +            mVendor == VendorNouveau) {
> +            // see bug 814716. Clamp MaxCubeMapTextureSize at 2K for Nouveau.
> +            mMaxCubeMapTextureSize = NS_MIN(mMaxCubeMapTextureSize, 2048);

[2]

::: gfx/gl/GLContext.h
@@ +1948,5 @@
>      bool mWorkAroundDriverBugs;
>  
>      bool IsTextureSizeSafeToPassToDriver(GLenum target, GLsizei width, GLsizei height) const {
> +#if defined XP_MACOSX || defined MOZ_X11
> +        if (mWorkAroundDriverBugs) {

This expands the platforms we work around on from {mac}x{intel} to effectively {mac,linux}x{intel,nvidia,amd}, when it seems we should only be adding the case for linux+nouveou.

I can see why it's tempting not to check the platforms here as well, we should either only run this code on affected platforms, or run this code everywhere (in the spirit of "don't trust the driver"). The latter is not a bad idea, but it should be done as a separate bug. That way this fix limits changes to functionality, which would make this riskier to uplift into Aurora and Beta.

In the spirit of what keeping how we do things currently:
I recommend adding a |bool mNeedsTextureSizeChecks|, and setting in to true at [1] and [2], and checking that instead of doing the check for affected platforms in two places.
Attachment #684884 - Flags: review?(jgilbert) → review-
This addresses the previous review comments.
Attachment #684884 - Attachment is obsolete: true
Attachment #685444 - Flags: review?(jgilbert)
Reassigning this back to bjacob as I think most of the remaining work is likely to be on the other patch.
Assignee: bill → bjacob
Attachment #685444 - Flags: review?(jgilbert) → review+
Comment on attachment 685444 [details] [diff] [review]
Limit MaxCubeMapTextureSize for Nouveau to 2K

[Security approval request comment]
How easily can the security issue be deduced from the patch?
Yes, easily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
The code itself is enough.

Which older supported branches are affected by this flaw?
Firefox >= 7 (before we didn't allow running on this driver).

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Should be trivial enough to backport.

How likely is this patch to cause regressions; how much testing does it need?
Relatively simple patch; any regression caused by it would likely be something of the form "an existing driver work-around is no longer taking effect". Specifically, the existing work-around there is the Mac Intel bug work arounds, bug 684882 and bug 737182. So you could always re-test them to check we didn't regress them.
Attachment #685444 - Flags: sec-approval?
(In reply to Jeff Gilbert [:jgilbert] from comment #26)
> @@ -3373,5 @@
> >      if (width < 0 || height < 0)
> >          return ErrorInvalidValue("renderbufferStorage: width and height must be >= 0");
> >  
> > -    if (!mBoundRenderbuffer || !mBoundRenderbuffer->GLName())
> > -        return ErrorInvalidOperation("renderbufferStorage called on renderbuffer 0");
> 
> Why was this removed?

Because we were doing this check twice --- look 10 lines above.
Attachment #684782 - Attachment is obsolete: true
Attachment #685921 - Flags: review?(jgilbert)
Comment on attachment 685921 [details] [diff] [review]
correct the way that we tweak max sizes

Review of attachment 685921 [details] [diff] [review]:
-----------------------------------------------------------------

Nits.

::: content/canvas/src/WebGLContextValidate.cpp
@@ +910,4 @@
>          mGLMaxTextureImageUnits = MINVALUE_GL_MAX_TEXTURE_IMAGE_UNITS;
>          mGLMaxVertexTextureImageUnits = MINVALUE_GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS;
>      } else {
>          gl->fGetIntegerv(LOCAL_GL_MAX_TEXTURE_SIZE, &mGLMaxTextureSize);

If these vars aren't initialized previously, they should be.

::: gfx/gl/GLContext.cpp
@@ +946,5 @@
>  {
> +    if (!(aFlags & TextureImage::ForceSingleTile) && mGL->WantsSmallTiles()) {
> +      mTileSize = 256;
> +    } else {
> +      mGL->fGetIntegerv(LOCAL_GL_MAX_TEXTURE_SIZE, (GLint*) &mTileSize);

Initialize mTileSize before, since GetInteger is technically fallible.

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +161,5 @@
>    mBounds.SetRect(0, 0, aData.mSize.width, aData.mSize.height);
>        
>    // Check the maximum texture size supported by GL. glTexImage2D supports
>    // images of up to 2 + GL_MAX_TEXTURE_SIZE
> +  GLint texSize;

Initialize me.
Attachment #685921 - Flags: review?(jgilbert) → review+
Comment on attachment 685444 [details] [diff] [review]
Limit MaxCubeMapTextureSize for Nouveau to 2K

sec-approval+ but please don't check in until December 15 or later. 

We should get patches for supported and affected branches and nominate for landing.
Attachment #685444 - Flags: sec-approval? → sec-approval+
Comment on attachment 685444 [details] [diff] [review]
Limit MaxCubeMapTextureSize for Nouveau to 2K

If this is not a sec:{high,crit} bug, please state case for ESR consideration: relatively simple fix, avoids a real driver issue. I wouldn't propose this for a late ESR (like ESR10 is now) but I think it makes sense for a brand new ESR like ESR17 so we won't drag this around for another year.
User impact if declined: severe crash, triggerable by content, affecting certain linux users
Fix Landed on Version: will land on m-c around Dec 15
Risk to taking this patch (and alternatives if risky): see above sec-approval comment. Low risk.
String or UUID changes made by this patch: none
Attachment #685444 - Flags: approval-mozilla-esr17?
Attachment #685444 - Flags: approval-mozilla-beta?
Attachment #685444 - Flags: approval-mozilla-aurora?
(In reply to Al Billings [:abillings] from comment #33)
> Comment on attachment 685444 [details] [diff] [review]
> Limit MaxCubeMapTextureSize for Nouveau to 2K
> 
> sec-approval+ but please don't check in until December 15 or later. 

So, are you begging for me to disclose this issue?
Whiteboard: [embargoed until 12/15]
(In reply to Bill Gianopoulos [:WG9s] from comment #35)
> So, are you begging for me to disclose this issue?

Not sure what you mean. This is a new policy --- I hadn't encountered it before --- and I believe that its point is that as soon as we land this on mozilla-central, we are effectively disclosing this vulnerability, so we should limit the time window between that and when the fix gets deployed on the release channel.
(In reply to Bill Gianopoulos [:WG9s] from comment #35)
> (In reply to Al Billings [:abillings] from comment #33)
> > Comment on attachment 685444 [details] [diff] [review]
> > Limit MaxCubeMapTextureSize for Nouveau to 2K
> > 
> > sec-approval+ but please don't check in until December 15 or later. 
> 
> So, are you begging for me to disclose this issue?

When this code lands, it will self-disclose the issue, as said in comment 30. So, we don't land such code until later in the current development cycle in order to shorten the window of disclosure.
After reading the better explanation of the wait till Dec 15th in bug 814407 I understood the issue.  Was going to comment here to that affect, but had issues with my laptop battery going dead because for some odd reason it was not charging even though I had the laptop plugged in.  Anyway I now understand the issue.  We don;t want to land this until we have patches for all the affected branches.
Actually, regardless of whether we have branch patches or not, we don't want to land this on mozilla-central until 12/15 and it can't land elsewhere until it lands there, per policy. We don't want to expose easy to find issues if we don't need to do so early (due to stability risks, regression, etc).
(In reply to Al Billings [:abillings] from comment #39)
> Actually, regardless of whether we have branch patches or not, we don't want
> to land this on mozilla-central until 12/15 and it can't land elsewhere
> until it lands there, per policy. We don't want to expose easy to find
> issues if we don't need to do so early (due to stability risks, regression,
> etc).

OK. I am therefore beginning tomorrow, no longer including these patches in my builds that I make available for download.  I cannot continue to provide builds that include patches that I don't provide and still be consistent with this being open source.  If this had been for a shorter term, then that would not have been as much of an issue.
The bug was reported to us on 11/23 (five days ago) and will be fixed on 12/15 or so and released on 1/8. I'm not sure how much shorter term you expect it to be when it is a six week development cycle. It isn't going to go live until 1/8 no matter what. This is purely a decision of when we check it in during our development cycle.

I don't know you so I really don't know what your builds are or why that should matter to me. Not zero daying our users is more important than almost all other considerations. I'm not putting more than 100 million people at risk because a six week cycle isn't fast enough.
Tracking on all branches, esr 18+ given comment 41 and the target for landing.
Whiteboard: [embargoed until 12/15]
(In reply to Al Billings [:abillings] from comment #41)
> The bug was reported to us on 11/23 (five days ago) and will be fixed on
> 12/15 or so and released on 1/8. I'm not sure how much shorter term you
> expect it to be when it is a six week development cycle. It isn't going to
> go live until 1/8 no matter what. This is purely a decision of when we check
> it in during our development cycle.

Correction here - we'll be landing on 12/10 to make it into our fourth beta (to prevent the risk of regression without the opportunity to fix).
(In reply to Alex Keybl [:akeybl] from comment #44)
 
> Correction here - we'll be landing on 12/10 to make it into our fourth beta
> (to prevent the risk of regression without the opportunity to fix).

That's acceptable. :-) This came up yesterday in our triage with you.
Attachment #685444 - Flags: approval-mozilla-esr17?
Attachment #685444 - Flags: approval-mozilla-esr17+
Attachment #685444 - Flags: approval-mozilla-beta?
Attachment #685444 - Flags: approval-mozilla-beta+
Attachment #685444 - Flags: approval-mozilla-aurora?
Attachment #685444 - Flags: approval-mozilla-aurora+
ESR10 is unaffected because Mesa is blacklisted there.
Whiteboard: [adv-main18-][adv-esr17-]
Keywords: verifyme
Hi Bill, if you have drivers/configs that are still affected by this, it would be great if you could run the WebGL conformance tests again on them to make sure there are no crashes.

It would be ideal if you could check beta, Aurora and trunk, if you have time. 

Thank you for reporting this issue and for providing helpful information.
Whiteboard: [adv-main18-][adv-esr17-] → [adv-main18-][adv-esr17-][qa-]
I have verified that this issue is resolved in current trunk and aurora builds.  The actual builds I used to perform the test are:

Mozilla/5.0 (X11; Linux i686 on x86_64; rv:21.0) Gecko/20130112 Firefox/21.0 ID:20130112030947
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:20.0) Gecko/20130112 Firefox/20.0 ID:20130112042018

Unfortunately the beta channel appears to still be affected.   The build I used for that test is:

Mozilla/5.0 (X11; Linux i686 on x86_64; rv:19.0) Gecko/20100101 Firefox/19.0 ID:20130109111322
It appears that the  "correct the way that we tweak max sizes" patch is missing from the beta branch.
Oh dear, it seems to be missing on the release branch as well.
Form comment #46 and comment #49, both patches landed on inbound, which migrated to mozilla-central and got uplifted to Aurora where everything works.  Yet looking at comment #47, only one out for the 2 patches ever landed on Aurora and beta, which have now been uplifted to beta and release.  Oops!
OIC.  only the one patch was actually nominated for inclusion in aurora and beta.  The problem is that that patch is virtually ineffective without the fix in the second patch which did not land.
Oh. That's unfortunate. I didn't realize that and didn't read comment 52 until now. We should backport it to the beta channel, then.
Comment on attachment 685921 [details] [diff] [review]
correct the way that we tweak max sizes

[Approval Request Comment]

See comment 34; we need to take this patch as well on beta otherwise the fix is ineffective, according to comment 52.

The patch has been on central for a long time and is on aurora now; no issue has been reported about it, and the patch is relatively straightforward, so I think I can say it's low risk.
Attachment #685921 - Flags: approval-mozilla-beta?
(In reply to Benoit Jacob [:bjacob] from comment #57)
> Oh. That's unfortunate. I didn't realize that and didn't read comment 52
> until now. We should backport it to the beta channel, then.

I should have tested this when it landed on aurora.
Attachment #685921 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Bill, would you mind checking this in the latest beta and reporting back? Thank you.
Hmm.  I guess I was too quick in jumping tot he conclusion that the issue here was the absence of the other code.

It seems this still fails under the current beta and webglreport.sf.net still reports:

Max Texture Size: 	16384
Max Cube Map Texture Size: 	16384
Max Combined Texture Image Units: 	32
Max Anisotropy: 	16
Bill can you please confirm that the Beta you used to test was built after January 25? You can find the information in the about:support page.
Buildid is 20130130080006.
Thanks, this should definitely have the fix. Benoit, should Bill reopen or file a follow-up bug for comment 62?
But this does all work on Aurora, and mozilla-central.
Since it works on m-c, don't reopen.

My take: since this is already fixed on aurora (comment 66) and this is only a sec-moderate, we could just move on and wait for firefox 20 to be released.
Kind of what I thought.
(In reply to Benoit Jacob [:bjacob] from comment #67)
> Since it works on m-c, don't reopen.
> 
> My take: since this is already fixed on aurora (comment 66) and this is only
> a sec-moderate, we could just move on and wait for firefox 20 to be released.

Fair enough. Should the status flags be updated to reflect this state?
I imagine ESR-17 isn't really fixed either... could someone test that?
We just blacklisted Mesa altogether on esr17: bug 838413
Whiteboard: [adv-main18-][adv-esr17-][qa-] → [adv-main20-][adv-esr17-][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.