Closed Bug 604395 Opened 10 years ago Closed 10 years ago

OpenGL contexts use uninitialized format, often resulting in failure to create

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta7+

People

(Reporter: kdevel, Assigned: jseward)

References

()

Details

Attachments

(5 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20100101 Firefox/4.0b8pre
Build Identifier: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20100101 Firefox/4.0b8pre

WebGL does no longer work.

Reproducible: Always

Steps to Reproduce:
1. Open URL
Actual Results:  
"Could not initialise WebGL, sorry :-(" appears

Expected Results:  
Run the WebGL code

Die erste fehlerhafte Revision ist:
Änderung:        55267:085538a08eb2
Nutzer:          Vladimir Vukicevic <vladimir@pobox.com>
Datum:           Mon Sep 13 08:53:52 2010 -0700
Zusammenfassung: b=582053; integrate webgl confromance suite as mochitest, angle followup; r=bjacob
"Die erste fehlerhafte Revision ist" means "The first bad Revision is" 
That change looks like a test only change and no change in the code itself.
Component: General → Canvas: WebGL
Product: Firefox → Core
QA Contact: general → canvas.webgl
It is probably a dupe of bug 604392.
Can you give you graphic section of the "about:support" page ?
This changeset does touch WebGLContext.cpp code trying to create a WebGL context so it could be the culprit: CC'ing Vlad.
Can you see anything in the Error Console ?
(You may have to enable it by setting devtools.errorconsole.enabled = true)
Ah, also: this  bug is on linux, so there is no GfxInfo at play here, no blacklisting issue.

Can you go to about:config, filter webgl, and make sure that your only non-default option is:

   webgl.enabled_for_all_sites = true

(since we still dont enable webgl by default on linux 64 because of ANGLE issues)
So it is a dupe of
Depends on: 578877
Summary: Could not initialise WebGL, sorry :-( → Could not initialise WebGL, sorry :-( on Linux-x86_64
... bug 578877 ?
Status: UNCONFIRMED → NEW
Ever confirmed: true
@2: GPU Accelerated Windows: 0/1
@4: console output is "WebGL: Can't get a usable WebGL context"
    error console output is the same plus 
"Error: gl is undefined
Source File: http://learningwebgl.com/lessons/lesson03/index.html
Line: 71"
@5: this option was already set to true.
(In reply to comment #7)
> ... bug 578877 ?

No no, WebGL is working on linux x86-64 provided you have set webgl.enabled_for_all_sites to true. That bug is only blocking enabling it by default.
(In reply to comment #0)
> Die erste fehlerhafte Revision ist:
> Änderung:        55267:085538a08eb2

@ Stefan: can you tell me exactly what you mean by this: do you mean that you actually bisected it and that you know that this changeset is the first non-working revision?
@Benoit: Yes. It is the output of hg bisect unfortunately with LANG=de.
Great, thanks a lot for bisecting this. I'll have a look at this changeset ASAP!
Stefan: can you set these prefs:
   webgl.verbose = true   (warning, this makes webgl slower)
   devtools.errorconsole.enabled = true

and tell me if you see some message in the error console? (Ctrl+Shift+J)

If you have made a debug build and can use a debugger, a very useful thing you can do is set a breakpoint at the beginning of WebGLContext::SetDimensions()  (here it's WebGLContext.cpp:272) and step through this function, tell me what path it's taking. I understand if you don't want to do this, I mention this just in case.
Warning: WebGL: WebGL: Can't get a usable WebGL context
Source File: http://learningwebgl.com/lessons/lesson03/index.html
Line: 43
Error: gl is undefined
Source File: http://learningwebgl.com/lessons/lesson03/index.html
Line: 71
My build seems not to be a debug build. I'll check this.
Thanks a lot! I could never have guessed this without this information.

The last line you hit in WebGLContext::SetDimensions is WebGLContext.cpp:306:

    nsCOMPtr<nsIPrefBranch> prefService = do_GetService(NS_PREFSERVICE_CONTRACTID);
    NS_ENSURE_TRUE(prefService != nsnull, NS_ERROR_FAILURE);

This do_GetService(NS_PREFSERVICE_CONTRACTID) is failing.

Can you retry on a fresh new profile? Do this to open the profile manager:

    firefox -P

Vlad, do you know an explanation why do_GetService(NS_PREFSERVICE_CONTRACTID) fails ?? And why it only started failing with your commit ?? Notice that your commit doesn't change these lines or lines above, it only affects lines below.
trace2.txt is with fresh profile. firefox -P does open the profile manager.
Reverting 55267:085538a08eb2 re-enables WebGL up to revision 55384:d3f448b20f20. More recent revisions (i.e. the current 56394:ec6b12294371) require reversion of the changesets with these revision numbers. The reversion of 55384:d3f448b20f20 does not apply cleanly, some additional changes are necessary: 03-additional.dif.
Attached file cleanup
Er, are you sure you weren't getting an OSMesa context previously?  The one thing that patch changes is that it disables ever trying OSMesa unless it's forced.  You can try setting the webgl.force_osmesa pref to true to see if that's the case.
Attachment #485509 - Attachment mime type: video/dv → text/plain
Vlad: what about the do_GetService failure?
I don't think there's a do_GetService failure.
(In reply to comment #24)
> I don't think there's a do_GetService failure.

It was my impression from this attachment, https://bugzilla.mozilla.org/attachment.cgi?id=483771 that there was a do_GetService failure: specifically:


mozilla::WebGLContext::SetDimensions (this=0x7fffd4065800, width=500, 
    height=500) at /md1/tmp/2010-10-16/content/canvas/src/WebGLContext.cpp:303
303         format.depth = 16;
(gdb) s
304         format.minDepth = 1;
(gdb) s
306         nsCOMPtr<nsIPrefBranch> prefService = do_GetService(NS_PREFSERVICE_CONTRACTID);
(gdb) s
do_GetService (aContractID=0x7ffff6d396b0 "@mozilla.org/preferences-service;1")
    at ../../../../dist/include/nsServiceManagerUtils.h:62
62          return nsGetServiceByContractID(aContractID);
(gdb) s
nsGetServiceByContractID (this=0x7fffffffa930, 
    aContractID=0x7ffff6d396b0 "@mozilla.org/preferences-service;1")
    at ../../../../dist/include/nsCOMPtr.h:397
397             : mContractID(aContractID)
(gdb) s
400             }
(gdb) s
do_GetService (aContractID=0x7ffff6d396b0 "@mozilla.org/preferences-service;1")
    at ../../../../dist/include/nsServiceManagerUtils.h:63
63      }
(gdb) s
nsCOMPtr (this=0x7fffffffa9d0, gs=
      {mContractID = 0x7ffff6d396b0 "@mozilla.org/preferences-service;1"})
    at ../../../dist/include/nsCOMPtr.h:623
623                 : NSCAP_CTOR_BASE(0)
(gdb) s
627               assign_from_gs_contractid(gs, NS_GET_TEMPLATE_IID(T));
(gdb) s
nsCOMPtr<nsIPrefBranch>::assign_from_gs_contractid (this=0x7fffffffa9d0, gs=
      {mContractID = 0x7ffff6d396b0 "@mozilla.org/preferences-service;1"}, 
    aIID=@0x7ffff6b6e390) at ../../../dist/include/nsCOMPtr.h:1252
1252        if ( NS_FAILED( gs(aIID, &newRawPtr) ) )
(gdb) s
Speicherzugriffsfehler
ff-t1@machine:~> JavaScript warning: http://learningwebgl.com/lessons/lesson03/index.html, line 43: WebGL: WebGL: Can't get a usable WebGL context
Apparently Speicherzugriffsfehler is memory access error, but that log just shows the user stepping into a gs() call wrapped inside an NS_FAILED check, which doesn't say anything about its success or not.
OK but right after this we have the WebGL-failed message printed. I had assumed that this was with a release build, hence stuff could be inlined, explaining why one would immediately see this message. I admit it's a bit weird.

Stefan: are you sure that this was with a debug build and that you didn't edit the "step through" text that you attached?
Let's wait on further diagnosis until Stefan responds to comment #22.
I think I've been seeing the same problem.  It happens because
ContextFormat::ContextFormat(const StandardContextFormat cf) doesn't
initialise all the fields all the time, and in this case returns a
struct with an uninitialised ::stencil, which causes later
initialisation of mesa (??) to fail.

The following change fixes it for me.

diff --git a/gfx/thebes/GLContext.h b/gfx/thebes/GLContext.h
--- a/gfx/thebes/GLContext.h
+++ b/gfx/thebes/GLContext.h
@@ -290,20 +290,21 @@ struct THEBES_API ContextFormat
         StrictBasicRGBA32,
         BasicRGB24,
         StrictBasicRGB24,
         BasicRGB16_565,
         StrictBasicRGB16_565
     };
 
     ContextFormat() {
-        memset(this, 0, sizeof(this));
+        memset(this, 0, sizeof(*this));
     }
 
     ContextFormat(const StandardContextFormat cf) {
+        memset(this, 0, sizeof(*this));
         switch (cf) {
         case BasicRGBA32:
             red = green = blue = alpha = 8;
             minRed = minGreen = minBlue = minAlpha = 1;
             break;
 
         case StrictBasicRGBA32:
             red = green = blue = alpha = 8;
Great, thanks Julian!!

And Vlad --- yet another bug we wouldn't have had if we had factored this zeroing code once and for all, for example in my ZeroBase struct!
Attachment #487156 - Flags: review?(vladimir)
WebGL works with https://bugzilla.mozilla.org/attachment.cgi?id=487156 
I had to set webgl.force_osmesa to "true".
(In reply to comment #32)
> I had to set webgl.force_osmesa to "true".

Are you saying that if you only change webgl.force_osmesa=false, it does not even pick up OSMesa anymore?

This would be another bug.
Here's a patch reverting to the behavior of trying OSMesa as last chance, regardless of force_osmesa.

Notice that if you don't want to do that, then force_osmesa is useless (since it would be enough to test if osmesalib is defined), its name is misleading; and the current behavior of not trying OSMesa unless force_osmesa is true, breaks users' expectations (Stefan here).
Attachment #487162 - Flags: review?(vladimir)
fixed missing {
Attachment #487162 - Attachment is obsolete: true
Attachment #487169 - Flags: review?(vladimir)
Attachment #487162 - Flags: review?(vladimir)
Fuzz 7 required!

patch -F7 -p1 -i ../try-osmesa-at-last.dif 
patching file content/canvas/src/WebGLContext.cpp
Hunk #1 succeeded at 394 with fuzz 7.
that's because I have some other patches applied.
Attachment #487156 - Flags: superreview+
Attachment #487156 - Flags: review?(vladimir)
Attachment #487156 - Flags: review+
Comment on attachment 487169 [details] [diff] [review]
add back code trying OSMesa as last resort

So the only problem I have with the OSMesa piece (and why it moved up above) is that it's really a very bad experience with anything but trivial code; I don't really want anyone to get it by accident.  But as you say, they have to set the lib path to get it, so maybe it's not so bad.

And yes, I knew there would be some gloating about ZeroBase :-)  We should just turn the memset into a macro (ZERO_THIS();) or something, though I don't think it really will come up more often than here and in GLContext function ptr stuff.  Everything else just explicitly initializes its members, and this could have as well...
Attachment #487169 - Flags: review?(vladimir) → review+
Component: Canvas: WebGL → Graphics
OS: Linux → All
QA Contact: canvas.webgl → thebes
Hardware: x86_64 → All
Summary: Could not initialise WebGL, sorry :-( on Linux-x86_64 → OpenGL contexts use uninitialized format, often resulting in failure to create
(In reply to comment #38)
> Comment on attachment 487169 [details] [diff] [review]
> add back code trying OSMesa as last resort
> 
> So the only problem I have with the OSMesa piece (and why it moved up above) is
> that it's really a very bad experience with anything but trivial code; I don't
> really want anyone to get it by accident.  But as you say, they have to set the
> lib path to get it, so maybe it's not so bad.

Yes, that's what I was thinking.

> 
> And yes, I knew there would be some gloating about ZeroBase :-)  We should just
> turn the memset into a macro (ZERO_THIS();) or something

How about a function like this:

  template<typename T> void Zero(T *ptr) { memset(ptr,0,sizeof(T)); }

so in constructors we could call:

  Zero(this);
Can someone please mark this a b7 blocker?

Since it's not just WebGL and not just linux64, but really affects all OpenGL on all platforms.
blocking2.0: --- → ?
(In reply to comment #39)
> How about a function like this:
> 
>   template<typename T> void Zero(T *ptr) { memset(ptr,0,sizeof(T)); }
> 
> so in constructors we could call:
> 
>   Zero(this);

Is there some reason you want to have special machinery for zeroing
out objects?  This seems like a bad idea from a performance point of
view, and also from an error checking point of view: it makes it
impossible to use Valgrind/Purify etc to find places where
constructors don't initialise all fields.  It isn't in keeping with
most of the rest of the Mozilla codebase, afaics, which just
initialises fields ad-hoc and appears to work OK.

Also .. doesn't Zero(this) kill the vtable pointer, in general?  (Bas
mentioned something like that on irc)

And if (for whatever reason) the compiler identifies T as a superclass
of the class that 'this' really is at run time, then Zero will only
zero out the first sizeof(T) bytes of the object, instead of the real
size.
OK, this class is non-virtual, but the rest of your points stand, so let's initialized the individual fields.
Ah, well, there is at least the GLContextSymbols struct (in GLContextSymbols.h) that is too big to manually initialize all the members, so memset really makes sense there.

So I don't know.
We have NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW already.

Personally I'm sympathetic to comment #42, though.
Zeroing operator new won't help for stack allocation, no?

It's a struct that we need to zero out; we can either do it as part of the constructor, or we can require every time someone creates one on the stack that they do a memset (or ZeroMemory, like the win32 APIs require).  That seems like a silly requirement to put on callers.  We can certainly initialize very member in this particualar struct, but there are others where we can't (like the symbols struct); but maybe we just treat that as a special case.
blocking2.0: ? → -
(In reply to comment #46)
> Zeroing operator new won't help for stack allocation, no?

Right, I'm just pointing out that we already do something like this in places.

> It's a struct that we need to zero out; we can either do it as part of the
> constructor, or we can require every time someone creates one on the stack that
> they do a memset (or ZeroMemory, like the win32 APIs require).  That seems like
> a silly requirement to put on callers.  We can certainly initialize very member
> in this particualar struct, but there are others where we can't (like the
> symbols struct); but maybe we just treat that as a special case.

It makes sense to decide on a case-by-case basis; I just agree with Julian in comment #42 that memset zeroing can hide bugs.
So, I agree with implementing your idea i.e. initializing individual members when possible and only keeping memset() in special cases like GLContextSymbols; just for the record, I wondered what could be the proper C++ way here, since ZeroBase<Derived> fails in the case of virtual classes, and the only way that I can think of is...

... to have each member take care of initializing itself by zero.

That is, remember how convenient nsRefPtr is in this respect (if the data members here were nsRefPtrs we wouldn't have anything to do), we can do the same for any type that is integer-constructible:

template<typename T> struct InitByZero {
  T value;
  operator T() const { return value; }
  InitByZero() : value(0) {}
};

We then just need to declare our members as InitByZero:

struct ContextFormat {
  InitByZero<int> red, green, blue;
};
Now that Julian's patch is landed, this doesn't need to block beta7 anymore.

Can someone make that block final instead?
Even better, mark as fixed since what we're still discussing is cosmetic. Will land the OSMesa patch asap.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee: nobody → jseward
Duplicate of this bug: 608386
Depends on: 745880
You need to log in before you can comment on or make changes to this bug.