Closed Bug 683710 Opened 13 years ago Closed 13 years ago

Should probably strip comments from shader sources before validating bad character and passing to the GL

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: bjacob, Assigned: drs)

References

()

Details

Attachments

(3 files, 5 obsolete files)

This demo:
  http://demos.vicomtech.org/volren/

doesn't work anymore, with this warning:

Warning: WebGL: shaderSource: string contains the illegal character '34'
Source File: http://demos.vicomtech.org/volren/volumerc.js
Line: 72

The reason is that it has a comment containing the " character (ASCII 34) which is one of the illegal characters we're now rejecting in shader sources.

See bug 680722.

We need to do like WebKit which, in my understanding, strips comments out of shader sources before validating against bad characters and before passing the string to the GL.

See the StripComments class here:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/canvas/WebGLRenderingContext.cpp?rev=93625#L139

We probably should borrow that code and give appropriate credit, rather than try to reinvent the wheel.
Copying the WebKit code seemed like a good idea to me on first glance, but after trying it, it's a total mess of dependencies to objects that we don't have good replacements for. For now, I think it's a better idea to rewrite this. I'm going to get started on it, although I suspect that I'll be pulled to do something else before I make significant progress on it.
I had a look at their code an clearly a lot of thought has gone into it: preservation of newlines so that line numbers remain meaningful, preservation of certain preprocessor directives.

The only thing that will need to be adapted, as far as I can see, is the strings API. They have a neat string API and we need to port that to our sucky nsString.

What other dependencies did you see?
Ok, I figured porting over their strings was off limits because of how relatively massive that collection of code would be. There are a ton of other dependencies, most of which are easy to resolve (like UChar, stuff like that). I'm also not aware of what the "WTF" set of headers are exactly, other than a funny name.
Of course we don't want to port their strings classes; instead we want to port this parser to use our own string classes, **** as they are.
This code is copied mostly from WebKit. It strips out comments from shader source code before actually compiling it, so that the comments can have illegal characters. It was benchmarked and it was noted that a test attached to the bug ticket took about twice as long with these changes.
Assignee: nobody → dsherk
Attachment #557726 - Flags: review?(bjacob)
Takes ~8.5s according to Instruments without comment stripping, ~16.5s with.
This is a version of the previous patch with two trivial optimizations which cut down the runtime of comment-stripping-speed-test.html from ~16.5s to ~15.5s according to my benchmarks. I don't actually believe these are worth applying because they make the code more confusing with little performance improvement.
Comment on attachment 557726 [details] [diff] [review]
Patch v1.0, shader source comment validation fix.

Review of attachment 557726 [details] [diff] [review]:
-----------------------------------------------------------------

Good work; some comments:

::: content/canvas/src/WebGLContextGL.cpp
@@ +4133,5 @@
>      WebGLuint shadername;
>      if (!GetConcreteObjectAndGLName("shaderSource: shader", sobj, &shader, &shadername))
>          return NS_OK;
>  
> +    nsString cleanSource = StripComments(nsString(source)).result();

Why don't you declare cleanSource as a const nsString& so that you avoid a useless deep copy.

::: content/canvas/src/WebGLValidateStrings.h
@@ +35,2 @@
>  // which can be found here:
>  // http://trac.webkit.org/browser/trunk/Source/WebCore/html/canvas/WebGLRenderingContext.cpp#L123

When pointing to a precise line number, it's best to link to a precise revision number otherwise the line number end up dangling.

@@ +35,3 @@
>  // which can be found here:
>  // http://trac.webkit.org/browser/trunk/Source/WebCore/html/canvas/WebGLRenderingContext.cpp#L123
> +// Note that some modifications were done to adapt it to the Mozilla Core API.

I'd just say "adapt it to Mozilla" or some such, as "Core API" is not a very common phrase around here. You could also say "strings API", I don't know.

@@ +66,5 @@
> +        {
> +            parse();
> +        }
> +
> +        nsString result()

return a const nsString& here, so you don't make a deep-copy of the string ;-)

I don't know how WebKit's String class works, maybe it has copy-on-write as after all WebKit comes from the KDE world which is addicted to copy-on-write but even so it's still a better idea to return a const reference to avoid doing any unnecessary work.

@@ +146,5 @@
> +
> +        ParseState m_parseState;
> +        nsString m_sourceString;
> +        unsigned m_length;
> +        unsigned m_position;

Here, replace unsigned [int] by size_t.

You really want to address arrays with offsets that have the same size as pointers. The integer type that's guaranteed to have pointer size is size_t (for unsigned) or ptrdiff_t (for signed). What's really happening is that on x86-64, arrays can only be addressed using 64bit offsets, so if you provide a 32bit value, you force the compiler to emit an additional instruction to clear the remaining high 32bits in the register used as offset.

See this discussion:
http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/94a564192d4b4f71
Attachment #557726 - Flags: review?(bjacob) → review-
(In reply to Benoit Jacob [:bjacob] from comment #8)
> > +    nsString cleanSource = StripComments(nsString(source)).result();
> 
> Why don't you declare cleanSource as a const nsString& so that you avoid a
> useless deep copy.

(possibly better: use "const nsAString&")

> > +        nsString result()
> 
> return a const nsString& here, so you don't make a deep-copy of the string
> ;-)

(that applies here, too)
Reference: top entry on https://developer.mozilla.org/en/String_Quick_Reference
Version: unspecified → Trunk
@dholbert I tried that and got some strange errors, I don't think nsAString is designed to operate in this way.
(In reply to Doug Sherk from comment #11)
> @dholbert I tried that and got some strange errors, I don't think nsAString
> is designed to operate in this way.

The crashes we were getting were because the StripComments temporary object was dying too soon; perhaps now that this problem is fixed any other problem you had around that code is fixed too.


@ Daniel:

> (possibly better: use "const nsAString&")

Do you know why nsAString would be better here? This is just a little piece of self-contained code and we know that the string we're dealing with here is really a nsString. Why not just use a nsString reference?
(In reply to Doug Sherk from comment #11)
> @dholbert I tried that and got some strange errors, I don't think nsAString
> is designed to operate in this way.

Perhaps that was from using "nsAString" rather than "nsAString&"? (nsAString is abstract -- you'll get compile errors if you declare an instance of it, IIRC, but references should work fine and be assignable from nsString)

Anyway, doesn't matter much, and I wouldn't waste much time on it if it's causing weird errors.

(In reply to Benoit Jacob [:bjacob] from comment #12)
> Do you know why nsAString would be better here? [...] Why not just use a nsString reference?

More flexibility (ability to swap out underlying string impl) & cross-codebase consistency, I guess?  Regardless, I agree that it's not a big deal in a self-contained piece of code -- I'd just been refreshing my memory on our string guide, and I thought it might be a useful style nit.  I defer to your judgement in this code, of course. :)
Fixes for previous items, also fixed getShaderSource() to return correct source. This patch is not deployable yet and is mostly here just for reference since more optimization is needed.
Attachment #557726 - Attachment is obsolete: true
Attachment #557742 - Attachment is obsolete: true
So I did a little bit of quick research, it was indeed incorrect to be storing the source as UTF-8 as we were doing. gl-getshadersource.html was fixed with the previous patch because of the port to UTF-16.
This patch should fix all issues we've been having with comments that have illegal characters, etc. It also fixes gl-getshadersource.html. According to my benchmarks, StripComments.parse() is now ~15% faster (it now takes ~14s to complete comment-stripping-speed-test.html), although in a realistic applications, StripComments.parse() is now called twice. This is because we must validate the source on each call to shaderSource() and compileShader(), so we're now doing comment stripping twice. This was a cost of getting getShaderSource() to work correctly and is also identical to the way WebKit works.

I question whether or not replacing nsString with nsTArray is worth it. The nsTArray implementation is significantly more complex and difficult to understand, and also breaks some encapsulation, while giving little performance benefit. Also, I previously submitted an optimized version of patch v1.0 which equated the buffer string to the input string, then set it back to empty again. This seemed to speed it up almost as much as this implementation, but I didn't debug it to see if what I was doing was even working as intended.

Note that I considered storing copies of both the commented and uncommented shader source, but I assumed this would be bad for mobile devices and it would be preferable to load slower than use more memory.
Attachment #557978 - Attachment is obsolete: true
Attachment #558014 - Flags: review?(bjacob)
I just tested a version of the comment stripping test with compilation after it and it took 20s to complete, so it's not as bad as I thought. It's a little over twice as slow than no comment stripping at all, and without the fix for gl-getshadersource.html.
(In reply to Daniel Holbert [:dholbert] from comment #13)
> I defer to your judgement in this code, of course. :)

Don't do that, I don't know much about our string classes!
(In reply to Doug Sherk from comment #15)
> So I did a little bit of quick research, it was indeed incorrect to be
> storing the source as UTF-8 as we were doing. gl-getshadersource.html was
> fixed with the previous patch because of the port to UTF-16.

Here instead of UTF-8 you meant ASCII. nsCString is ASCII, not unicode. UTF-8 is a flavor of unicode that allow many usual characters to be stored with just 8 bits.
(In reply to Benoit Jacob [:bjacob] from comment #19)
> (In reply to Doug Sherk from comment #15)
> > So I did a little bit of quick research, it was indeed incorrect to be
> > storing the source as UTF-8 as we were doing. gl-getshadersource.html was
> > fixed with the previous patch because of the port to UTF-16.
> 
> Here instead of UTF-8 you meant ASCII. nsCString is ASCII, not unicode.
> UTF-8 is a flavor of unicode that allow many usual characters to be stored
> with just 8 bits.

Yep, my bad, thanks for the correction.
(In reply to Doug Sherk from comment #16)
> in a realistic
> applications, StripComments.parse() is now called twice. This is because we
> must validate the source on each call to shaderSource() and compileShader(),
> so we're now doing comment stripping twice.

I don't understand why that is. Since shader sources can only be set by shaderSource(), if we already validate them in shaderSource(), why would be also need to validate them in compileShader() ?
(In reply to Benoit Jacob [:bjacob] from comment #21)
> (In reply to Doug Sherk from comment #16)
> > in a realistic
> > applications, StripComments.parse() is now called twice. This is because we
> > must validate the source on each call to shaderSource() and compileShader(),
> > so we're now doing comment stripping twice.
> 
> I don't understand why that is. Since shader sources can only be set by
> shaderSource(), if we already validate them in shaderSource(), why would be
> also need to validate them in compileShader() ?

To explain this in more detail, we basically had 3 choices that I could think of in terms of design:

1) When shaderSource() is called, we validate the source, then store the unvalidated source and re-validate it in compileShader().
2) When shaderSource() is called, we validate the source, then store the validated source and use it in compileShader().
3) When shaderSource() is called, we validate the source, then store both the validated and unvalidated source and use the validated source in compileShader().

I chose 1 because 2 breaks getShaderSource() so it's basically out of the question, and 3 uses too much memory. I figured most people complain about memory usage and would be more willing to live with a very slightly slower run (which in most real programs you would never notice) than the extra memory for as long as the context exists. Also, 1 is the way WebKit implements this so maybe they thought of something I didn't.
Thanks for the explanation. I agree now with the choice of 1. The memory usage penalty in 3 is not big, but neither is the performance penalty in 1.
Comment on attachment 558014 [details] [diff] [review]
Patch v1.2, shader source comment validation fix.

Review of attachment 558014 [details] [diff] [review]:
-----------------------------------------------------------------

Just need 1 last quick iteration!

::: content/canvas/src/WebGLContextGL.cpp
@@ +3971,5 @@
>                                         SH_WEBGL_SPEC,
>                                         gl->IsGLES2() ? SH_ESSL_OUTPUT : SH_GLSL_OUTPUT,
>                                         &resources);
>  
> +        StripComments stripComments(shader->Source());

Please add a quick comment explaining why we're constructing a named object here as opposed to just using a StripComments() temporary.

@@ +3972,5 @@
>                                         gl->IsGLES2() ? SH_ESSL_OUTPUT : SH_GLSL_OUTPUT,
>                                         &resources);
>  
> +        StripComments stripComments(shader->Source());
> +        const nsAString& cleanSource = nsString(&stripComments.result()[0], stripComments.length());

instead of &stripComments.result()[0] do stripComments.result().Elements()

@@ +3979,5 @@
> +
> +        const nsPromiseFlatString& flatSource = PromiseFlatString(cleanSource);
> +
> +        if (!NS_IsAscii(flatSource.get()))
> +            return ErrorInvalidValue("compileShader: non-ascii characters found in source");

Since shaderSource already checks that the source stripped from comments is already in the 7bit ascii range, there is no need for this check here.

(Yes I realize that we had this problem before already)

Just add a comment here explaining that so it's safe to call NS_LossyConvertUTF16toASCII(flatSource)

@@ +4144,5 @@
>      if (!GetConcreteObjectAndGLName("shaderSource: shader", sobj, &shader, &shadername))
>          return NS_OK;
>  
> +    StripComments stripComments(source);
> +    const nsAString& cleanSource = nsString(&stripComments.result()[0], stripComments.length());

Same comments as above.
Attachment #558014 - Flags: review?(bjacob) → review-
(In reply to Doug Sherk from comment #16)
> I question whether or not replacing nsString with nsTArray is worth it. The
> nsTArray implementation is significantly more complex and difficult to
> understand, and also breaks some encapsulation, while giving little
> performance benefit.

There, feel free to either go ahead with your nsTArray code since I find it sane, or keep the nsString approach for now if you favor not doing optimizations until they're backed by profiling. Both approaches work with me. I don't find the nsTArray approach too awkward.
(In reply to Benoit Jacob [:bjacob] from comment #25)
> (In reply to Doug Sherk from comment #16)
> > I question whether or not replacing nsString with nsTArray is worth it. The
> > nsTArray implementation is significantly more complex and difficult to
> > understand, and also breaks some encapsulation, while giving little
> > performance benefit.
> 
> There, feel free to either go ahead with your nsTArray code since I find it
> sane, or keep the nsString approach for now if you favor not doing
> optimizations until they're backed by profiling. Both approaches work with
> me. I don't find the nsTArray approach too awkward.

I did do some profiling (although not as 100% in-depth as it could have been) and found that the nsTArray approach was faster. Since you prefer the nsTArray implementation, and it's already done, I think it's better to just stick with it.
Fixes issues brought up by bjacob.
Attachment #558014 - Attachment is obsolete: true
Attachment #558617 - Flags: review?(bjacob)
Attachment #558617 - Flags: review?(bjacob) → review+
+r carried from patch 1.2 (r=bjacob)

Folded version of the same patch with changes from https://bugzilla.mozilla.org/show_bug.cgi?id=680722
Attachment #558617 - Attachment is obsolete: true
Attachment #558692 - Flags: review+
(looks like the burning is in code from another patch in the same push -- bug 681026)
http://hg.mozilla.org/integration/mozilla-inbound/rev/d04d43c81294
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
Attachment #558692 - Attachment is patch: true
Comment on attachment 558692 [details] [diff] [review]
Patch v1.3, shader source comment validation fix.

Drive-by string usage review:

>+        // We're storing an actual instance of StripComments because, if we don't, the 
>+        // cleanSource nsAString instance will be destroyed before the reference is
>+        // actually used.
>+        StripComments stripComments(shader->Source());
>+        const nsAString& cleanSource = nsString(stripComments.result().Elements(), stripComments.length());
nsString is a class, so you can write this as
nsString cleanSource(stripComments.result().Elements(), stripComments.length());
However, this makes a copy of the string and null-terminates it, which is inefficient. A more efficient solution is to do
nsDependentSubstring cleanSource(stripComments.result().Elements(),
                                 stripComments.length());
This references the characters in the StripComments member array, but as written, it won't outlive it, so that's OK. It doesn't null-terminate the string, but ValidateGLSLString doesn't need a null-terminated string, so that's also OK.

>+        if (!ValidateGLSLString(cleanSource, "compileShader"))
>+            return NS_OK;
>+
>+        const nsPromiseFlatString& flatSource = PromiseFlatString(cleanSource);
1. NS_LossyConvertUTF16toASCII doesn't require a flat string, so no need to flatten it.
2. Unless you use the nsDependentSubstring, then the string was already flat anyway.

>+        const nsCString& sourceCString = NS_LossyConvertUTF16toASCII(flatSource);
NS_LossyConvertUTF16toASCII is a class, so you can write this as
NS_LossyConvertUTF16toASCII sourceCString(flatSource);
> >+        StripComments stripComments(shader->Source());
> >+        const nsAString& cleanSource = nsString(stripComments.result().Elements(), stripComments.length());
> nsString is a class

Right... this bit escaped my attention during review.

>, so you can write this as
> nsString cleanSource(stripComments.result().Elements(),
> stripComments.length());
> However, this makes a copy of the string and null-terminates it, which is
> inefficient. A more efficient solution is to do
> nsDependentSubstring cleanSource(stripComments.result().Elements(),
>                                  stripComments.length());

FWIW, with GCC 4.6 I've run into lots of compilation trouble with this; the #including scheme around nsTDependentSubstring.h makes this hard to figure but even after switching to the Substring functions I couldn't get this form to compile, so I used the form taking 2 character pointers instead.

> >+        const nsPromiseFlatString& flatSource = PromiseFlatString(cleanSource);
> 1. NS_LossyConvertUTF16toASCII doesn't require a flat string, so no need to
> flatten it.

The code immediately below that does require a flat string, so I'm keeping this.

> 2. Unless you use the nsDependentSubstring, then the string was already flat
> anyway.

Well, I do use nsDependentSubstring now :-)

> 
> >+        const nsCString& sourceCString = NS_LossyConvertUTF16toASCII(flatSource);
> NS_LossyConvertUTF16toASCII is a class, so you can write this as
> NS_LossyConvertUTF16toASCII sourceCString(flatSource);

that's what I didn't know. I wish that our function/class naming conventions made it a little more clear what is a class.
Attachment #560848 - Flags: review?(neil)
(In reply to Benoit Jacob from comment #35)
> > A more efficient solution is to do
> > nsDependentSubstring cleanSource(stripComments.result().Elements(),
> >                                  stripComments.length());
> 
> FWIW, with GCC 4.6 I've run into lots of compilation trouble with this; the
> #including scheme around nsTDependentSubstring.h makes this hard to figure
> but even after switching to the Substring functions I couldn't get this form
> to compile, so I used the form taking 2 character pointers instead.
You're right; there are an alarming number of Substring methods but the one I was thinking of only exists in the external string API.

> > >+        const nsPromiseFlatString& flatSource = PromiseFlatString(cleanSource);
> > 1. NS_LossyConvertUTF16toASCII doesn't require a flat string, so no need to
> > flatten it.
> The code immediately below that does require a flat string, so I'm keeping
> this.
Sorry, I don't see where; as far as I can see, flatSource is only used as a parameter to NS_LossyConvertyUTF16toASCII which doesn't need a flat string.
> > > >+        const nsPromiseFlatString& flatSource = PromiseFlatString(cleanSource);
> > > 1. NS_LossyConvertUTF16toASCII doesn't require a flat string, so no need to
> > > flatten it.
> > The code immediately below that does require a flat string, so I'm keeping
> > this.
> Sorry, I don't see where; as far as I can see, flatSource is only used as a
> parameter to NS_LossyConvertyUTF16toASCII which doesn't need a flat string.

We need this NS_LossyConvertyUTF16toASCII to be flat, see just below this code. Do you mean that a NS_LossyConvertyUTF16toASCII is always flat even if the string it was constructed from wasn't?
(In reply to Benoit Jacob from comment #37)
> Do you mean that a NS_LossyConvertUTF16toASCII is always flat even if
> the string it was constructed from wasn't?
Indeed, NS_LossyConvertUTF16toASCII is a subclass of nsCAutoString whose constructor calls LossyAppendUTF16toASCII (doesn't need to be LossyCopyUTF16toASCII because the new string is obviously already empty.)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: