Closed Bug 951799 Opened 10 years ago Closed 10 years ago

cache uniform locations in OGLShaderProgram; optimize

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: vlad, Assigned: vlad)

Details

(Whiteboard: [qa-])

Attachments

(3 files, 2 obsolete files)

Attached patch Cache uniform locations + values (obsolete) — Splinter Review
OGLShaderProgram got changed a while ago to use strings to reference each uniform, even at runtime, doing a strcmp() in a loop across all the shaders in a program until the location is found.  This is wasteful and silly; it doesn't even simplify the code any, especially since we have a finite number of uniforms that we can set.

Additionally, we're changing uniform data even if the previous values are what was already there; that was always an issue, but we may as well fix it.

The attached patch does both of these things -- it introduces a KnownUniform struct that tracks the location and value of each uniform, and only issues gl commands to update the uniform if needed.
Attachment #8349587 - Flags: review?(bjacob)
Comment on attachment 8349587 [details] [diff] [review]
Cache uniform locations + values

Review of attachment 8349587 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/OGLShaderProgram.h
@@ +48,5 @@
>  };
>  
> +class KnownUniform {
> +public:
> +  enum KnownUniformType {

Pro tip: we now have in mfbt a MOZ_BEGIN_ENUM_CLASS ... MOZ_END_ENUM_CLASS thing that gives you proper namespacing and strong typing if the compiler supports it ... consider using it the next time you add new enums.

@@ +86,5 @@
> +    mName = NotAKnownUniform;
> +    mNameString = nullptr;
> +    mType = UniformType1f;
> +    mLocation = -1;
> +    memset(&mValue.f1, sizeof(mValue), 0);

I am so happy --- literally jumping with joy. For years I've seen you initialize classes with memset and was thinking, some day he's going to get the args order wrong and I'll never r+ any such memset usage anymore. That just happened :)
Attachment #8349587 - Flags: review?(bjacob) → review-
I feel dumb.  We still have to send the *first* glUniform call, even if the value is 0!  This explains the randomness too.  This took a long time to track down, and ultimately involved dumping all uniform sets during reftest runs.

As a bonus, I'm able to quantify the number of redundant uniform sets, and it's huge.  In the first 1455 reftests (which is the number of one of the only two randomly failing reftests!), the old code was issuing 44255 uniform calls, almost all of which were matrix (16 float value) sets.  The new code with caching is issuing only 296 calls.  I'm looking forward to seeing if this moves the perf needle, on mobile especially!

Tryserver reftest run:

  https://tbpl.mozilla.org/?tree=Try&rev=88bbbcc2be18
Spoke too soon, that wasn't the problem (the initial value for uniforms is 0 anyway).  Arg, back to hunting.
Attached patch updated patch (obsolete) — Splinter Review
This is the updated patch, which still has the one OSX reftest failure that I'm trying to figure out.

Some notes... this just looks up all known uniforms in every shader.  In practice (and I compared with logs), this means that the solid color layer shader gets the tex coord multiplier uniform looked up and set where it didn't before; it's ignored in the fragment shader.  Otherwise, the uniforms that are found are identical.
Attachment #8361237 - Flags: review?(bjacob)
Comment on attachment 8361237 [details] [diff] [review]
updated patch

Review of attachment 8361237 [details] [diff] [review]:
-----------------------------------------------------------------

A timid r+ with several comments; I hope that you can address them while debugging your bug, and even that the mType issue might help understanding your bug.

::: gfx/layers/opengl/OGLShaderProgram.h
@@ +5,5 @@
>  
>  #ifndef GFX_OGLSHADERPROGRAM_H
>  #define GFX_OGLSHADERPROGRAM_H
>  
> +#include "GLContext.h"

Do you really need the expensive GLContext.h in this _header_ ? Is GLContextTypes.h not enough? What can't you forward-declare?

@@ +48,5 @@
>  };
>  
> +class KnownUniform {
> +public:
> +  enum KnownUniformType {

The following comment doesn't block getting r+, but it is appreciated if you can take it into account:

Consider the following two nice properties that an enum might have:
 1. Being strongly typed;
 2. Being declarable without having the entire class defined.

You can achieve 1. by using MOZ_BEGIN_ENUM_CLASS from mfbt/TypedEnum.h.
You can achieve 2. by taking this nested enum out of this class, so it can be declared independently of having the class definition.

Although 1. and 2. look unrelated, they are actually best done together, because:
 A. Having the enum in the class gave you namespacing, which you lose by taking it out of the class, but which you regain from using MOZ_BEGIN_ENUM_CLASS, which gives you nice namespacing of the enum values by the enum type name (MyEnum::MyValue).
 B. MOZ_BEGIN_ENUM_CLASS by itself doesn't support nesting in a class. If you want to nest one in a class, you need MOZ_BEGIN_NESTED_ENUM_CLASS instead.

@@ +82,5 @@
> +  };
> +
> +  KnownUniform()
> +  {
> +    mName = NotAKnownUniform;

I would prefer using initializer lists where applicable to initialize members. More rigid structure == better.

@@ +90,5 @@
> +    memset(&mValue, 0, sizeof(mValue));
> +  }
> +
> +  bool UpdateUniform(int32_t i1) {
> +    if (mLocation == -1) return false;

I am not a coding style ninja, but I thought that you had to write this

  if (condition) {
    return false;
  }

@@ +91,5 @@
> +  }
> +
> +  bool UpdateUniform(int32_t i1) {
> +    if (mLocation == -1) return false;
> +    if (mValue.i1 != i1) {

i1 is uint32_t but mValue.i1 is int. If there is a reason for this type mismatch, please explain, otherwise, please settle on one type.

(I suppose that the compiler generates a warning on this comparison between signed and unsigned).

@@ +143,5 @@
> +    case 2:
> +    case 3:
> +    case 4:
> +    case 16:
> +      if (memcmp(mValue.f16v, fp, sizeof(float) * cnt) != 0) {

Since UpdateUniform overloads return whether the oldvalue!=newvalue, notice that the detail of using a raw bits comparison here, instead of a floating-point equality comparison, leaks out. Is this intended? Acceptable? You decide.

I would also suggest size_t for this 'cnt' parameter (and elsewhere) to convey intention.

@@ +154,5 @@
> +    NS_NOTREACHED("cnt must be 1 2 3 4 or 16");
> +    return false;
> +  }
> +
> +  KnownUniformName mName;

Since we are doing fairly dangerous things in this struct (with the union thing), how about making it a class and protecting its members?

@@ +156,5 @@
> +  }
> +
> +  KnownUniformName mName;
> +  const char *mNameString;
> +  KnownUniformType mType; // XXX this needs to be updated and used

This --- given the trouble you've had with this patch, and the dangerous-lookin enum, I would like this mType checking to be fully implemented.
Attachment #8361237 - Flags: review?(bjacob) → review+
s/dangerous-lookin enum/dangerous-looking union/
Okay, now fully working and passing tests.  This is part 1 of 3.

This changes ShaerProgramOGL so that uniforms are referred to by enum, and caches values to avoid unnecessary glUniform* calls.
Attachment #8349587 - Attachment is obsolete: true
Attachment #8361237 - Attachment is obsolete: true
Attachment #8366014 - Flags: review?(bjacob)
This is special, and so was pulled out; the projection matrix is set when the program is *not* active, and is expected to get activated the next time it does become active.

This also renames the method to DelayedSetProjectionMatrix (instead of "CheckAndSetProjectionMatrix"), to make it clear what it's doing.
Attachment #8366016 - Flags: review?(bjacob)
Move the uniform set calls to the header so they can be inlined at call sites.
Attachment #8366017 - Flags: review?(bjacob)
Comment on attachment 8366014 [details] [diff] [review]
cache uniforms, part 1

Review of attachment 8366014 [details] [diff] [review]:
-----------------------------------------------------------------

Note that some suggestions made in earlier review comments, e.g. about typed enums, still hold. But none of that is mandatory.

One important remark below about moving that array of string pointers to function-scope instead of file-scope to avoid making libxul slower to load.

::: gfx/layers/opengl/OGLShaderProgram.cpp
@@ +22,5 @@
>  
>  typedef ProgramProfileOGL::Argument Argument;
>  
>  // helper methods for GetProfileFor
> +static const char *sKnownUniformNames[] = {

Important: since sKnownUniformNames is only used in AddUniforms, move it there.

What difference does it make? Here you're not just defining literal strings, you are also taking their addresses and storing them in this array of pointers. Not your fault: C/C++ doesn't allow you to define an array of literal strings. But the consequence of that is that you get N relocations when loading libxul, where N is the number of strings in this array. By moving this to be instead a file-scope static in AddUniforms, you defer that work to when AddUniforms is first called.
Attachment #8366014 - Flags: review?(bjacob) → review+
Attachment #8366016 - Flags: review?(bjacob) → review+
Comment on attachment 8366017 [details] [diff] [review]
cache uniforms, part 3 - make calls inline

Review of attachment 8366017 [details] [diff] [review]:
-----------------------------------------------------------------

Temporarily cancelling the review until I understand why we want inlining badly enough that we're willing to churn the implementation of these methods back into the headers, and have GLContext.h creep some of its way back into more translation units.

::: gfx/layers/opengl/OGLShaderProgram.h
@@ +5,5 @@
>  
>  #ifndef GFX_OGLSHADERPROGRAM_H
>  #define GFX_OGLSHADERPROGRAM_H
>  
> +#include "GLContext.h"                  // for fast inlines of glUniform*

Does inlining of these GL calls actually matter? In all the GL's I've seen, even the cheapest GL call has to punch through several non-inlined function calls anyway. I'm interested partly because the answer to that question determines what is acceptable to do in GLContext; and I went through a bit of work last fall to minimize the number of compilation units #including GLContext, which sped up recompiles because GLContext.h is a header that we often touch.

@@ +519,5 @@
> +    NS_ASSERTION(aKnownUniform >= 0 && aKnownUniform < KnownUniform::KnownUniformCount, "Invalid known uniform");
> +
> +    KnownUniform& ku(mProfile.mUniforms[aKnownUniform]);
> +    if (ku.UpdateUniform(aLength, aFloatValues)) {
> +      if (aLength == 1) {

This is a good candidate for a switch statement, which the compiler would easily optimize with a call table. If this is performance sensitive enough to care about inlining the GL call, it might also be performance sensitive enough for that ;-)
Attachment #8366017 - Flags: review?(bjacob)
Comment on attachment 8366017 [details] [diff] [review]
cache uniforms, part 3 - make calls inline

r+ post IRC conversation. I dont expect it to make any significant difference, and I don't care strongly.
Attachment #8366017 - Flags: review+
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: