Closed Bug 999713 Opened 6 years ago Closed 2 years ago

Call MakeCurrent universally, but lazily

Categories

(Core :: Canvas: WebGL, defect)

defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 1390386
mozilla31

People

(Reporter: jgilbert, Assigned: jgilbert)

References

Details

Attachments

(8 files, 10 obsolete files)

28.46 KB, patch
kamidphish
: review+
jgilbert
: checkin+
Details | Diff | Splinter Review
13.42 KB, patch
jgilbert
: review+
Details | Diff | Splinter Review
1.61 KB, patch
jgilbert
: review+
Details | Diff | Splinter Review
6.13 KB, patch
Details | Diff | Splinter Review
16.96 KB, patch
bjacob
: review-
Details | Diff | Splinter Review
28.88 KB, patch
bjacob
: review-
mattwoodrow
: review+
Details | Diff | Splinter Review
5.37 KB, patch
kamidphish
: review+
Details | Diff | Splinter Review
19.24 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
Currently, we need to make sure we call MakeCurrent before making GL calls. In DEBUG builds, we get an assert which makes sure we are current.

One path we investigated in bug 972668 was to statically assure that we were calling MakeCurrent in our WebGL code.

An alternative, investigated here, is to instead MakeCurrent in GLContext.h's fFoo() functions. This guarantees we call MakeCurrent before running GL calls, but would be quite slow, especially since the compositor doesn't have this same overhead, though WebGL nearly does this already.

So, we should try to cache whether or not we're the current context on the thread, and only MakeCurrent if we aren't.

One complication for this is plugin code. When we enter plugin code, the plugin can MakeCurrent something on our thread. To work with this, we need to invalidate our cache when we exit plugin execution. However, since plugins can re-enter into Gecko code, we also need to invalidate our cache on plugin *entry*, not just exit.
Attached patch patch 1: Cache MakeCurrent calls (obsolete) — Splinter Review
Attachment #8410547 - Flags: review?(bjacob)
Attached patch patch 2: Always call MakeCurrent (obsolete) — Splinter Review
Attachment #8410548 - Flags: review?(bjacob)
Attachment #8410550 - Flags: review?(matt.woodrow)
Attachment #8410550 - Flags: review?(bjacob)
Attachment #8410553 - Flags: review?(bjacob)
See Also: → 972668
Without the last printf patch:
https://tbpl.mozilla.org/?tree=Try&rev=b244fafa7280
Green except printf-during-shutdown?

With:
https://tbpl.mozilla.org/?tree=Try&rev=48dd95dbcb79
Comment on attachment 8410545 [details] [diff] [review]
patch 0: Kill whitespace in files we're going to touch

Review of attachment 8410545 [details] [diff] [review]:
-----------------------------------------------------------------

Death to trailing whitespace.
Attachment #8410545 - Flags: review?(dglastonbury) → review+
Attachment #8410545 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/b06254e402b2
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
It seems that a [leave open] was intended here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Comment on attachment 8410547 [details] [diff] [review]
patch 1: Cache MakeCurrent calls

Review of attachment 8410547 [details] [diff] [review]:
-----------------------------------------------------------------

R+ with mandatory nits on gfx/ parts.

You need another reviewer than me for NPAPI parts. Perhaps :BenWa?

::: gfx/gl/GLContext.h
@@ +2483,5 @@
>      typedef gfx::SharedSurfaceType SharedSurfaceType;
>      typedef gfx::SurfaceFormat SurfaceFormat;
>  
> +    // Current context tracking (by thread)
> +    static unsigned sTLSId_CurrentContext;

I would recommend that you also port this to MFBT ThreadLocal, but it's good to save that for a separate patch.

@@ +2491,5 @@
> +        PR_NewThreadPrivateIndex(&sTLSId_CurrentContext, nullptr);
> +    }
> +
> +    static const GLContext* GetCurrentContext() {
> +        return (const GLContext*)PR_GetThreadPrivate(sTLSId_CurrentContext);

static_cast, not C cast.

@@ +2505,4 @@
>      virtual bool MakeCurrentImpl(bool aForce) = 0;
>  
> +private:
> +    uint32_t mSkippedMakeCurrents;

I don't suppose that you want to keep this for landing?

@@ +2520,5 @@
> +                ++mSkippedMakeCurrents;
> +                return true;
> +            }
> +        }
> +        printf_stderr("Couldn't skip MakeCurrent. (%u skips)\n", mSkippedMakeCurrents);

Remove this printf or restrict it somehow so it's not so noisy?
Attachment #8410547 - Flags: review?(bjacob)
Attachment #8410547 - Flags: review?(bgirard)
Attachment #8410547 - Flags: review+
I would quite strongly recommend that you insert the port to MFBT ThreadLocal as part 1.5 here, so that you immediately limit the overhead of thread local storage. I seem to remember that (perhaps depending on the platform) MFBT ThreadLocal can be significantly lower overhead than NSPR.
Attachment #8410548 - Flags: review?(bjacob) → review+
Comment on attachment 8410550 [details] [diff] [review]
patch 3: Split MakeCurrent(bool) into MaybeMakeCurrent and ForceMakeCurrent

Review of attachment 8410550 [details] [diff] [review]:
-----------------------------------------------------------------

MaybeMakeCurrent is a bad name because it doesn't reflect the fact that it actually is meant to honor a precise contract:

The contract (IIUC) is that when MaybeMakeCurrent returns, the given GLContext is current on the current thread.

How about renaming MaybeMakeCurrent to EnsureCurrent ?

ForceMakeCurrent is fine.

::: gfx/gl/GLContext.cpp
@@ -459,5 @@
>  
>      };
>  
>      mInitialized = LoadSymbols(&symbols[0], trygl, prefix);
> -    MakeCurrent();

This has me concerned. What happens if we create a context, destroy it, and create a new context, and that new context happens to have exactly the same address as the old one? We wouldn't want to skip the initial makecurrent then. I think it would be safer to keep this makecurrent and make it a ForceMakeCurrent.

OK, I realize (now) that MarkDestroyed below does null the TLS pointer so that shouldn't actually ever happen. Still, this seems dangerous enough, and the benefit of skipping one makecurrent during init is so low, that I would rather have us ForceMakeCurrent here.

::: gfx/gl/GLContextProviderEGL.cpp
@@ +370,5 @@
>          Screen()->AssureBlitted();
>      }
>  
>      mSurfaceOverride = surf ? (EGLSurface) surf : mSurface;
> +    ForceMakeCurrent();

good catch,

@@ +432,5 @@
>      mSurface = mozilla::gl::CreateSurfaceForWindow(nullptr, mConfig); // the nullptr here is where we assume Android.
>      if (mSurface == EGL_NO_SURFACE) {
>          return false;
>      }
> +    return ForceMakeCurrent();

and another good catch. I wonder if we could go full monty and also TLS the surface that we're current against. But that might be overkill, and totally unneeded outside of EGL. The concept itself may be mostly EGL specific.

::: gfx/layers/opengl/CompositorOGL.cpp
@@ +661,5 @@
>    // to make sure that GL sees the updated widget size.
>    if (mWidgetSize.width != width ||
>        mWidgetSize.height != height)
>    {
> +    mGLContext->ForceMakeCurrent();

I'm surprised that we're not checking the return value here (to detect context loss etc).
Attachment #8410550 - Flags: review?(bjacob) → review+
(In reply to Jeff Gilbert [:jgilbert] from comment #0)
> One complication for this is plugin code. When we enter plugin code, the
> plugin can MakeCurrent something on our thread. To work with this, we need
> to invalidate our cache when we exit plugin execution. However, since
> plugins can re-enter into Gecko code, we also need to invalidate our cache
> on plugin *entry*, not just exit.

What about widget code, on platforms where we may have in-main-thread widget calls that may use OpenGL? Thinking about Mac here, and perhaps some flavors of Linux.
Need an reply to comment 16.
Flags: needinfo?(jgilbert)
Attachment #8410551 - Flags: review?(bjacob) → review+
Attachment #8410553 - Flags: review?(bjacob) → review+
Attachment #8410554 - Flags: review?(bjacob) → review+
I'm concerned that having the Skipped counter on in release builds is still like wasting 1 cache line for little good reason.
Comment on attachment 8410547 [details] [diff] [review]
patch 1: Cache MakeCurrent calls

Review of attachment 8410547 [details] [diff] [review]:
-----------------------------------------------------------------

r- because we're still missing possible calls to MakeCurrent.

::: dom/plugins/base/nsNPAPIPluginInstance.cpp
@@ +1845,5 @@
> +  if (aReentryState == NS_PLUGIN_CALL_UNSAFE_TO_REENTER_GECKO) {
> +    ++gInUnsafePluginCalls;
> +  }
> +
> +  mozilla::gl::GLContext::SetCurrentContext(nullptr);

You're assuming that all plugins execution will go through this Begin which is incorrect. A plugin is free to post a event to the native event loop to set a timer.

It's probably very rare however for plugins to be loaded in the main thread since plugins are generally off (but not always) out of process. You might just want to have a way to disable your optimization if we have a gecko main thread plugin loaded.

Don't forget that native widget (or more generally native code) may also post events that call MakeCurrent or even as a response/callback to the current gecko event invoke MakeCurrent. We have to be very careful with this kind of optimization.
Attachment #8410547 - Flags: review?(bgirard) → review-
Attachment #8410550 - Flags: review?(matt.woodrow) → review+
:bjacob and I talked about this. For now, we want to basically restrict it to B2G, which doesn't have dangerous plugins.

We should also be able to enable it for platforms where we won't have widgets doing GL, so Windows.

Based on the Try run, which includes an assert that our cached value is correct, I'm pretty sure we're being over-cautious about widget code clobbering GL state. If we're worried about this in production, we can force-make-current once in the compositor, so at least that won't ever be messed up.

:bgirard's point about plugins posting directly to the main thread is well taken. In light of that concern, we should hook glMakeCurrent, and intercept calls to it. This should handle everything perfectly. (and I think we support overriding functions on all platforms?)
Status: REOPENED → ASSIGNED
(In reply to Benoit Jacob [:bjacob] from comment #18)
> I'm concerned that having the Skipped counter on in release builds is still
> like wasting 1 cache line for little good reason.

Sorry, Hijacking:

Regarding wasting a cacheline. I think we should consider doing some manual PGO and having a 'hot' segment in our binary. This would be code that is hot in our vanilla (no options flipped) builds. The profiler's state tracking is another thing that is constantly adding cache pressure just to check the state.

(In reply to Jeff Gilbert [:jgilbert] from comment #21)

B2G is probably a safe place to start (and good bang for our buck). Note that hooking glMakeCurrent is dangerous because it might just be an external entry point. The driver and/or native platform may have change the current context through any mean via an internal symbol.
Whiteboard: [leave open]
Comment on attachment 8410548 [details] [diff] [review]
patch 2: Always call MakeCurrent

Review of attachment 8410548 [details] [diff] [review]:
-----------------------------------------------------------------

Doesn't this add overhead to every compositor GL call when we know that nobody will ever clobber the current context?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #24)
> Comment on attachment 8410548 [details] [diff] [review]
> patch 2: Always call MakeCurrent
> 
> Review of attachment 8410548 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Doesn't this add overhead to every compositor GL call when we know that
> nobody will ever clobber the current context?
A very small amount, sure. We can add an override for this.
Blocks: 1014011
No longer blocks: 1014011
Attached file patch 7: Fixups (obsolete) —
Attachment #8431923 - Flags: review?(bjacob)
Comment on attachment 8431923 [details]
patch 7: Fixups

Nevermind, combined these into previous patches. After I get a building Try run, I'll post the new patch queue.
Attachment #8431923 - Attachment is obsolete: true
Attachment #8431923 - Attachment is patch: false
Attachment #8431923 - Flags: review?(bjacob)
I'm booked this week anyway, due to the 'bootcamp' week going on in Toronto,
https://intranet.mozilla.org/Bootcamp1
Attached patch patch 1: Cache MakeCurrent calls (obsolete) — Splinter Review
Should be mostly the same. Please check that I did the TLS correctly.
Attachment #8410547 - Attachment is obsolete: true
Attachment #8434482 - Flags: review?(bjacob)
Sorry this is big again/still.
Attachment #8410548 - Attachment is obsolete: true
Attachment #8410550 - Attachment is obsolete: true
Attachment #8434486 - Flags: review?(bjacob)
Attachment #8434488 - Attachment description: patch 4: Remove IsCurrent asserts, since we're lazily-implicitly current → patch 3: Remove IsCurrent asserts, since we're lazily-implicitly current
r=bjacob
Attachment #8410553 - Attachment is obsolete: true
Attachment #8434490 - Flags: review+
Attachment #8410554 - Attachment is obsolete: true
Attachment #8434491 - Flags: review?(bjacob)
Oops, that didn't build.
Attachment #8434491 - Attachment is obsolete: true
Attachment #8434491 - Flags: review?(bjacob)
Attachment #8434557 - Flags: review?(bjacob)
Attachment #8434482 - Flags: review?(bjacob)
Attachment #8434486 - Flags: review?(bjacob)
Attachment #8434557 - Flags: review?(bjacob)
Sorry about the long delay, and the decision that might sound disappointing: I shouldn't anymore be the one reviewing these patches.

I believe that the people doing nontrivial/design reviews like this, contributing nontrivial decisions, should be the _same_ people who would suffer the consequences if the decision is wrong i.e the people on the hook to fix regressions and dealing with that code in their daily work.

So as I'm moving out of gfx, I should not be doing such reviews anymore. I can still give feedback, but I've already given positive feedback here earlier.

I really like the idea of these patches and I expect that they're going to be all right, but I'm not 100% certain. More eyes on these patches would be a good thing.
Comment on attachment 8434482 [details] [diff] [review]
patch 1: Cache MakeCurrent calls

Review of attachment 8434482 [details] [diff] [review]:
-----------------------------------------------------------------

You will need a review from :BenWa for the plugin stuff.

::: gfx/gl/GLContext.h
@@ +617,5 @@
>  
>      void BeforeGLCall(const char* glFunction)
>      {
>          MOZ_ASSERT(IsCurrent());
> +

Instead of asserting on IsCurrent, just call MakeCurrent here ...

@@ +657,5 @@
>      static void AssertNotPassingStackBufferToTheGL(const void* ptr);
>  
>  #define BEFORE_GL_CALL                              \
>              do {                                    \
> +                MakeCurrent();                      \

...rather than here? It's always nice to avoid hiding code in macros. Plus, this removes the need for the assertion above.

@@ +678,5 @@
>  
> +#define BEFORE_GL_CALL                              \
> +            do {                                    \
> +                MakeCurrent();                      \
> +            } while (0)

Since that BEFORE_GL_CALL macro is now nontrivial even in non-DEBUG, I think that you should just remove it and instead just call BeforeGLCall (which should be inlined by the compiler), and move the #ifdef DEBUG to there.

@@ +2516,5 @@
>      typedef gfx::SurfaceFormat SurfaceFormat;
>  
> +public:
> +    static void StaticInit() {
> +        DebugOnly<bool> ok = sTLS_CurrentContext.init();

What is the point of that DebugOnly variable? Did you mean to assert on it? (Would be a good idea).

@@ +2520,5 @@
> +        DebugOnly<bool> ok = sTLS_CurrentContext.init();
> +    }
> +
> +    static const GLContext* GetCachedCurrentContext() {
> +        return (const GLContext*)sTLS_CurrentContext.get();

What is the point of this pointer cast? None I think!

Also, C casts for pointer types are an unconditional r- :-)

@@ +2532,4 @@
>      virtual bool MakeCurrentImpl(bool aForce) = 0;
>  
> +private:
> +    DebugOnly<uint32_t> mSkippedMakeCurrents;

This is neat for experimentation to help prove that this optimization is useful. Is it worth having in mozilla-central?
Attachment #8434482 - Flags: review-
(In reply to Benoit Jacob [:bjacob] from comment #37)
> Comment on attachment 8434482 [details] [diff] [review]

We should also get performance numbers for this work. I'm worried about the unnecessary tls lookups.
Oh yes, great point, do get performance numbers before we go on looking at code.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #38)
> (In reply to Benoit Jacob [:bjacob] from comment #37)
> > Comment on attachment 8434482 [details] [diff] [review]
> 
> We should also get performance numbers for this work. I'm worried about the
> unnecessary tls lookups.

LayersBench(? BenWa's thing) showed no difference.
(In reply to Jeff Gilbert [:jgilbert] from comment #40)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #38)
> > (In reply to Benoit Jacob [:bjacob] from comment #37)
> > > Comment on attachment 8434482 [details] [diff] [review]
> > 
> > We should also get performance numbers for this work. I'm worried about the
> > unnecessary tls lookups.
> 
> LayersBench(? BenWa's thing) showed no difference.

On which platform? It would also be good to know the actual cost of the tls lookup.
Attachment #8434482 - Attachment is obsolete: true
Attachment #8443778 - Flags: review?(bjacob)
Attachment #8434486 - Attachment is obsolete: true
Attachment #8443779 - Flags: review?(matt.woodrow)
Attachment #8443779 - Flags: review?(bjacob)
Attachment #8443780 - Flags: review?(dglastonbury)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #41)
> (In reply to Jeff Gilbert [:jgilbert] from comment #40)
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #38)
> > > (In reply to Benoit Jacob [:bjacob] from comment #37)
> > > > Comment on attachment 8434482 [details] [diff] [review]
> > > 
> > > We should also get performance numbers for this work. I'm worried about the
> > > unnecessary tls lookups.
> > 
> > LayersBench(? BenWa's thing) showed no difference.
> 
> On which platform? It would also be good to know the actual cost of the tls
> lookup.

Nexus 10.

In these new patches, I add a manual mode for the compositor thread.

Given this manual mode, I'm tempted to just s/EnsureCurrent/MakeCurrent/, so we don't need to do a busywork change to all callers.
Attachment #8443779 - Flags: review?(matt.woodrow) → review+
Attachment #8443782 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8443778 [details] [diff] [review]
patch 1: Cache MakeCurrent calls

Review of attachment 8443778 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.h
@@ +2518,5 @@
> +        MOZ_ASSERT(ok);
> +    }
> +
> +    static inline const GLContext* GetCachedCurrentContext() {
> +        return (const GLContext*)sTLS_CurrentContext.get();

I r-'d an earlier version of this patch because this is a C cast of pointer types. I still think that that's a sufficient reason to r- :-)

There are other things in this patch that seem to ignore my earlier review comments, but I can't guess which ones are intentional (because my suggestions were bad?) For instance, does BEFORE_GL_CALL really need to be a macro still? If that's only needed to get the FUNCTION_NAME, then the nontrivial work could all be deferred to an inline BeforeGLCall function taking the function name literal string as parameter. Generally, I think that there is a lot of value in minimizing the amount of interesting work hidden behind macros.

Finally (and most importantly) I thought we agreed that what's really needed here, before we get to discuss reviewing details, is performance analysis on some real-world workloads on real typical B2G hardware (which is less fast than a Nexus 10).
Attachment #8443778 - Flags: review?(bjacob) → review-
Comment on attachment 8443779 [details] [diff] [review]
patch 2.0: Update callers in gfx/gl

Review of attachment 8443779 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +371,5 @@
>          Screen()->AssureBlitted();
>      }
>  
>      mSurfaceOverride = surf ? (EGLSurface) surf : mSurface;
> +    MOZ_ALWAYS_TRUE(ForceMakeCurrent());

Defaulting to r- as there is something basic that I'm not understanding here. This patch makes us crash debug builds when we fail to ensure that a GLContext is current. Do I correctly understand that if for whatever reason eglMakeCurrent fails (say, context loss), this now makes us crash debug builds? We definitely want to (try to) gracefully handle context loss...

The rest of this patch looks fine and I would r+ if this concern goes away.
Attachment #8443779 - Flags: review?(bjacob) → review-
(In reply to Benoit Jacob [:bjacob] from comment #48)
> Comment on attachment 8443778 [details] [diff] [review]
> patch 1: Cache MakeCurrent calls
> 
> Review of attachment 8443778 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/GLContext.h
> @@ +2518,5 @@
> > +        MOZ_ASSERT(ok);
> > +    }
> > +
> > +    static inline const GLContext* GetCachedCurrentContext() {
> > +        return (const GLContext*)sTLS_CurrentContext.get();
> 
> I r-'d an earlier version of this patch because this is a C cast of pointer
> types. I still think that that's a sufficient reason to r- :-)
Fixed. It's a duplicate cast anyways.

> 
> There are other things in this patch that seem to ignore my earlier review
> comments, but I can't guess which ones are intentional (because my
> suggestions were bad?) For instance, does BEFORE_GL_CALL really need to be a
> macro still? If that's only needed to get the FUNCTION_NAME, then the
> nontrivial work could all be deferred to an inline BeforeGLCall function
> taking the function name literal string as parameter. Generally, I think
> that there is a lot of value in minimizing the amount of interesting work
> hidden behind macros.
I don't actually think this makes things cleaner. It also requires a s/Foo/Bar/ patch.
Also, we know 100% that macros are inlined, whereas we've definitely had trouble force-inlining
things in the past.
> 
> Finally (and most importantly) I thought we agreed that what's really needed
> here, before we get to discuss reviewing details, is performance analysis on
> some real-world workloads on real typical B2G hardware (which is less fast
> than a Nexus 10).

I don't know what numbers would be useful to have here. I would like to avoid the game of "I make up numbers and get told why they aren't useful". Do we have a list of good performance metric methods?

FWIW, with these new patches, we do not eat a TLS lookup in the compositor path.
Comment on attachment 8443780 [details] [diff] [review]
patch 2.1: Update callers in content/canvas/src

Review of attachment 8443780 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

::: content/canvas/src/WebGLContext.cpp
@@ +1336,5 @@
>  
>  void
> +WebGLContext::MakeContextCurrent() const
> +{
> +    // It's automatic now!

Awesome. I assume the next patch will remove this function and all calls to it?
Attachment #8443780 - Flags: review?(dglastonbury) → review+
(In reply to Jeff Gilbert [:jgilbert] from comment #50)
> (In reply to Benoit Jacob [:bjacob] from comment #48)
> > Comment on attachment 8443778 [details] [diff] [review]
> > patch 1: Cache MakeCurrent calls
> > 
> > Review of attachment 8443778 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: gfx/gl/GLContext.h
> > @@ +2518,5 @@
> > > +        MOZ_ASSERT(ok);
> > > +    }
> > > +
> > > +    static inline const GLContext* GetCachedCurrentContext() {
> > > +        return (const GLContext*)sTLS_CurrentContext.get();
> > 
> > I r-'d an earlier version of this patch because this is a C cast of pointer
> > types. I still think that that's a sufficient reason to r- :-)
> Fixed. It's a duplicate cast anyways.
> 
> > 
> > There are other things in this patch that seem to ignore my earlier review
> > comments, but I can't guess which ones are intentional (because my
> > suggestions were bad?) For instance, does BEFORE_GL_CALL really need to be a
> > macro still? If that's only needed to get the FUNCTION_NAME, then the
> > nontrivial work could all be deferred to an inline BeforeGLCall function
> > taking the function name literal string as parameter. Generally, I think
> > that there is a lot of value in minimizing the amount of interesting work
> > hidden behind macros.
> I don't actually think this makes things cleaner. It also requires a
> s/Foo/Bar/ patch.
> Also, we know 100% that macros are inlined, whereas we've definitely had
> trouble force-inlining
> things in the past.

As someone who's maintained an expression-templates matrix libraries relying entirely on inlining on 20+-deep call chains, I know what problems you're talking about, but I also think that in the present case, on current compilers that we have to support (the oldest MSVC that matters is 2010) the likeliness of such problems is easily low enough that the benefits of inline functions (if only for debugging ease) are worth it. But, it doesn't matter hugely, I guess.

> > 
> > Finally (and most importantly) I thought we agreed that what's really needed
> > here, before we get to discuss reviewing details, is performance analysis on
> > some real-world workloads on real typical B2G hardware (which is less fast
> > than a Nexus 10).
> 
> I don't know what numbers would be useful to have here. I would like to
> avoid the game of "I make up numbers and get told why they aren't useful".
> Do we have a list of good performance metric methods?

Since you're the one proposing this approach and these patches, it's for you to come up with a proposal of what would be good enough performance measurements and convince the rest of us that they are sufficient :-)

> 
> FWIW, with these new patches, we do not eat a TLS lookup in the compositor
> path.

(Limited colloquial english skills here) what is meant by "eat" here?
(In reply to Benoit Jacob [:bjacob] from comment #52)
> (In reply to Jeff Gilbert [:jgilbert] from comment #50)
> > I don't know what numbers would be useful to have here. I would like to
> > avoid the game of "I make up numbers and get told why they aren't useful".
> > Do we have a list of good performance metric methods?
> 
> Since you're the one proposing this approach and these patches, it's for you
> to come up with a proposal of what would be good enough performance
> measurements and convince the rest of us that they are sufficient :-)

But to give you a hint in a way that shouldn't bind me: AFAICS, it would be good enough to measure that there is no significant negative impact on typical games (preferably real commercial games that we've been tracking the performance of) on B2G on typical B2G hardware (hamachi etc, anything MSM7225 based).

That leaves somewhat open the definition of "no significant negative impact". Ideally, you'd look at a workload where we're not quite at 60FPS, so that any significant impact would be seen in the FPS counter. IIRC, on the WMW game, we are not quite at 60FPS (hope this is still the case) so that would make it a good testcase.

I think that you need 3 testcases, one in each of the following categories:
 - WebGL
 - Canvas2D game using Skia/GL (default on B2G)
 - no canvas, just DOM compositing.
So to wrap it up:
 - WebGL: try WMW
 - Skia/GL: try Cut The Rope
 - DOM: try some CSS transforms demo such as one of the bigger molecules on http://mrdoob.github.io/three.js/examples/css3d_molecules.html
(In reply to Benoit Jacob [:bjacob] from comment #52)
> > > 
> > > Finally (and most importantly) I thought we agreed that what's really needed
> > > here, before we get to discuss reviewing details, is performance analysis on
> > > some real-world workloads on real typical B2G hardware (which is less fast
> > > than a Nexus 10).
> > 
> > I don't know what numbers would be useful to have here. I would like to
> > avoid the game of "I make up numbers and get told why they aren't useful".
> > Do we have a list of good performance metric methods?
> 
> Since you're the one proposing this approach and these patches, it's for you
> to come up with a proposal of what would be good enough performance
> measurements and convince the rest of us that they are sufficient :-)

I actually disagree with this being the general rule.

I don't have good expertise in layout/layerization/compositor to propose good benchmarks.

This sort of lack of expertise is why we often see people using poorly designed demos as benchmarks, or the ever-common "we spend a lot of time in glFinish/glClientWaitSync, and it seems to work fine when I comment them out".

I don't want to throw together benchmarks and have them be rejected when I imagine that I could just ask 'what makes a good benchmark here' and get a useful response. (Indeed, you have kindly given me some paths forward!) We should 'pull' and not 'push' in this regard.

> 
> > 
> > FWIW, with these new patches, we do not eat a TLS lookup in the compositor
> > path.
> 
> (Limited colloquial english skills here) what is meant by "eat" here?

"eat" is synonymous with "incur a penalty of" here, sorry.
(In reply to Dan Glastonbury :djg :kamidphish from comment #51)
> Comment on attachment 8443780 [details] [diff] [review]
> patch 2.1: Update callers in content/canvas/src
> 
> Review of attachment 8443780 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM.
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +1336,5 @@
> >  
> >  void
> > +WebGLContext::MakeContextCurrent() const
> > +{
> > +    // It's automatic now!
> 
> Awesome. I assume the next patch will remove this function and all calls to
> it?

I want to do this in a follow-up patch, so I don't keep needing to rebase it.
(In reply to Jeff Gilbert [:jgilbert] from comment #55)
> (In reply to Benoit Jacob [:bjacob] from comment #52)
> > > > 
> > > > Finally (and most importantly) I thought we agreed that what's really needed
> > > > here, before we get to discuss reviewing details, is performance analysis on
> > > > some real-world workloads on real typical B2G hardware (which is less fast
> > > > than a Nexus 10).
> > > 
> > > I don't know what numbers would be useful to have here. I would like to
> > > avoid the game of "I make up numbers and get told why they aren't useful".
> > > Do we have a list of good performance metric methods?
> > 
> > Since you're the one proposing this approach and these patches, it's for you
> > to come up with a proposal of what would be good enough performance
> > measurements and convince the rest of us that they are sufficient :-)
> 
> I actually disagree with this being the general rule.
> 
> I don't have good expertise in layout/layerization/compositor to propose
> good benchmarks.
> 
> This sort of lack of expertise is why we often see people using poorly
> designed demos as benchmarks, or the ever-common "we spend a lot of time in
> glFinish/glClientWaitSync, and it seems to work fine when I comment them
> out".
> 
> I don't want to throw together benchmarks and have them be rejected when I
> imagine that I could just ask 'what makes a good benchmark here' and get a
> useful response. (Indeed, you have kindly given me some paths forward!) We
> should 'pull' and not 'push' in this regard.

Why not just measure the time that the tls lookup takes and then count the number of typical gl calls in some scenarios. We can then extrapolate the performance impact from that.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #57)
> (In reply to Jeff Gilbert [:jgilbert] from comment #55)
> > (In reply to Benoit Jacob [:bjacob] from comment #52)
> > > > > 
> > > > > Finally (and most importantly) I thought we agreed that what's really needed
> > > > > here, before we get to discuss reviewing details, is performance analysis on
> > > > > some real-world workloads on real typical B2G hardware (which is less fast
> > > > > than a Nexus 10).
> > > > 
> > > > I don't know what numbers would be useful to have here. I would like to
> > > > avoid the game of "I make up numbers and get told why they aren't useful".
> > > > Do we have a list of good performance metric methods?
> > > 
> > > Since you're the one proposing this approach and these patches, it's for you
> > > to come up with a proposal of what would be good enough performance
> > > measurements and convince the rest of us that they are sufficient :-)
> > 
> > I actually disagree with this being the general rule.
> > 
> > I don't have good expertise in layout/layerization/compositor to propose
> > good benchmarks.
> > 
> > This sort of lack of expertise is why we often see people using poorly
> > designed demos as benchmarks, or the ever-common "we spend a lot of time in
> > glFinish/glClientWaitSync, and it seems to work fine when I comment them
> > out".
> > 
> > I don't want to throw together benchmarks and have them be rejected when I
> > imagine that I could just ask 'what makes a good benchmark here' and get a
> > useful response. (Indeed, you have kindly given me some paths forward!) We
> > should 'pull' and not 'push' in this regard.
> 
> Why not just measure the time that the tls lookup takes and then count the
> number of typical gl calls in some scenarios. We can then extrapolate the
> performance impact from that.

The compositor does not use TLS in the new patch.
(In reply to Jeff Gilbert [:jgilbert] from comment #58)
> The compositor does not use TLS in the new patch.

But a webgl application would.
Something curious I saw. Looking at the WGL context handling there is a comment that says that we don't have our own TLS version of the current context pointer because that's what WGL does. http://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLContextProviderWGL.cpp#308

When I was using CodeXL from AMD to debug issues on Win7, MakeCurrent was reported as being a "slow function" because it could cause the same effects as glFinish. I couldn't find any more information in the help, but it is curious.
Let's dupe this forward to the new iteration of the same idea.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1390386
You need to log in before you can comment on or make changes to this bug.