Closed Bug 1013647 Opened 10 years ago Closed 10 years ago

WebGL operation still calls to eglGetError() after almost each operation.

Categories

(Core :: Graphics: CanvasWebGL, defect)

32 Branch
ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jujjyl, Assigned: jgilbert)

References

Details

(Whiteboard: [games] webgl-perf)

Attachments

(1 file, 2 obsolete files)

I rooted my Nexus 10 device today, and installed the ARM Mali Graphics Debugger on it, and looking at the traces it provides, I'm seeing almost each GL function call operation to call to eglGetError(). We thought the related bug #987845 would have fixed this, but it looks like there are more such places.
Whiteboard: [games] webgl-perf
Is this in a DEBUG or OPT build? This is (IIRC) expected behavior in a DEBUG build.
I downloaded a Firefox Nightly apk from http://nightly.mozilla.org . Which flags do those have?
That's OPT, weird. I'll take a look.
Assignee: nobody → jgilbert
GLContextEGL::MakeCurrentImpl calls eglGetCurrentContext() which calls clearError() which calls eglGetError().
This call to clearError() happens because "eglGetError() returns information about the success or failure of the most recent EGL function called in a specific thread". This spec compliance was added by nvidia in commit 4aea6bff
Yeah, so this needs to be fixed by removing the MakeCurrent() calls around everything.
Blocks: 1014011
Attached patch patch (obsolete) — Splinter Review
Attachment #8430201 - Attachment is obsolete: true
Attachment #8430260 - Flags: review?(jgilbert)
No longer blocks: 1014011
Interestingly, even with this patch I see the tls lookups showing up in a profile.
This is more-or-less bug 999713.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> Interestingly, even with this patch I see the tls lookups showing up in a
> profile.

How badly do they show up? They should be really quick, compared to the cost of a GL command.
(In reply to Jeff Gilbert [:jgilbert] from comment #11)
> This is more-or-less bug 999713.

Yeah, but with less overhead and less risk. I wanted something that would be landable on 1.4

> (In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> > Interestingly, even with this patch I see the tls lookups showing up in a
> > profile.
> 
> How badly do they show up? They should be really quick, compared to the cost
> of a GL command.

Not badly, 3 out of 859 samples, but that includes gl commands that are doing expensive resolves.
Comment on attachment 8430260 [details] [diff] [review]
patch

Review of attachment 8430260 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +381,5 @@
>  
>      // Assume that EGL has the same problem as WGL does,
>      // where MakeCurrent with an already-current context is
>      // still expensive.
> +    if (aForce || (sEGLLibrary.CachedCurrentContext() != mContext && sEGLLibrary.fGetCurrentContext() != mContext)) {

Don't pack this much logic on one line, or even in one conditional.
Add an assert that CachedCurrentContext equals GetCurrentContext.
Attachment #8430260 - Flags: review?(jgilbert) → review-
(In reply to Jeff Muizelaar [:jrmuizel] from comment #12)
> (In reply to Jeff Gilbert [:jgilbert] from comment #11)
> > This is more-or-less bug 999713.
> 
> Yeah, but with less overhead and less risk. I wanted something that would be
> landable on 1.4
> 
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> > > Interestingly, even with this patch I see the tls lookups showing up in a
> > > profile.
> > 
> > How badly do they show up? They should be really quick, compared to the cost
> > of a GL command.
> 
> Not badly, 3 out of 859 samples, but that includes gl commands that are
> doing expensive resolves.

This all sounds fine.
Is there a reason we don't do this also for fennec?
(In reply to Jeff Gilbert [:jgilbert] from comment #15)
> Is there a reason we don't do this also for fennec?

I didn't want to deal with breaking plugins yet.
The untangled conditional makes things much clearer and also fixes a bug where we wouldn't set sCachedCurrentContext if the actual current context didn't match.
Attachment #8430260 - Attachment is obsolete: true
ThreadLocal<> is really not ideal; in fact we should really rename it to SlowThreadLocal or something, for easy cross-platform usage where you need a thread local variable but don't care about its performance.  gcc provides __thread and MSVC provides __declspec(thread), both of which have significantly less overhead (basically they're as fast as a regular global variable).

ThreadLocal<> is built on top of pthread_getspecific/pthread_setspecific or TlsGetValue/TlsSetValue on windows; the Windows Tls calls might get optimized out internally, but I don't know if the pthread ones can be since they require a lookup of the key (the Tls functions require a TlsAlloc first to return an index).
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #18)
> ThreadLocal<> is really not ideal; in fact we should really rename it to
> SlowThreadLocal or something, for easy cross-platform usage where you need a
> thread local variable but don't care about its performance.  gcc provides
> __thread and MSVC provides __declspec(thread), both of which have
> significantly less overhead (basically they're as fast as a regular global
> variable).

__declspec(thread) doesn't work for DLLs on XP so it's not really an option for us.

> ThreadLocal<> is built on top of pthread_getspecific/pthread_setspecific or
> TlsGetValue/TlsSetValue on windows; the Windows Tls calls might get
> optimized out internally, but I don't know if the pthread ones can be since
> they require a lookup of the key (the Tls functions require a TlsAlloc first
> to return an index).

pthread_getspecific on android is basically just:
void * pthread_getspecific(pthread_key_t key)
{
    if (key < min || key > max)) {
        return NULL;
    }
    return __get_tls()[key]);
}

where __get_tls() just loads the thread specific register. This register load is what I'd guess is slow and won't be improved by switching to __thread.
Hmm, yeah, you're right.  I actually forgot about pthread_key_create(), so ignore me!
Attachment #8430335 - Flags: review?(jgilbert)
Comment on attachment 8430335 [details] [diff] [review]
Cache the current context on B2G

Review of attachment 8430335 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +384,5 @@
>      // still expensive.
> +    bool hasDifferentContext = false;
> +    if (sEGLLibrary.CachedCurrentContext() != mContext) {
> +        // even if the cached context doesn't match the current one
> +        // might still

Good point.
Attachment #8430335 - Flags: review?(jgilbert) → review+
Comment on attachment 8430335 [details] [diff] [review]
Cache the current context on B2G

Review of attachment 8430335 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +419,3 @@
>          }
> +    } else {
> +        MOZ_ASSERT(sEGLLibrary.CachedCurrentContext() == sEGLLibrary.fGetCurrentContext())

I guess this should really be:
MOZ_ASSERT_IF(succeeded, mContext == sEGLLibrary.fGetCurrentContext());
https://hg.mozilla.org/mozilla-central/rev/e67d8be5cd0b
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: