Open Bug 674753 Opened 13 years ago Updated 2 years ago

Find a way to use APITrace for GL debugging on handhelds (and elsewhere)

Categories

(Core :: Graphics, defect)

ARM
Android
defect

Tracking

()

People

(Reporter: cjones, Unassigned)

References

(Blocks 1 open bug, )

Details

(Whiteboard: keep open! [gfx.relnote.13])

Attachments

(17 files, 1 obsolete file)

3.07 MB, patch
Details | Diff | Splinter Review
1.13 KB, patch
Details | Diff | Splinter Review
2.91 KB, patch
Details | Diff | Splinter Review
1.99 KB, patch
Details | Diff | Splinter Review
3.25 KB, patch
Details | Diff | Splinter Review
3.95 KB, patch
Details | Diff | Splinter Review
54.75 KB, patch
Details | Diff | Splinter Review
628 bytes, patch
Details | Diff | Splinter Review
7.47 KB, patch
Details | Diff | Splinter Review
3.49 KB, patch
Details | Diff | Splinter Review
6.02 KB, patch
Details | Diff | Splinter Review
3.51 KB, patch
Details | Diff | Splinter Review
5.78 KB, patch
Details | Diff | Splinter Review
4.19 KB, patch
bjacob
: review-
Details | Diff | Splinter Review
4.35 KB, patch
jrmuizel
: review+
bjacob
: review+
Details | Diff | Splinter Review
4.72 KB, patch
bjacob
: review+
jrmuizel
: review+
Details | Diff | Splinter Review
674 bytes, patch
bjacob
: review+
Details | Diff | Splinter Review
Android doesn't include a native GL debugger, AFAICT.  (Neither does Meego, FAIK.)  Perhaps relatedly, we have a bunch of GL rendering bugs there.  It's a tremendous PITA to debug these, and IME usually ends up requiring bug-specific hackery doo.  We should be pros and use protools.

Jeff points out APITrace as being the closest thing to an ideal solution that we also have the source to.  We should find a way to get this working on device.  If it needs to be actually LD_PRELOAD'd, then it will be a pain to get working on android, as usual.

The trace logs might fill up available space on device pretty quickly, so we might want the ability to redirect the log to adb or a TCP socket.

I wonder if there's a way to combine this with our GL debug mode for ultra win.  I don't know much about the current debug mode though.

It would be really cool to package this in debug builds and allow enabling it with a -gldebug flag or something.  We could also build our own HTML UI on top of the event logs for hackability.

We could give PIXwin a run for its money with the right investment (*cough* contractor *cough*).

Not sure of a better place to track this, so just sticking it on bug 607684.
(APITrace is under the MIT license, BTW.)
Oops, linked the wrong repo.  Also forgot to note that APITrace supports D3D, so we would also have the opportunity to develop a tool that gives a consistent debugging experience across all Tier Is.  (With yet more investment.)  But we should narrowly focus on android initially, since that's where we're hurting the most.
I investigated to see how we can integrate this. Here is what cjones and I found:

- Android doesn't support LD_PRELOAD.
- APITrace defines the symbols, wraps them and forward them using dlsym.

I think it may be a viable approach to modify GLContextProviderEGL.cpp to specifically dynamically link to APITrace and modify dlsym to not use RTLD_NEXT but explicitly load EGL_LIB.

Another note is that APITrace doesn't know about the KHR extentions. We seem to use only 4:
mHave_EGL_KHR_image_base, mHave_EGL_KHR_image_pixmap, mHave_EGL_KHR_gl_texture_2D_image, mHave_EGL_KHR_lock_surface
Another issue is how good its OpenGL ES support is.  My understanding is that ES isn't a faithful subset of OpenGL, but I have no idea how much that might bite in practice.
(In reply to comment #0)
> Android doesn't include a native GL debugger, AFAICT.  (Neither does Meego,
> FAIK.)  Perhaps relatedly, we have a bunch of GL rendering bugs there.  It's
> a tremendous PITA to debug these, and IME usually ends up requiring
> bug-specific hackery doo.  We should be pros and use protools.

We are still at a stage where we cause GL errors, see e.g. bug 661002. This kind of bug is easy to track down with the GL debug mode and some extensions of it (bug 654424). But yes, for sure APItrace would be very useful.

> I wonder if there's a way to combine this with our GL debug mode for ultra
> win.  I don't know much about the current debug mode though.

Check for yourself the opengl debug mode: gfx/thebes/GLContext.h, search for BeforeGLCall and AfterGLCall.

Things that the GL debug mode is doing, include:
 1. calling glFinish() after every GL call
 2. checking that GL calls are called on the GLContext that is current.
 3. optionally, logging GL calls
 4. optionally, logging GL errors
 5. optionally, aborting upon GL error

It would be useful to know which of these are done (better) by APItrace and which aren't.

For sure, 3 is done better by APItrace. Also for sure 2 is not done by APItrace since that can't be detected at the level of the GL API, only at the level of our C++ GLContext API. I don't know about others.

> 
> It would be really cool to package this in debug builds

+1 for APItrace bundled with debug builds.
It might also be possible to work around the lack of LD_PRELOAD by using a technique similar to what is done on OS X:

Usage on Mac OS X is similar to Linux above, except for the tracing procedure,
which is instead:

  DYLD_LIBRARY_PATH=/path/to/apitrace/wrappers /path/to/application

Note that although Mac OS X has an LD_PRELOAD equivalent,
DYLD_INSERT_LIBRARIES, it is mostly useless because it only works with
DYLD_FORCE_FLAT_NAMESPACE=1 which breaks most applications.  See the 'dyld' man
page for more details about these environment flags.
It's also worth noting that many of the driver vendors (qualcomm and powervr) have debuggers that will work with Android.
For Firefox/android, whatever we do has to use dlopen/dlsym, since that's all we have available.  BenWa's suggestion in comment 3 sounds OK to me, IIUC; basically, ensure that GLContextProviderEGL dlsym's/glGetProcAddress's from libapitrace.so, and make sure libapitrace.so dlsym's and glGetProcAddress's from the "real" libgl.so.

Jeff: I'm more concerned about comment 4.  Do you have any guess about compat issues?
Chris: for an overview of differences between OpenGL and OpenGL ES at the level of function prototypes (what APItrace is concerned about) look at the GL functions in GLContext.h. Some are slightly different in ES. For example, ES has glDepthRangef while non-ES has glDepthRange.
I made a fork to experience with the changes I have so far. It's very hacky at this point:

https://github.com/bgirard/apitrace

edit the paths in cross_compile.sh and CMakeLists.txt
source cross_compile.sh
cmake -H. -Bbuild
make -C build

the build fails at 77% because we're not including glx.
s/experience/experiment
I checked in some changes. It now fully compiles and links but for some reason that I don't under CMake does a static library instead of a shared library. If someone with CMake experience wants to take a look that would be great.
http://www.cmake.org/cmake/help/cmake2.6docs.html#command:add_library says

  add_library(foo.so SHARED foo.c ...)

Is that not working?
Here is the line I have in my gitfork:

> add_library (glxtrace SHARED glxtrace.cpp trace_write.cpp os_posix.cpp ${CMAKE_CURRENT_BINARY_DIR}/glproc.hpp)
Progress I've made over the weekend:
The shared library is now created, it exports (nearly) all the symbols we need for EGL/GLES.

We get a crash when doing the dlopen step in fennec with a corrupted trace and SIGILL. I'm trying to figure exactly what is causing the crash. 'file' on the android GLES shared library and the apitrace library reports the same arm format.

One guess is that dlopen causes static initializers to run in apitrace and those fails I don't see how this would result in SIGILL.
Assignee: nobody → kchen
Do we have any use case that I can try?
See discussion at http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/93b5ae5d54cdf756# .

Kan-Ru, I lean towards option (3).  That should be easiest for developers who want to debug Firefox/Android, and additionally will work in Gonk builds without much trouble.  We probably want to control loading apitrace with an environment variable.  Let's call that MOZ_TRACE_GL until someone has a better proposal :).

(In reply to Kan-Ru Chen from comment #16)
> Do we have any use case that I can try?

If you can get a recording of Firefox/Android just starting up and rendering say bing.com, that would be awesome! :)
Another question we need to answer: should we import apitrace into mozilla-central?  I think we probably should.  Kan-Ru, have you investigated getting your changes into the upstream apitrace?  That would make maintenance easier.

And also, re:

> I don't know many portions are rendered by GL in fennec now, but I only
> see calls from WebGL. 

You need to set the preference "layers.acceleration.force-enabled" to "true" to have general web content composited with GL.  You can set the preference by opening the URL about:config and searching for that preference name.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #17)
> We probably want to control loading apitrace with an
> environment variable.  Let's call that MOZ_TRACE_GL until someone has a
> better proposal :).

How about the proposed gfx.apitrace.path and gfx.apitrace.enabled preferences? Using environment variable on Android is as hard as using LD_PRELOAD ;-)
We have some hacks for environment variables: https://wiki.mozilla.org/Android#Arguments_and_Environment_Variables.  But either way is fine with me.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> Another question we need to answer: should we import apitrace into
> mozilla-central?  I think we probably should.  Kan-Ru, have you investigated
> getting your changes into the upstream apitrace?  That would make
> maintenance easier.

I'm working toward this. Currently upstream apitrace has merged EGL/GLES support contributed by LunarG, but it's kind of messy that it depends on a few desktop headers to build.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> Another question we need to answer: should we import apitrace into
> mozilla-central?  I think we probably should.

License looks ok, so a=me. Not clear on where it should be put, though. modules/ ?
(In reply to Joe Drew (:JOEDREW!) from comment #22)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> > Another question we need to answer: should we import apitrace into
> > mozilla-central?  I think we probably should.
> 
> License looks ok, so a=me. Not clear on where it should be put, though.
> modules/ ?

I currently put it under gfx/ nearby the gl code.
I think tools/ would be a good location for it.
Blocks: 692867
A pref is really great but perhaps having an environment variable as well to force enable this would be handy for platform developers, but it may be overkill.

Running:
'MOZ_TRACE_GL=1 minefield.sh' 
is much easier then flipping a pref on and back off.
Is it possible to replay the trace on the device as well? This would be helpful for reproducing driver crashes outside of Gecko.
yes, replay on device is possible. Only need to implement the relevant setup routine for eglretrace.
joe/benwa/jrmuizel/benoit are you guys signed off on landing this directly in m-c?
Comment on attachment 576854 [details] [diff] [review]
part 10: Build and install libapitrace

Review of attachment 576854 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/installer/packager.mk
@@ +331,5 @@
>    cp $(GECKO_APP_AP_PATH)/gecko.ap_ $(_ABS_DIST) && \
>    ( cd $(STAGEPATH)$(MOZ_PKG_DIR)$(_BINPATH) && \
>      mkdir -p lib/$(ABI_DIR) && \
>      mv libmozutils.so $(MOZ_CHILD_PROCESS_NAME) lib/$(ABI_DIR) && \
> +    if test -n "$(MOZ_DEBUG)"; then mv libapitrace.so lib/$(ABI_DIR); fi && \

I think we want '@DLL_PREFIX@apitrace@DLL_SUFFIX@' here.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #39)
> joe/benwa/jrmuizel/benoit are you guys signed off on landing this directly
> in m-c?

I think landing this in m-c is the right thing to do. I think we should have a specific mozconfig option to build apitrace that is independent from DEBUG.
Attached patch refreshed part 11 (obsolete) — Splinter Review
I agree that it would be nice to decouple that from DEBUG. This could be very helpful in non-debug builds.

Regarding overlap with the existing MOZ_GL_DEBUG modes, here's how it maps:
 - APItrace deprecates the logging features of MOZ_GL_DEBUG_VERBOSE: it's just better.
 - The basic features of MOZ_GL_DEBUG are still useful, and orthogonal to APItrace: checking that the GL context corresponding to |this| is current, in particular, is outside of the scope of APItrace. Also, the feature of calling glFinish after every GL call, isn't found in APItrace AFAIK.
 - The feature of MOZ_GL_DEBUG_ABORT_ON_ERROR is still useful, unless APItrace has something similar (?).
Attachment #589268 - Attachment is obsolete: true
bjacob, joe says that we crash with the latest patches.  Do you have a backtrace or anything like that?  Is it a trivial crash like a startup crash or are special steps needed to repro it?
I did crash too with these patches, which is why I left things there. But at that time, I wasn't able to use GDB with B2G. Now that Joe is able to, he might be able to get a backtrace. Yes, IIRC it was a startup crash, trivial to reproduce.
I've thoroughly updated the apitrace setup for Fennec and come up with a new set of patches. I'd love for anyone interested to take a look and give me feedback.

I've forked upstream apitrace on github for now, and am working towards upstreaming my patches for android support. We now have to build apitrace outside of mozilla using their buildsystem, but I think this makes sense. See my patches here: https://github.com/gw280/apitrace/commits/android

Secondly, I have a single patch against mozilla-central (currently based off the Maple branch) to enable tracing. Basically this checks to see if egltrace.so exists in $GRE_HOME/lib/egltrace.so and loads it if it is (but only if the gfx.apitrace.enabled pref is turned on). See my patch at https://github.com/gw280/mozilla-central/commits/android/apitrace
Comment on attachment 601127 [details] [diff] [review]
Add support for loading apitrace explicitly on Android

Review of attachment 601127 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +656,5 @@
>  #endif
>  
>          if (!mEGLLibrary) {
> +#if defined(ANDROID)
> +            nsAdoptingCString logFile = Preferences::GetCString("gfx.apitrace.log_path");

I've never used nsAdoptingCString... hopefully Jeff knows more.

@@ +666,5 @@
> +            nsCAutoString logPath;
> +            logPath.Append(getenv("GRE_HOME"));
> +            logPath.Append('/');
> +            logPath.Append(logFile);
> +            logPath.Append(".trace");

You could replace all these Append() calls by a single AppendPrintf().

@@ +674,5 @@
> +
> +            nsCAutoString apitracePath;
> +            apitracePath.Append(getenv("GRE_HOME"));
> +            apitracePath.Append("/lib/");
> +            apitracePath.Append(APITRACE_LIB);

Same here. In addition, it's not clear to me why you would have to construct a lib path here: shouldn't the system be able to find libraries on its own?

@@ +1061,5 @@
> +        // libGLESv2.so and libEGL.so
> +        nsCAutoString apitracePath;
> +        apitracePath.Append(getenv("GRE_HOME"));
> +        apitracePath.Append("/lib/");
> +        apitracePath.Append(APITRACE_LIB);

Same comments here.

@@ +1063,5 @@
> +        apitracePath.Append(getenv("GRE_HOME"));
> +        apitracePath.Append("/lib/");
> +        apitracePath.Append(APITRACE_LIB);
> +
> +        printf_stderr("Attempting load of %s\n", apitracePath);

I would do apitracePath.get(). Relying on an implicit conversion here scares me a bit.
Attachment #601127 - Flags: review?(bjacob) → review-
Ah, a quick search through the nsCString namespace didn't turf up AppendPrintf so I just did what was in one of the LayerOGL source files.. I'll sort that out.

Lib path - I just took that path straight out of Kan-Ru's patch, but I will double check our library search paths to see if they're needed. 

Implicit conversion - typo :) Missed the .get on that one.

One more thing - do you think the cost of a failed dlopen is high enough to warrant conditionalising this code? We can either:

a) Pref it
b) Build it only in debug builds
c) Do a quick check to see if the file $GRE_HOME/lib/egltrace.so exists, then attempt to dlopen it only if it does
d) Add a configure option
e) Do what we're doing in this current iteration of the patch which is to attempt to dlopen it, then fallback to dlopen(libEGL.so) if egltrace fails.
Comment on attachment 601381 [details] [diff] [review]
Bug 674753 - Add support for loading apitrace explicitly on Android.

Review of attachment 601381 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +658,5 @@
>  
>          if (!mEGLLibrary) {
> +#if defined(ANDROID)
> +            nsAdoptingCString logFile = Preferences::GetCString("gfx.apitrace.log_path");
> +

.logfile

@@ +660,5 @@
> +#if defined(ANDROID)
> +            nsAdoptingCString logFile = Preferences::GetCString("gfx.apitrace.log_path");
> +
> +            if (logFile.IsEmpty()) {
> +                logFile.Append("firefox");

assignment instead of append

@@ +669,5 @@
> +            nsCAutoString logPath;
> +            logPath.AppendPrintf("%s/%s.trace", getenv("GRE_HOME"), logFile.get());
> +
> +            printf_stderr("Logging GL tracing output to %s", logPath.get());
> +            setenv("TRACE_FILE", logPath.get(), false);

Probably worth noting that we do this to tell apitrace about the trace file

@@ +674,5 @@
> +
> +            printf_stderr("Attempting load of %s\n", APITRACE_LIB);
> +
> +            mEGLLibrary = PR_LoadLibrary(APITRACE_LIB);
> +

I would prefer this be in a separate function. But at minimum move the 'if()' out of the #endif

@@ +1058,5 @@
> +        printf_stderr("Attempting load of %s\n", APITRACE_LIB);
> +
> +        loaded = OpenLibrary(APITRACE_LIB);
> +
> +        if (!loaded)

Same here.

Add your unhappiness as comments.

@@ +1074,2 @@
>              }
>  #endif

If you can write this function so it's not weird that would be nice too.
Attachment #601381 - Flags: review?(jmuizelaar) → review+
Comment on attachment 601381 [details] [diff] [review]
Bug 674753 - Add support for loading apitrace explicitly on Android.

Review of attachment 601381 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing more to add on top of Jeff's comments.
Attachment #601381 - Flags: review?(bjacob) → review+
Comment on attachment 601655 [details] [diff] [review]
Final patch incorporating Jeff's suggestions.

Review of attachment 601655 [details] [diff] [review]:
-----------------------------------------------------------------

I can't see anything wrong here.
Attachment #601655 - Flags: review?(bjacob) → review+
Landed George's patch on Maple:
http://hg.mozilla.org/projects/maple/

Leaving open for the other patches and mozilla-central.
Can we land George' patch on m-c to reduce the maple/mc interdiff?
Isn't that precisely going to introduce a merge conflict?
No, AFAIK hg will recognize this and handle it.
Oops, forgot to ensure that the handle to the apitrace library is static. We never hit the fast return path in LoadApitraceLibrary().
Attachment #602791 - Flags: superreview?(jmuizelaar)
Attachment #602791 - Flags: review?(bjacob)
Comment on attachment 602791 [details] [diff] [review]
fix static in LoadApitraceLibrary

Review of attachment 602791 [details] [diff] [review]:
-----------------------------------------------------------------

I think one review will do for this patch :)
Attachment #602791 - Flags: superreview?(jmuizelaar)
Attachment #602791 - Flags: review?(bjacob)
Attachment #602791 - Flags: review+
Attachment #601655 - Flags: review?(jmuizelaar) → review+
Depends on: 733951
Attachment #601127 - Flags: review?(jmuizelaar)
Whiteboard: keep open! → keep open! [gfx.relnote.13]
Assignee: kchen → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.