Closed Bug 737967 Opened 12 years ago Closed 10 years ago

Add GPU Profiling

Categories

(Core :: Gecko Profiler, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: BenWa, Assigned: BenWa)

References

Details

Attachments

(1 file, 10 obsolete files)

I'd like to add GPU Profiling to the build in profiler.

Here's how it would work. We insert fences in the command queue after each non trivial commands and keep a list of active fences. When sampling we also dump the state of the fences that are remaining. This way we can keep track of the GPU command queue in real time.

This assumes that GPU fences have low overhead.

This could be very useful for profiling compositing and webgl. I'd like to work on this once we support profiling secondary threads since we can have GLContext off the main thread.
FWIW, this is also the best way to track GPU load for throttling, which we may need to do down the road.

Be warned, though, that we should be very careful about thrashing MakeCurrent when dealing with fences. If possible, only fence and query on the same context.
MakeCurrent is a bad problem, make render this proposal invalid.
(In reply to Benoit Girard (:BenWa) from comment #2)
> MakeCurrent is a bad problem, make render this proposal invalid.

If we can fence/poll from the same context, we should be fine, and it should still give useful data, yes?
The problem is you need to MakeCurrent the context from the profiling thread to check the fence while the main thread is paused. You target context can be current on the main thread and so I have no idea what happens when a context is MakeCurrent on the main thread and you MakeCurrent from another thread to check the fence. I'm assuming that's not supported.
(In reply to Benoit Girard (:BenWa) from comment #4)
> The problem is you need to MakeCurrent the context from the profiling thread
> to check the fence while the main thread is paused. You target context can
> be current on the main thread and so I have no idea what happens when a
> context is MakeCurrent on the main thread and you MakeCurrent from another
> thread to check the fence. I'm assuming that's not supported.

I think you will receive the context in the general case, but the main thread will lose it and not know it. Could you store the current context when you pause, then restore whatever context that was when you're done?
Wait, why is this not simply using an additional profiling context? Sync objects (fences) are shared across contexts, so they should be fine to query.
I'm ready to look into building this into the profiler. I wont be doing this from the profiler thread but rather using (disjoint) timer queries and pushing the data to profiler markers when after more then one frame to avoid pipeline stalls.
Depends on: 1011356
Attached patch WIP (obsolete) — Splinter Review
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attached patch WIP - Getting numbers (obsolete) — Splinter Review
Attachment #8427882 - Attachment is obsolete: true
Attached patch WIP - Getting numbers (obsolete) — Splinter Review
Attachment #8428211 - Attachment is obsolete: true
Alright I'm collecting the data but its slowing down the phone too much to really provide profiling data (data not displayed):

http://people.mozilla.org/~bgirard/cleopatra/#report=3ac2d8208b1481c21a5bce8fb3c83cca94a37d98

Right now I'm adding two query object per draw call. Allocating, setting and checking if they're available is cheap. However getting the results back is expensive. It takes about 0.3ms to check two query object.

This means that this is unworkable at the individual draw call levels. We can add this at the frame level but for the compositor we know when frames are executing. It might be useful a bit for webgl.

Thoughts?
Flags: needinfo?(jmuizelaar)
(In reply to Benoit Girard (:BenWa) from comment #11)
> Alright I'm collecting the data but its slowing down the phone too much to
> really provide profiling data (data not displayed):
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=3ac2d8208b1481c21a5bce8fb3c83cca94a37d98
> 
> Right now I'm adding two query object per draw call. Allocating, setting and
> checking if they're available is cheap. However getting the results back is
> expensive. It takes about 0.3ms to check two query object.

It looks like getting the results is fopening and fgets some file. Can you find out what file this is? It might give us some idea why it's taking so long to get results. You should be able to get the file name by setting a break point on core_glGetQueryObjectui64vEXT and then adding a breakpoint to fopen.
Flags: needinfo?(jmuizelaar)
Alright we just looked and it's reading '/sys/class/kgsl/kgsl-3d0/gpuclk'. On my device it's 300Mhz most of the time.

Diego is it possible this is an oversight? This cost of this call is so high it's makes the extension unless for profiling which is it's primary usage.
Attached patch WIP - Getting numbers (obsolete) — Splinter Review
Attachment #8428214 - Attachment is obsolete: true
Comment on attachment 8428840 [details] [diff] [review]
WIP - Getting numbers

Review of attachment 8428840 [details] [diff] [review]:
-----------------------------------------------------------------

Hi James,

I think this profiling patch is only supposed to query GL for timers but for some reason it's also reading clock speed. Is the query in the patch reading more than it needs to?
Attachment #8428840 - Flags: feedback?(jhicks)
The slow calls are:
mSymbols.fGetQueryObjectui64v(mTimerQueries[i].queryGpuTimeStart, LOCAL_GL_QUERY_RESULT, &gpuTimeStart);
mSymbols.fGetQueryObjectui64v(mTimerQueries[i].queryGpuTimeEnd, LOCAL_GL_QUERY_RESULT, &gpuTimeEnd);
Depends on: 974832
No longer depends on: 1011356
Here's a raw GPU profile. I need to build out the UI for this:
http://people.mozilla.org/~bgirard/cleopatra/#report=6b2561c4840fc0a8d57f62116cba59ca474a603e
Attachment #8428840 - Flags: feedback?(jhicks)
Attached patch patch (obsolete) — Splinter Review
This gives us numbers for how long each framebuffer takes to resolve in the compositor. For WebGL it just shows us how long it takes for each frame.
Attachment #8428840 - Attachment is obsolete: true
Attachment #8508231 - Flags: review?(jgilbert)
Comment on attachment 8508231 [details] [diff] [review]
patch

Review of attachment 8508231 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/WebGLContext.cpp
@@ +1444,5 @@
>  
>      mShouldPresent = false;
>  
> +    mContextStateTracker.EndOGLSection(gl, "Frame");
> +    mContextStateTracker.StartOGLSection(gl, "Frame");

This will not always happen, so we can't use a pure stack for this.

::: gfx/thebes/ContextStateTracker.cpp
@@ +25,5 @@
> +  if (mSectionStack.Length() > 0) {
> +    // We need to end the query since we're starting a new section and restore it
> +    // when this section is finished.
> +    aCtxt->fEndQuery(LOCAL_GL_TIME_ELAPSED);
> +    mSectionStack[mSectionStack.Length() - 1].mCpuTimeEnd = TimeStamp::Now();

ContextState& Top() const {
  MOZ_ASSERT(mSectionStack.Length());
  return mSectionStack[mSectionStack.Length() - 1];
}

Top().mCpuTimeEnd = TimeStamp::Now()

@@ +43,5 @@
> +
> +void
> +ContextStateTrackerOGL::EndOGLSection(GLContext* aCtxt, const char* aSectionName)
> +{
> +  if (mSectionStack.Length() == 0) {

This would normally be an assert. Comment here about why it's not. (It sounds like we can start profiling mid-frame)

@@ +48,5 @@
> +    return;
> +  }
> +
> +  int i = mSectionStack.Length() - 1;
> +  if (strcmp(mSectionStack[i].mSectionName, aSectionName) == 0) {

Assert that this is true, otherwise a stack doesn't make sense.

@@ +89,5 @@
> +  TimeStamp now = TimeStamp::Now();
> +
> +  while (mCompletedSections.Length() != 0) {
> +    // Let's give the driver 200ms to finish the rendering before we query it
> +    // and stall the rendering pipeline.

This doesn't stall the rendering pipeline.

@@ +96,5 @@
> +    }
> +
> +    GLint gpuTime = 0;
> +    GLuint handle = (GLuint)mCompletedSections[0].mStartQueryHandle;
> +    aCtxt->fGetQueryObjectiv(handle, LOCAL_GL_QUERY_RESULT, &gpuTime);

You have to query that it's completed before you can query the result.

::: gfx/thebes/ContextStateTracker.h
@@ +16,5 @@
> +}
> +}
> +
> +/**
> + * This class tracks the state of the context for debigging and profiling.

de-what? :)

@@ +29,5 @@
> +
> +  bool IsProfiling() { return true; }
> +
> +protected:
> +  typedef unsigned int TimerQueryHandle;

Use sized types.
Attachment #8508231 - Flags: review?(jgilbert) → review-
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #8508231 - Attachment is obsolete: true
Attachment #8508325 - Flags: review?(jgilbert)
Comment on attachment 8508325 [details] [diff] [review]
patch v2

Review of attachment 8508325 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/CompositorOGL.cpp
@@ +667,5 @@
>      = static_cast<CompositingRenderTargetOGL*>(aSurface);
>    if (mCurrentRenderTarget != surface) {
>      mCurrentRenderTarget = surface;
> +    mContextStateTracker.PopOGLSection(gl(), "FB");
> +    mContextStateTracker.PushOGLSection(gl(), "FB");

Why is everything named 'FB'?

::: gfx/thebes/ContextStateTracker.cpp
@@ +8,5 @@
> +#include "ProfilerMarkers.h"
> +
> +using mozilla::TimeDuration;
> +using mozilla::TimeStamp;
> +using namespace mozilla::gl;

Avoid `using`.

@@ +11,5 @@
> +using mozilla::TimeStamp;
> +using namespace mozilla::gl;
> +
> +void
> +ContextStateTrackerOGL::PushOGLSection(GLContext* aCtxt, const char* aSectionName)

s/aCtxt/aGL/, here and elsewhere.

@@ +57,5 @@
> +  mCompletedSections.AppendElement(mSectionStack[i]);
> +  mSectionStack.RemoveElementAt(i);
> +
> +  if (i - 1 >= 0) {
> +    const char* sectionToRestore = mSectionStack[i - 1].mSectionName;

Top() would read way better than this. It's worth hiding the details of this.

@@ +93,5 @@
> +    if (mCompletedSections[0].mCpuTimeEnd + TimeDuration::FromMilliseconds(200) > now) {
> +      break;
> +    }
> +
> +    GLuint handle = (GLuint)mCompletedSections[0].mStartQueryHandle;

Pick either c-cast or reinterpret_cast for this file.

@@ +105,5 @@
> +      break;
> +    }
> +
> +    GLint gpuTime = 0;
> +    aCtxt->fGetQueryObjectiv(handle, LOCAL_GL_QUERY_RESULT, &gpuTime);

gpuTime is probably not worth recording, since it will only be the time since we last called BeginQuery, not the time since we first called BeginQuery for this section.

::: gfx/thebes/ContextStateTracker.h
@@ +21,5 @@
> + * This class tracks the state of the context for debugging and profiling.
> + * Each section pushes a new stack entry and must be matched by an end section.
> + * All nested section must be ended before ending a parent section.
> + */
> +class ContextStateTracker {

Why is this not even in the mozilla:: namespace?
Attachment #8508325 - Flags: review?(jgilbert) → review+
(In reply to Jeff Gilbert [:jgilbert] from comment #22)
> gpuTime is probably not worth recording, since it will only be the time
> since we last called BeginQuery, not the time since we first called
> BeginQuery for this section.
> 

I think you're confusing this with fEndQuery? In case of a nested section we end the query, process the nested section and then set a new fBegin/EndQuery.
Attached patch patch v3 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=5acf31feca9a
Attachment #8508325 - Attachment is obsolete: true
Attachment #8508455 - Flags: review+
Attachment #8508737 - Flags: review?(ehsan.akhgari)
Comment on attachment 8508737 [details] [diff] [review]
Part 0: Add GPU feature to the profiler

Review of attachment 8508737 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/ProfilerMarkers.cpp
@@ +87,5 @@
> +GPUMarkerPayload::GPUMarkerPayload(
> +                   const mozilla::TimeStamp& aCpuTimeStart,
> +                   const mozilla::TimeStamp& aCpuTimeEnd,
> +                   uint64_t aGpuTimeStart,
> +                   uint64_t aGpuTimeEnd)

Weird indentation!

::: tools/profiler/ProfilerMarkers.h
@@ +51,5 @@
>    void StreamPayload(JSStreamWriter& b) {
>      return streamPayload(b);
>    }
>  
> +  mozilla::TimeStamp& GetStartTime() { return mStartTime; }

Nit: just return the TimeStamp as a copy, no need to return a reference.  Also, make this const, please.

@@ +198,5 @@
> +  ~GPUMarkerPayload() {}
> +
> +protected:
> +  virtual void
> +  streamPayload(JSStreamWriter& b) { return streamPayloadImp(b); }

MOZ_OVERRIDE, please.

::: tools/profiler/platform.cpp
@@ +22,5 @@
>  #include "mozilla/Services.h"
>  #include "nsThreadUtils.h"
>  #include "ProfilerMarkers.h"
>  #include "nsXULAppAPI.h"
> +#include "GLContext.h"

Did you forget to remove this?

@@ +1049,5 @@
> +    stack->addMarker(aMarker, payload.forget(), static_cast<float>(delta.ToMilliseconds()));
> +  } else {
> +    mozilla::TimeDuration delta = mozilla::TimeStamp::Now() - sStartTime;
> +    stack->addMarker(aMarker, payload.forget(), static_cast<float>(delta.ToMilliseconds()));
> +  }

I'd rewrite this as something to the effect of:

TimeStamp origin = (aPayload && !...) ?
                   aPayload->GetStartTime() : TimeStamp::Now();
TimeDuration delta = origin - sStartTime;
stack->addMarker(...);
Attachment #8508737 - Flags: review?(ehsan.akhgari) → review+
Attachment #8508737 - Attachment is obsolete: true
Attachment #8508906 - Flags: review+
Attached patch patch v4 (obsolete) — Splinter Review
Use fGetQueryObjectuiv instead of fGetQueryObjectiv because it's not on b2g.
Attachment #8508455 - Attachment is obsolete: true
Attachment #8509589 - Flags: review+
This patch separation isn't particularly useful so I'm just going to land it as one unit.
Attachment #8508906 - Attachment is obsolete: true
Attachment #8509589 - Attachment is obsolete: true
Attachment #8509591 - Flags: review+
Sorry for the bustage. Same patch that ran on try this time:

https://hg.mozilla.org/integration/mozilla-inbound/rev/c3348fa64f8b
Depends on: 1087848
https://hg.mozilla.org/mozilla-central/rev/c3348fa64f8b
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: