Closed
Bug 737967
Opened 13 years ago
Closed 10 years ago
Add GPU Profiling
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
RESOLVED
FIXED
mozilla36
People
(Reporter: BenWa, Assigned: BenWa)
References
Details
Attachments
(1 file, 10 obsolete files)
23.84 KB,
patch
|
BenWa
:
review+
|
Details | Diff | Splinter Review |
I'd like to add GPU Profiling to the build in profiler.
Here's how it would work. We insert fences in the command queue after each non trivial commands and keep a list of active fences. When sampling we also dump the state of the fences that are remaining. This way we can keep track of the GPU command queue in real time.
This assumes that GPU fences have low overhead.
This could be very useful for profiling compositing and webgl. I'd like to work on this once we support profiling secondary threads since we can have GLContext off the main thread.
Comment 1•13 years ago
|
||
FWIW, this is also the best way to track GPU load for throttling, which we may need to do down the road.
Be warned, though, that we should be very careful about thrashing MakeCurrent when dealing with fences. If possible, only fence and query on the same context.
Assignee | ||
Comment 2•13 years ago
|
||
MakeCurrent is a bad problem, make render this proposal invalid.
Comment 3•13 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #2)
> MakeCurrent is a bad problem, make render this proposal invalid.
If we can fence/poll from the same context, we should be fine, and it should still give useful data, yes?
Assignee | ||
Comment 4•13 years ago
|
||
The problem is you need to MakeCurrent the context from the profiling thread to check the fence while the main thread is paused. You target context can be current on the main thread and so I have no idea what happens when a context is MakeCurrent on the main thread and you MakeCurrent from another thread to check the fence. I'm assuming that's not supported.
Comment 5•13 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #4)
> The problem is you need to MakeCurrent the context from the profiling thread
> to check the fence while the main thread is paused. You target context can
> be current on the main thread and so I have no idea what happens when a
> context is MakeCurrent on the main thread and you MakeCurrent from another
> thread to check the fence. I'm assuming that's not supported.
I think you will receive the context in the general case, but the main thread will lose it and not know it. Could you store the current context when you pause, then restore whatever context that was when you're done?
Comment 6•13 years ago
|
||
Wait, why is this not simply using an additional profiling context? Sync objects (fences) are shared across contexts, so they should be fine to query.
Assignee | ||
Comment 7•11 years ago
|
||
I'm ready to look into building this into the profiler. I wont be doing this from the profiler thread but rather using (disjoint) timer queries and pushing the data to profiler markers when after more then one frame to avoid pipeline stalls.
Assignee | ||
Comment 8•11 years ago
|
||
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Assignee | ||
Comment 9•11 years ago
|
||
Attachment #8427882 -
Attachment is obsolete: true
Assignee | ||
Comment 10•11 years ago
|
||
Attachment #8428211 -
Attachment is obsolete: true
Assignee | ||
Comment 11•11 years ago
|
||
Alright I'm collecting the data but its slowing down the phone too much to really provide profiling data (data not displayed):
http://people.mozilla.org/~bgirard/cleopatra/#report=3ac2d8208b1481c21a5bce8fb3c83cca94a37d98
Right now I'm adding two query object per draw call. Allocating, setting and checking if they're available is cheap. However getting the results back is expensive. It takes about 0.3ms to check two query object.
This means that this is unworkable at the individual draw call levels. We can add this at the frame level but for the compositor we know when frames are executing. It might be useful a bit for webgl.
Thoughts?
Flags: needinfo?(jmuizelaar)
Comment 12•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #11)
> Alright I'm collecting the data but its slowing down the phone too much to
> really provide profiling data (data not displayed):
>
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=3ac2d8208b1481c21a5bce8fb3c83cca94a37d98
>
> Right now I'm adding two query object per draw call. Allocating, setting and
> checking if they're available is cheap. However getting the results back is
> expensive. It takes about 0.3ms to check two query object.
It looks like getting the results is fopening and fgets some file. Can you find out what file this is? It might give us some idea why it's taking so long to get results. You should be able to get the file name by setting a break point on core_glGetQueryObjectui64vEXT and then adding a breakpoint to fopen.
Flags: needinfo?(jmuizelaar)
Assignee | ||
Comment 13•11 years ago
|
||
Alright we just looked and it's reading '/sys/class/kgsl/kgsl-3d0/gpuclk'. On my device it's 300Mhz most of the time.
Diego is it possible this is an oversight? This cost of this call is so high it's makes the extension unless for profiling which is it's primary usage.
Assignee | ||
Comment 14•11 years ago
|
||
Attachment #8428214 -
Attachment is obsolete: true
Comment 15•11 years ago
|
||
Comment on attachment 8428840 [details] [diff] [review]
WIP - Getting numbers
Review of attachment 8428840 [details] [diff] [review]:
-----------------------------------------------------------------
Hi James,
I think this profiling patch is only supposed to query GL for timers but for some reason it's also reading clock speed. Is the query in the patch reading more than it needs to?
Attachment #8428840 -
Flags: feedback?(jhicks)
Assignee | ||
Comment 16•11 years ago
|
||
The slow calls are:
mSymbols.fGetQueryObjectui64v(mTimerQueries[i].queryGpuTimeStart, LOCAL_GL_QUERY_RESULT, &gpuTimeStart);
mSymbols.fGetQueryObjectui64v(mTimerQueries[i].queryGpuTimeEnd, LOCAL_GL_QUERY_RESULT, &gpuTimeEnd);
Assignee | ||
Comment 17•11 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 18•10 years ago
|
||
Here's a raw GPU profile. I need to build out the UI for this:
http://people.mozilla.org/~bgirard/cleopatra/#report=6b2561c4840fc0a8d57f62116cba59ca474a603e
Assignee | ||
Updated•10 years ago
|
Attachment #8428840 -
Flags: feedback?(jhicks)
Assignee | ||
Comment 19•10 years ago
|
||
This gives us numbers for how long each framebuffer takes to resolve in the compositor. For WebGL it just shows us how long it takes for each frame.
Attachment #8428840 -
Attachment is obsolete: true
Attachment #8508231 -
Flags: review?(jgilbert)
Comment 20•10 years ago
|
||
Comment on attachment 8508231 [details] [diff] [review]
patch
Review of attachment 8508231 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/canvas/WebGLContext.cpp
@@ +1444,5 @@
>
> mShouldPresent = false;
>
> + mContextStateTracker.EndOGLSection(gl, "Frame");
> + mContextStateTracker.StartOGLSection(gl, "Frame");
This will not always happen, so we can't use a pure stack for this.
::: gfx/thebes/ContextStateTracker.cpp
@@ +25,5 @@
> + if (mSectionStack.Length() > 0) {
> + // We need to end the query since we're starting a new section and restore it
> + // when this section is finished.
> + aCtxt->fEndQuery(LOCAL_GL_TIME_ELAPSED);
> + mSectionStack[mSectionStack.Length() - 1].mCpuTimeEnd = TimeStamp::Now();
ContextState& Top() const {
MOZ_ASSERT(mSectionStack.Length());
return mSectionStack[mSectionStack.Length() - 1];
}
Top().mCpuTimeEnd = TimeStamp::Now()
@@ +43,5 @@
> +
> +void
> +ContextStateTrackerOGL::EndOGLSection(GLContext* aCtxt, const char* aSectionName)
> +{
> + if (mSectionStack.Length() == 0) {
This would normally be an assert. Comment here about why it's not. (It sounds like we can start profiling mid-frame)
@@ +48,5 @@
> + return;
> + }
> +
> + int i = mSectionStack.Length() - 1;
> + if (strcmp(mSectionStack[i].mSectionName, aSectionName) == 0) {
Assert that this is true, otherwise a stack doesn't make sense.
@@ +89,5 @@
> + TimeStamp now = TimeStamp::Now();
> +
> + while (mCompletedSections.Length() != 0) {
> + // Let's give the driver 200ms to finish the rendering before we query it
> + // and stall the rendering pipeline.
This doesn't stall the rendering pipeline.
@@ +96,5 @@
> + }
> +
> + GLint gpuTime = 0;
> + GLuint handle = (GLuint)mCompletedSections[0].mStartQueryHandle;
> + aCtxt->fGetQueryObjectiv(handle, LOCAL_GL_QUERY_RESULT, &gpuTime);
You have to query that it's completed before you can query the result.
::: gfx/thebes/ContextStateTracker.h
@@ +16,5 @@
> +}
> +}
> +
> +/**
> + * This class tracks the state of the context for debigging and profiling.
de-what? :)
@@ +29,5 @@
> +
> + bool IsProfiling() { return true; }
> +
> +protected:
> + typedef unsigned int TimerQueryHandle;
Use sized types.
Attachment #8508231 -
Flags: review?(jgilbert) → review-
Assignee | ||
Comment 21•10 years ago
|
||
Attachment #8508231 -
Attachment is obsolete: true
Attachment #8508325 -
Flags: review?(jgilbert)
Comment 22•10 years ago
|
||
Comment on attachment 8508325 [details] [diff] [review]
patch v2
Review of attachment 8508325 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/opengl/CompositorOGL.cpp
@@ +667,5 @@
> = static_cast<CompositingRenderTargetOGL*>(aSurface);
> if (mCurrentRenderTarget != surface) {
> mCurrentRenderTarget = surface;
> + mContextStateTracker.PopOGLSection(gl(), "FB");
> + mContextStateTracker.PushOGLSection(gl(), "FB");
Why is everything named 'FB'?
::: gfx/thebes/ContextStateTracker.cpp
@@ +8,5 @@
> +#include "ProfilerMarkers.h"
> +
> +using mozilla::TimeDuration;
> +using mozilla::TimeStamp;
> +using namespace mozilla::gl;
Avoid `using`.
@@ +11,5 @@
> +using mozilla::TimeStamp;
> +using namespace mozilla::gl;
> +
> +void
> +ContextStateTrackerOGL::PushOGLSection(GLContext* aCtxt, const char* aSectionName)
s/aCtxt/aGL/, here and elsewhere.
@@ +57,5 @@
> + mCompletedSections.AppendElement(mSectionStack[i]);
> + mSectionStack.RemoveElementAt(i);
> +
> + if (i - 1 >= 0) {
> + const char* sectionToRestore = mSectionStack[i - 1].mSectionName;
Top() would read way better than this. It's worth hiding the details of this.
@@ +93,5 @@
> + if (mCompletedSections[0].mCpuTimeEnd + TimeDuration::FromMilliseconds(200) > now) {
> + break;
> + }
> +
> + GLuint handle = (GLuint)mCompletedSections[0].mStartQueryHandle;
Pick either c-cast or reinterpret_cast for this file.
@@ +105,5 @@
> + break;
> + }
> +
> + GLint gpuTime = 0;
> + aCtxt->fGetQueryObjectiv(handle, LOCAL_GL_QUERY_RESULT, &gpuTime);
gpuTime is probably not worth recording, since it will only be the time since we last called BeginQuery, not the time since we first called BeginQuery for this section.
::: gfx/thebes/ContextStateTracker.h
@@ +21,5 @@
> + * This class tracks the state of the context for debugging and profiling.
> + * Each section pushes a new stack entry and must be matched by an end section.
> + * All nested section must be ended before ending a parent section.
> + */
> +class ContextStateTracker {
Why is this not even in the mozilla:: namespace?
Attachment #8508325 -
Flags: review?(jgilbert) → review+
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #22)
> gpuTime is probably not worth recording, since it will only be the time
> since we last called BeginQuery, not the time since we first called
> BeginQuery for this section.
>
I think you're confusing this with fEndQuery? In case of a nested section we end the query, process the nested section and then set a new fBegin/EndQuery.
Assignee | ||
Comment 24•10 years ago
|
||
Attachment #8508325 -
Attachment is obsolete: true
Attachment #8508455 -
Flags: review+
Assignee | ||
Comment 25•10 years ago
|
||
Assignee | ||
Comment 26•10 years ago
|
||
Attachment #8508737 -
Flags: review?(ehsan.akhgari)
Comment 27•10 years ago
|
||
Comment on attachment 8508737 [details] [diff] [review]
Part 0: Add GPU feature to the profiler
Review of attachment 8508737 [details] [diff] [review]:
-----------------------------------------------------------------
::: tools/profiler/ProfilerMarkers.cpp
@@ +87,5 @@
> +GPUMarkerPayload::GPUMarkerPayload(
> + const mozilla::TimeStamp& aCpuTimeStart,
> + const mozilla::TimeStamp& aCpuTimeEnd,
> + uint64_t aGpuTimeStart,
> + uint64_t aGpuTimeEnd)
Weird indentation!
::: tools/profiler/ProfilerMarkers.h
@@ +51,5 @@
> void StreamPayload(JSStreamWriter& b) {
> return streamPayload(b);
> }
>
> + mozilla::TimeStamp& GetStartTime() { return mStartTime; }
Nit: just return the TimeStamp as a copy, no need to return a reference. Also, make this const, please.
@@ +198,5 @@
> + ~GPUMarkerPayload() {}
> +
> +protected:
> + virtual void
> + streamPayload(JSStreamWriter& b) { return streamPayloadImp(b); }
MOZ_OVERRIDE, please.
::: tools/profiler/platform.cpp
@@ +22,5 @@
> #include "mozilla/Services.h"
> #include "nsThreadUtils.h"
> #include "ProfilerMarkers.h"
> #include "nsXULAppAPI.h"
> +#include "GLContext.h"
Did you forget to remove this?
@@ +1049,5 @@
> + stack->addMarker(aMarker, payload.forget(), static_cast<float>(delta.ToMilliseconds()));
> + } else {
> + mozilla::TimeDuration delta = mozilla::TimeStamp::Now() - sStartTime;
> + stack->addMarker(aMarker, payload.forget(), static_cast<float>(delta.ToMilliseconds()));
> + }
I'd rewrite this as something to the effect of:
TimeStamp origin = (aPayload && !...) ?
aPayload->GetStartTime() : TimeStamp::Now();
TimeDuration delta = origin - sStartTime;
stack->addMarker(...);
Attachment #8508737 -
Flags: review?(ehsan.akhgari) → review+
Assignee | ||
Comment 28•10 years ago
|
||
Attachment #8508737 -
Attachment is obsolete: true
Attachment #8508906 -
Flags: review+
Assignee | ||
Comment 29•10 years ago
|
||
Use fGetQueryObjectuiv instead of fGetQueryObjectiv because it's not on b2g.
Attachment #8508455 -
Attachment is obsolete: true
Attachment #8509589 -
Flags: review+
Assignee | ||
Comment 30•10 years ago
|
||
This patch separation isn't particularly useful so I'm just going to land it as one unit.
Attachment #8508906 -
Attachment is obsolete: true
Attachment #8509589 -
Attachment is obsolete: true
Attachment #8509591 -
Flags: review+
Assignee | ||
Comment 31•10 years ago
|
||
Comment 32•10 years ago
|
||
Assignee | ||
Comment 33•10 years ago
|
||
Comment 34•10 years ago
|
||
Also non-unified bustage it appears:
https://tbpl.mozilla.org/php/getParsedLog.php?id=50798074&tree=Mozilla-Inbound
Assignee | ||
Comment 35•10 years ago
|
||
Sorry for the bustage. Same patch that ran on try this time:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3348fa64f8b
Comment 36•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in
before you can comment on or make changes to this bug.
Description
•