Closed Bug 1348402 Opened 7 years ago Closed 6 years ago

Make PROFILER_LABEL as cheap as possible

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox55 --- affected

People

(Reporter: mstange, Unassigned)

References

Details

We'd like to add PROFILER_LABELs to our DOM binding code in bug 785440. In order to do that, we need to find a solution that has extremely low overhead when the profiler is turned off, and as low overhead as possible when the profiler is turned on.

So either we need to add a variant called PROFILER_LABEL_HOT which only (cheaply) checks whether the profiler is enabled, and if not, does nothing.
Or we need to optimize PROFILER_LABEL so much that it becomes cheap enough to be enabled all the time.

The "is profiler enabled" check can access an Atomic<bool> which is very cheap.
Actually adding an entry to the PseudoStack requires a TLS lookup, which will be more expensive but I don't know by how much.

The other thing that worries me about our PseudoStack implementation is the fact that a bunch of parameters (label string pointer, category, line number) are passed to a bunch of different places until they end up in PseudoStack::push, which then copies them into a ProfileEntry and stores it in the mStack array.

It might be faster to make the PseudoStack something like a linked list, where the ProfileEntry is stored inside the SamplerStackFrameRAII stack object, and each SamplerStackFrameRAII also has a pointer to the previous SamplerStackFrameRAII object on the stack, and the PseudoStack itself has just a head pointer to the current leaf SamplerStackFrameRAII object, instead of its mStack array.

I might work on this after njn is done with his cleanups in bug 1347795.
> I might work on this after njn is done with his cleanups in bug 1347795.

I'm rethinking that work a bit, and it might take some time. So if you want to work on this now please go ahead.
> It might be faster to make the PseudoStack something like a linked list,

Bug 1366650 is a step towards this, because it makes both the profiler and SpiderMonkey go through the PseudoStack class to manipulate the pseudo-stack.
Depends on: 1366650
No longer depends on: 1347795
Depends on: 1347274
> So either we need to add a variant called PROFILER_LABEL_HOT which only
> (cheaply) checks whether the profiler is enabled, and if not, does nothing.

How would that work? Keeping the pseudostack balanced seems like a problem.

Here's a different idea. I instrumented the PROFILER_LABEL* macros and got the top 20 labels by frequency after starting up and browsing briefly:

> 433703 counts:
> (  1)    48008 (11.1%, 11.1%): LABEL: nsStyleSet::FileRules
> (  2)    41116 ( 9.5%, 20.5%): LABEL: ContainerState::ProcessDisplayItems
> (  3)    29410 ( 6.8%, 27.3%): LABEL: EventDispatcher::Dispatch
> (  4)    20851 ( 4.8%, 32.1%): LABEL: MessagePumpDefault::Run:Wait
> (  5)    20541 ( 4.7%, 36.9%): LABEL: FrameLayerBuilder::PaintItems
> (  6)    17368 ( 4.0%, 40.9%): LABEL: PCompositorBridge::Msg_DidComposite
> (  7)    13852 ( 3.2%, 44.1%): LABEL: ClientPaintedLayer::PaintThebes
> (  8)    13635 ( 3.1%, 47.2%): LABEL: nsStyleContext::CalcStyleDifferenceInternal
> (  9)    12611 ( 2.9%, 50.1%): LABEL: PaintedLayerComposite::RenderLayer
> ( 10)    10676 ( 2.5%, 52.6%): LABEL: ViewportFrame::BuildDisplayList
> ( 11)     8615 ( 2.0%, 54.6%): LABEL: nsDisplayText::Paint
> ( 12)     7559 ( 1.7%, 56.3%): LABEL: PVsync::Msg_Notify
> ( 13)     7554 ( 1.7%, 58.1%): LABEL: BasicLayerManager::PaintLayer
> ( 14)     6424 ( 1.5%, 59.5%): L_DYN: nsObserverService::NotifyObservers
> ( 15)     6141 ( 1.4%, 61.0%): LABEL: nsTimerImpl::Fire
> ( 16)     5677 ( 1.3%, 62.3%): LABEL: nsLayoutUtils::GetFramesForArea
> ( 17)     5677 ( 1.3%, 63.6%): LABEL: nsLayoutUtils::GetFrameForPoint
> ( 18)     4869 ( 1.1%, 64.7%): LABEL: nsRefreshDriver::Tick
> ( 19)     4507 ( 1.0%, 65.7%): L_DYN: PresShell::DoFlushPendingNotifications
> ( 20)     4108 ( 0.9%, 66.7%): LABEL: nsCSSRendering::PaintStyleImageLayer

For 8 of the top 10 functions the scope of the label is the entire function body. Is a label like that actually useful on platforms that have native stack unwinding? Doesn't the label just duplicate info you'd get from the native stack?

Come to think of it, I don't recall seeing PseudoStack entries in the profiler's output, but I'm not an expert on that part of things...
Flags: needinfo?(mstange)
(In reply to Nicholas Nethercote [:njn] from comment #3)
> > So either we need to add a variant called PROFILER_LABEL_HOT which only
> > (cheaply) checks whether the profiler is enabled, and if not, does nothing.
> 
> How would that work? Keeping the pseudostack balanced seems like a problem.

Just like we handle TLS not being initialized: In AutoProfilerLabel, set mPseudoStack to nullptr so that you don't call mPseudoStack->pop() in the destructor.

> For 8 of the top 10 functions the scope of the label is the entire function
> body. Is a label like that actually useful on platforms that have native
> stack unwinding? Doesn't the label just duplicate info you'd get from the
> native stack?

The pseudo stack labels serve a few purposes.
 - On platforms / build configurations without native stack walking, they give us some information about the stack.
   These platforms are currently Windows 32bit release builds (because of bug 1340721) and Mac release builds (because we don't enable frame pointers), and maybe Linux 32bit builds for which Lul fails because of lack of address space.
   So with bug 1340721 fixed, we'd only really make use of this data when diagnosing issues from Mac users with Firefox release builds.
 - On platforms / build configurations that do have native stack walking, sometimes stack walking fails in the middle. In those cases the pseudo frames give us a fallback for the rest of the stack.
 - They have category information. The category information is used in the devtools profiler if you check "Show Gecko Platform Data": https://screenshots.firefox.com/25IoZMBRv4z6sm8f/null
   And I'd also like to use the category information to draw pretty graphs in perf.html soon.

> Come to think of it, I don't recall seeing PseudoStack entries in the
> profiler's output, but I'm not an expert on that part of things...

Oh they do exist. E.g. here's a random profile I just took of my browser: http://bit.ly/2sGiGZD
The Startup::XRE_InitChildProcess frame in the call tree is a pseudo stack frame. So is the Events::ProcessGeckoEvents frame.
Flags: needinfo?(mstange)
Depends on: 1385998
Profiler labels are pretty cheap now, mostly thanks to bug 1385998. And profiler labels in places where we have a JSContext go down a specialized path that does not access TLS.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.