Open Bug 1538061 Opened 5 years ago Updated 2 years ago

Stack sampling sometimes overshoots the PROFILER_INIT block

Categories

(Core :: Gecko Profiler, enhancement, P3)

Unspecified
Windows
enhancement

Tracking

()

Tracking Status
firefox68 --- affected

People

(Reporter: mozbugz, Unassigned)

References

(Blocks 1 open bug)

Details

Seen on Windows, haven't checked others OSes.

In the Flame Graph, I would expect the first node above "(root)" to always be the same, as it's the first AUTO_PROFILER_LABEL created right after AUTO_PROFILER_INIT.
However, lots of stacks also include parent nodes all the way to RtlUserThreadStart.
E.g.: https://perfht.ml/2Ydq7qI

Glancing at the code, stack walkers seem to compare pointers to the stack pointer recorded during INIT, maybe that's not enough to guarantee proper walk termination?

In any case, MergeStacks() should be able to always cut stacks at the correct spot, when it hits the known initial label. (Though as this label is not currently guaranteed to be there and with the same name for different processes, we should instead force a guaranteed root label that's always added by INIT, assuming that's possible).

It would still be nice to fix the walkers as well if possible, to save a bit of overhead wasted in exploring and storing unwanted frames. This could be a follow-up bug if we only patch MergeStacks() first.

(In reply to Gerald Squelart [:gerald] from comment #0)

In the Flame Graph, I would expect the first node above "(root)" to always be the same, as it's the first AUTO_PROFILER_LABEL created right after AUTO_PROFILER_INIT.
However, lots of stacks also include parent nodes all the way to RtlUserThreadStart.
E.g.: https://perfht.ml/2Ydq7qI

My impression when looking at this profile is that the samples that don't contain RtlUserThreadStart are samples where we only have labels and JS frames, so maybe the stack walking failed completely in them?

(In reply to Florian Quèze [:florian] from comment #1)

My impression when looking at this profile is that the samples that don't contain RtlUserThreadStart are samples where we only have labels and JS frames, so maybe the stack walking failed completely in them?

Hmm, interesting, so we actually have two issues here:

  1. Stack walking can fail, so we only see labels and JS frames. Could this be bug 1434402?
  2. Stack walking overshoots the PROFILER_INIT call. (Or am I wrong in my assumption?) I would like to focus on this in this bug.

One thought:
If we had a strict "root label" that the profiler could assume was always there, it could be used to ensure that profiles always have the exact same common root, which would produce more consistent stacks (in cases where stack walking goes wrong).
What do you think?

(In reply to Gerald Squelart [:gerald] from comment #2)

  1. Stack walking can fail, so we only see labels and JS frames. Could this be bug 1434402?

Could it just be the impact of https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/mozglue/misc/StackWalk.cpp#101-116 ?

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.