Removing too many leaf frames from allocation stacks on Windows?
Categories
(Core :: Gecko Profiler, defect, P2)
People
(Reporter: mstange, Unassigned)
Details
Here's a profile with allocation stacks (warning, big and slow): https://share.firefox.dev/3jBQ8v2
It's from a local Windows build, from bug 1717917 comment 32.
In the inverted call tree, there are many entries where the stack ends at a DOM frame label, or another frame label, or a JS function.
This is surprising to me - if something allocated memory, wouldn't that happen with a malloc call from C++ code? So shouldn't there be a C++ function at the end of these stacks?
It is unclear what exactly is causing the allocations.
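To put numbers on "stacks that end at a label or JS frame", something like the following could be run over exported allocation stacks. This is only a sketch: `FrameKind`, `Frame`, `LeafOf` and `CountLeafKinds` are made-up names for illustration, not the profiler's data model, and it assumes each stack is stored root-first.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical frame categories; the real profile format has more detail.
enum class FrameKind { Native, Label, JS };

struct Frame {
  std::string name;
  FrameKind kind;
};

// Assumption: a stack is stored root-first, so the leaf is the last element.
using Stack = std::vector<Frame>;

// Returns the leaf (innermost) frame of a non-empty stack.
const Frame& LeafOf(const Stack& stack) {
  assert(!stack.empty());
  return stack.back();
}

// Counts how many stacks end in each kind of frame. These leaves are the
// roots of the inverted call tree. Empty stacks are ignored.
std::map<FrameKind, int> CountLeafKinds(const std::vector<Stack>& stacks) {
  std::map<FrameKind, int> counts;
  for (const Stack& stack : stacks) {
    if (stack.empty()) {
      continue;
    }
    ++counts[LeafOf(stack).kind];
  }
  return counts;
}
```

If allocations always went through a C++ allocator entry point and no leaf frames were lost, the `Label` and `JS` counts would be expected to stay at zero.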
Interesting and puzzling!
As a quick test, I deactivated the leaf-frame-skipping in the stack walker (by making FrameSkipper::ShouldSkipPC always return false), and I didn't notice differences:
- opt, debug, with normal frame-skipping: https://share.firefox.dev/3xQwMaS
- opt, debug, no frame-skipping: https://share.firefox.dev/2W2dRga
- -O1, no-debug, no frame-skipping: https://share.firefox.dev/3ABVcXo
I tried -O0, but (re)symbolication timed out, so that didn't help.
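For reference, here is a simplified model of the kind of skipping I disabled. It assumes a "skip every frame until a target PC is seen" policy; `SimpleFrameSkipper` and `FilterFrames` are hypothetical names written for this comment and are not the Gecko source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a "skip until target" frame skipper.
class SimpleFrameSkipper {
 public:
  // A target of 0 means "never skip anything".
  explicit SimpleFrameSkipper(std::uintptr_t skipUntilPc)
      : mSkipUntilPc(skipUntilPc) {}

  // Returns true while frames should be dropped. Once the target PC is
  // seen, the target frame is kept and nothing is skipped afterwards.
  bool ShouldSkipPC(std::uintptr_t pc) {
    if (mSkipUntilPc != 0) {
      if (pc != mSkipUntilPc) {
        return true;
      }
      mSkipUntilPc = 0;
    }
    return false;
  }

 private:
  std::uintptr_t mSkipUntilPc;
};

// Feeds leaf-first PCs through the skipper and returns the frames kept.
std::vector<std::uintptr_t> FilterFrames(
    const std::vector<std::uintptr_t>& leafFirstPcs,
    SimpleFrameSkipper skipper) {
  std::vector<std::uintptr_t> kept;
  for (std::uintptr_t pc : leafFirstPcs) {
    assert(pc != 0);  // 0 is reserved as the "never skip" sentinel.
    if (!skipper.ShouldSkipPC(pc)) {
      kept.push_back(pc);
    }
  }
  return kept;
}
```

In this model, a target of 0 corresponds to the "always return false" experiment above. It also shows one way such a skipper could drop too much: if the target PC never shows up in the walk, every frame is skipped.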
Looking at the code, I would expect inverted call stacks to only start from replace_malloc and other such functions that intercept allocations and add markers for them, or maybe one of the sub-functions on the way to the stack walker.
On Windows, the actual sequence of calls should be:
- function_that_calls_malloc
  - replace_malloc in memory_hooks.cpp
    - gMallocTable.malloc, calling the "real" malloc
    - AllocCallback
      - profiler_add_native_allocation_marker
        - profiler_add_marker
          - AddMarkerToBuffer
            - profiler_capture_backtrace_into
              - Registers::SyncPopulate
                - RtlCaptureContext, the win32 function retrieving the registers in the context of the caller
              - DoSyncSample
                - DoSharedSample
                  - DoNativeBacktrace
                    - DoMozStackWalkBacktrace
                      - StackWalkCallback, first records the leaf function as captured by RtlCaptureContext above
                      - MozStackWalkThread, the native stack-walker, called in a loop with resume points (mostly past JIT code)
One worry is that the context captured by RtlCaptureContext may be disconnected from the stack-walking happening after that. We already have bug 1714501 to look into this.
But note that it's nevertheless always stored as the leaf frame, so I would expect the most-immediate non-inline caller to be present.
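The expectation can be modelled like this. It is a sketch with hypothetical names (`CapturedContext`, `BuildBacktrace`), not the profiler code: the PC from the captured context is recorded first, and whatever the walker finds is appended after it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the register state captured at the sample point.
struct CapturedContext {
  std::uintptr_t pc;
};

// Builds a leaf-first backtrace: the captured PC always comes first, then
// the caller frames reported by the stack walker, in the order given.
std::vector<std::uintptr_t> BuildBacktrace(
    const CapturedContext& context,
    const std::vector<std::uintptr_t>& walkedCallerPcs) {
  assert(context.pc != 0);  // A captured context should carry a real PC.
  std::vector<std::uintptr_t> frames;
  frames.reserve(walkedCallerPcs.size() + 1);
  frames.push_back(context.pc);
  frames.insert(frames.end(), walkedCallerPcs.begin(), walkedCallerPcs.end());
  return frames;
}
```

Under this model the leaf is present even when the walker returns nothing or returns frames unrelated to the captured context, which is why stacks ending at a label or JS frame are unexpected.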
I think the next step will be to try and debug this, e.g.: debug-break on the stack-walking function, and compare the captured stack with the debugger's view of the stack...
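When doing that comparison by hand, a small helper along these lines might save time. It is only a sketch: `MissingLeafFrames` is a hypothetical name, and it assumes both stacks have been dumped as leaf-first lists of function names that agree on the root side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Both inputs are leaf-first lists of function names.
// Returns the leaf-side frames of `reference` (e.g. the debugger's view)
// that lie outside the root-side run shared with `captured`.
std::vector<std::string> MissingLeafFrames(
    const std::vector<std::string>& captured,
    const std::vector<std::string>& reference) {
  // Count how many frames match when comparing from the root inwards.
  std::size_t common = 0;
  while (common < captured.size() && common < reference.size() &&
         captured[captured.size() - 1 - common] ==
             reference[reference.size() - 1 - common]) {
    ++common;
  }
  assert(common <= reference.size());
  // Everything in `reference` before the shared root-side run is missing.
  return std::vector<std::string>(
      reference.begin(),
      reference.end() - static_cast<std::ptrdiff_t>(common));
}
```

An empty result means the captured stack covers the debugger's view all the way to the leaf. A non-empty result lists exactly the leaf frames that were dropped or never captured.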
Other suggestions welcome.