Closed Bug 1329888 Opened 8 years ago Closed 8 years ago

Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

RESOLVED WONTFIX
Performance Impact low

People

(Reporter: ehsan.akhgari, Assigned: mstange)

Attachments

(1 file)

When profiling bug 1269695, I noticed that we spend around 318ms in arena_dalloc coming from LifoAlloc::freeAll(), and it seems most of this time is going to memsetting the memory to 0 (presumably through jemalloc's opt_zero configuration). Is it possible to avoid this for LifoAlloc?
Flags: needinfo?(jdemooij)
Flags: needinfo?(emanuel.hoogeveen)
jemalloc doesn't fill memory with 0 on free. It fills with 0xe5.
Right - as far as I can tell opt_zero is never set, but we always set opt_poison. So if we're zeroing, it isn't coming from jemalloc. If it's the poisoning that's taking this long... well, it's a compile-time flag, so it can only be enabled or disabled globally. I've never really looked at the LifoAlloc implementation so I don't know the constraints, but could we make freeAll() asynchronous?
Flags: needinfo?(emanuel.hoogeveen)
Hmm, the profile here <https://clptr.io/2jGzv4R> suggests that arena_dalloc is calling _platform_bzero$VARIANT$Haswell and as far as I know that's what memsetting memory to 0 will translate to on OSX with Haswell CPUs, so there is _something_ setting the memory to 0...
There is no apparent code path from arena_dalloc to a memset(ptr, 0, size), only 0xe5, at jemalloc.c:4638 and jemalloc.c:4731. Even if opt_zero was set, memset(ptr, 0, size) would happen on *malloc*, not free. I can see how madvise would be called, but I don't expect that to call _platform_bzero. So really, I have no idea where this would come from.
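To make the distinction above concrete, here is a toy model of the two fill paths. It is only a sketch under stated assumptions: `ToyArena`, its methods, and the `zero` parameter are hypothetical names invented for illustration and are not mozjemalloc code. The only details taken from this bug are that the free path fills with 0xe5 and that a zero fill would belong to the allocation path.

```python
POISON_BYTE = 0xE5  # fill value written on free, as stated in the comments above


class ToyArena:
    """Hypothetical stand-in for an allocator arena; not mozjemalloc code."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.fill_log = []  # (caller, fill byte) pairs, kept for inspection

    def _fill(self, caller, offset, length, value):
        # Equivalent in spirit to memset(ptr, value, length).
        self.memory[offset:offset + length] = bytes([value]) * length
        self.fill_log.append((caller, value))

    def malloc(self, offset, length, zero=False):
        # A zero fill belongs to the allocation path (calloc, or an
        # opt_zero-style option), never to the free path.
        if zero:
            self._fill("malloc", offset, length, 0x00)
        return offset

    def dalloc(self, offset, length):
        # The free path only ever writes the poison pattern.
        self._fill("dalloc", offset, length, POISON_BYTE)


arena = ToyArena(64)
ptr = arena.malloc(0, 16, zero=True)
arena.dalloc(ptr, 16)
```

In this model a zero fill can only ever be logged against `malloc`, so a profile that attributes a zeroing routine to the free path points at a tooling problem rather than at the allocator.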
<ehsan> glandium: I'm really out of ideas :(
<glandium> ehsan: dtrace?
<ehsan> glandium: yeah maybe
<glandium> ehsan: or simply a debugger with a breakpoint in _platform_bzero
<ehsan> glandium: can you ni? me on the bug for that please?
Flags: needinfo?(ehsan)
Examining this under the debugger, it is *arena_malloc* not arena_dalloc that is calling _platform_bzero$VARIANT$Haswell! So this is either us incorrectly walking the stack or incorrectly symbolicating. Markus, do you mind taking a look please?
Component: JavaScript Engine → Gecko Profiler
Flags: needinfo?(ehsan)
Summary: Consider opting out of jemalloc zeroing in arena_dalloc → Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)
Flags: needinfo?(jdemooij) → needinfo?(mstange)
Why would that be attributed to LifoAlloc::freeAll then?
(Other than that, memset(..., 0, ...) from arena_malloc means calloc is being called, and that's the expected behavior.)
Whiteboard: [qf:p5]
I think this might be caused by https://github.com/devtools-html/Gecko-Profiler-Addon/issues/29 , i.e. we're using symbols from the wrong architecture (x86_64 instead of x86_64h). In bug 1329111 I'm going to add an arch field to the shared library information value that we expose to the profiler, so that it can pick the correct architecture.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Depends on: 1329111
Flags: needinfo?(mstange)
See Also: → 1354215
Markus, are you still looking at this? It's still giving me grief in the "make jemalloc faster for stylo" bugs I'm looking into.
Flags: needinfo?(mstange)
This should be fixed. If it's still happening I'd like to look into it again. Do you have STR?
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #11)
> This should be fixed. If it's still happening I'd like to look into it
> again. Do you have STR?

The STR are the same: record a profile, and bzero is blamed on dalloc. For something more concrete:
#1 ./mach run --disable-e10s
#2 Start profiling
#3 Load https://en.wikipedia.org/wiki/Barack_Obama
#4 Stop profiling
#5 Invert the stack, note bzero shows up, note the step up is dalloc, note that's not possible
I can reproduce this. However, Instruments shows the same thing.
memset and bzero share a lot of code. memset has a different beginning but then jumps into the code that it shares with bzero. Frame addresses in the shared part are symbolicated as bzero because bzero is the closer symbol.
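The effect described above can be sketched with a nearest-preceding-symbol lookup, which is a common way to map a frame address to a name. This is a minimal sketch, not the profiler's actual code: the addresses, the two-entry symbol table, and the `symbolicate` helper are all hypothetical, and the real layout of the library is not known from this bug.

```python
import bisect

# Hypothetical symbol table, sorted by start address. The addresses are
# made up; only their relative order matters for the illustration.
SYMBOLS = [
    (0x0FE0, "memset"),  # short memset-specific prologue
    (0x1000, "bzero"),   # bzero entry, followed by the shared fill loop
]


def symbolicate(address):
    """Return the name of the closest symbol at or below `address`."""
    starts = [start for start, _ in SYMBOLS]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies before the first known symbol
    return SYMBOLS[index][1]


# An address inside memset's own prologue resolves to memset...
prologue_name = symbolicate(0x0FE8)
# ...but once memset has jumped into the shared loop, the closest
# preceding symbol is bzero, so the frame is labelled bzero.
shared_name = symbolicate(0x1040)
```

With this kind of lookup, any sample taken in the shared part is reported as bzero regardless of which entry point was actually called, which matches what both the Gecko profiler and Instruments show.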
I don't think there's much we can or should do about this.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Oh, and memset is being called in order to fill the freed memory with 0xe5.
(In reply to Markus Stange [:mstange] from comment #14)
> Created attachment 8864269 [details]
> screenshot of __platform_memset$VARIANT$Haswell disassembly
>
> memset and bzero share a lot of code. memset has a different beginning but
> then jumps into the code that it shares with bzero. Frame addresses in the
> shared part are symbolicated as bzero because bzero is the closer symbol.

Thanks Markus, that makes a lot more sense now.
Performance Impact: --- → P3
Whiteboard: [qf:p5]