When profiling bug 1269695, I noticed that we spend around 318ms in arena_dalloc coming from LifoAlloc::freeAll(), and it seems most of this time is going to memsetting the memory to 0 (presumably through jemalloc's opt_zero configuration). Is it possible to avoid this for LifoAlloc?
jemalloc doesn't fill memory with 0 on free. It fills with 0xe5.
Right - as far as I can tell opt_zero is never set, but we always set opt_poison. So if we're zeroing, it isn't coming from jemalloc. If it's the poisoning that's taking this long... well, it's a compile-time flag, so it can only be enabled or disabled globally. I've never really looked at the LifoAlloc implementation so I don't know the constraints, but could we make freeAll() asynchronous?
Hmm, the profile here <https://clptr.io/2jGzv4R> suggests that arena_dalloc is calling _platform_bzero$VARIANT$Haswell and as far as I know that's what memsetting memory to 0 will translate to on OSX with Haswell CPUs, so there is _something_ setting the memory to 0...
There is no apparent code path from arena_dalloc to a memset(ptr, 0, size), only 0xe5, at jemalloc.c:4638 and jemalloc.c:4731. Even if opt_zero was set, memset(ptr, 0, size) would happen on *malloc*, not free. I can see how madvise would be called, but I don't expect that to call _platform_bzero. So really, I have no idea where this would come from.
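To make the distinction in the comments above concrete, here is a toy model (not mozjemalloc's actual code) of the two fill paths being discussed: zeroing belongs to the allocation side, while the free side writes the 0xe5 poison byte. The function names and the 0xaa "stale garbage" byte are assumptions made up for this sketch.

```python
POISON_BYTE = 0xE5  # fill value on free, as stated in the comments above


def toy_malloc(size, zero=False):
    """Toy allocation; contents are stale unless zero=True (calloc-like)."""
    block = bytearray(b"\xaa" * size)  # 0xaa stands in for stale garbage
    if zero:
        block[:] = bytes(size)  # zeroing happens on the allocation path
    return block


def toy_free(block):
    """Toy free; overwrite the whole block with the poison byte."""
    block[:] = bytes([POISON_BYTE]) * len(block)


buf = toy_malloc(8, zero=True)  # all zero bytes after a calloc-like request
toy_free(buf)                   # all 0xe5 bytes afterwards, never zero
```

In this model a zero fill can only ever be observed under the allocation call, which is why a bzero frame directly beneath a free function looks impossible.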
<ehsan> glandium: I'm really out of ideas :(
<glandium> ehsan: dtrace?
<ehsan> glandium: yeah maybe
<glandium> ehsan: or simply a debugger with a breakpoint in _platform_bzero
<ehsan> glandium: can you ni? me on the bug for that please?
Examining this under the debugger, it is *arena_malloc* not arena_dalloc that is calling _platform_bzero$VARIANT$Haswell! So this is either us incorrectly walking the stack or incorrectly symbolicating. Markus, do you mind taking a look please?
Summary: Consider opting out of jemalloc zeroing in arena_dalloc → Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)
Why would that be attributed to LifoAlloc::freeAll then?
(other than that, memset(..., 0, ...) from arena_malloc means calloc is being called, and that it's the expected behavior)
I think this might be caused by https://github.com/devtools-html/Gecko-Profiler-Addon/issues/29 , i.e. we're using symbols from the wrong architecture (x86_64 instead of x86_64h). In bug 1329111 I'm going to add an arch field to the shared library information value that we expose to the profiler, so that it can pick the correct architecture.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Depends on: 1329111
Markus, are you still looking at this? It's still giving me grief in the "make jemalloc faster for stylo" bugs I'm looking into.
This should be fixed. If it's still happening I'd like to look into it again. Do you have STR?
(In reply to Markus Stange [:mstange] from comment #11)
> This should be fixed. If it's still happening I'd like to look into it
> again. Do you have STR?

The STR are the same: record a profile, bzero is blamed on dalloc. For something more concrete:
#1 ./mach run --disable-e10s
#2 Start profiling
#3 Load https://en.wikipedia.org/wiki/Barack_Obama
#4 Stop profiling
#5 Invert the stack, note bzero shows up, note the step up is dalloc, note that's not possible
I can reproduce this. However, Instruments shows the same thing.
Created attachment 8864269 [details]
screenshot of __platform_memset$VARIANT$Haswell disassembly

memset and bzero share a lot of code. memset has a different beginning but then jumps into the code that it shares with bzero. Frame addresses in the shared part are symbolicated as bzero because bzero is the closer symbol.
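The effect described above can be sketched with a nearest-preceding-symbol lookup, which is how address-to-name resolution commonly works. This is only an illustration: the addresses and the two-entry symbol table are made up, and the real layout is the one in the attached disassembly.

```python
import bisect

# Hypothetical symbol table, sorted by start address (addresses are invented).
SYMBOLS = [
    (0x1000, "bzero"),   # entry point; the shared fill loop follows this label
    (0x1200, "memset"),  # separate prologue that then jumps into the shared loop
]
STARTS = [start for start, _ in SYMBOLS]


def symbolicate(address):
    """Return the name of the closest symbol starting at or below address."""
    i = bisect.bisect_right(STARTS, address) - 1
    return SYMBOLS[i][1] if i >= 0 else None


in_prologue = symbolicate(0x1204)     # sample taken in memset's own prologue
in_shared_loop = symbolicate(0x1040)  # sample taken after memset jumped into the shared loop
```

A sample in the prologue resolves to memset, but a sample in the shared loop resolves to bzero even when memset was the function actually called, since the lookup only knows which symbol starts closest below the address.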
I don't think there's much we can or should do about this.
Status: ASSIGNED → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → WONTFIX
Oh, and memset is being called in order to fill the freed memory with 0xe5.
(In reply to Markus Stange [:mstange] from comment #14)
> Created attachment 8864269 [details]
> screenshot of __platform_memset$VARIANT$Haswell disassembly
>
> memset and bzero share a lot of code. memset has a different beginning but
> then jumps into the code that it shares with bzero. Frame addresses in the
> shared part are symbolicated as bzero because bzero is the closer symbol.

Thanks Markus, that makes a lot more sense now.