Closed Bug 1329888 Opened 8 years ago Closed 8 years ago

Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)

Tracking

()

Status:

RESOLVED WONTFIX

Project Flags:

Performance Impact

low

People

(Reporter: ehsan.akhgari, Assigned: mstange)

References

Details

Attachments

(1 file)

screenshot of __platform_memset$VARIANT$Haswell disassembly 8 years ago Markus Stange [:mstange] 657.27 KB, image/png		Details

(no longer active)

Reporter

Description

•

8 years ago

When profiling bug 1269695, I noticed that we spend around 318ms in arena_dalloc coming from LifoAlloc::freeAll(), and it seems most of this time is going to memsetting the memory to 0 (presumably through jemalloc's opt_zero configuration). Is it possible to avoid this for LifoAlloc?

Flags: needinfo?(jdemooij)

Flags: needinfo?(emanuel.hoogeveen)

Mike Hommey [:glandium]

Comment 1

•

8 years ago

jemalloc doesn't fill memory with 0 on free. It fills with 0xe5.

Emanuel Hoogeveen [:ehoogeveen]

Comment 2

•

8 years ago

Right - as far as I can tell opt_free is never set, but we always set opt_poison. So if we're zeroing it isn't coming from jemalloc. If it's the poisoning that's taking this long.. well, it's a compile time flag, so it can only be enabled or disabled globally. I've never really looked at the LifoAlloc implementation so I don't know the constraints, but could we make freeAll() asynchronous?

Flags: needinfo?(emanuel.hoogeveen)

(no longer active)

Reporter

Comment 3

•

8 years ago

Hmm, the profile here <https://clptr.io/2jGzv4R> suggests that arena_dalloc is calling _platform_bzero$VARIANT$Haswell and as far as I know that's what memsetting memory to 0 will translate to on OSX with Haswell CPUs, so there is _something_ setting the memory to 0...

Mike Hommey [:glandium]

Comment 4

•

8 years ago

There is no apparent code path from arena_dalloc to a memset(ptr, 0, size), only 0xe5, at jemalloc.c:4638 and jemalloc.c:4731. Even if opt_zero was set, memset(ptr, 0, size) would happen on *malloc*, not free. I can see how madvise would be called, but I don't expect that to call _platform_bzero. So really, I have no idea where this would come from.

Mike Hommey [:glandium]

Comment 5

•

8 years ago

<ehsan> glandium: I'm really out of ideas :( <glandium> ehsan: dtrace? <ehsan> glandium: yeah maybe <glandium> ehsan: or simply a debugger with a breakpoint in _platform_bzero <ehsan> glandium: can you ni? me on the bug for that please?

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 6

•

8 years ago

Examining this under the debugger, it is *arena_malloc* not arena_dalloc that is calling _platform_bzero$VARIANT$Haswell! So this is either us incorrectly walking the stack or incorrectly symbolicating. Markus, do you mind taking a look please?

Component: JavaScript Engine → Gecko Profiler

Flags: needinfo?(ehsan)

Summary: Consider opting out of jemalloc zeroing in arena_dalloc → Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)

(no longer active)

Reporter

Updated

•

8 years ago

Flags: needinfo?(jdemooij) → needinfo?(mstange)

Mike Hommey [:glandium]

Comment 7

•

8 years ago

Why would that be attributed to lifoAlloc::freeAll then?

Mike Hommey [:glandium]

Comment 8

•

8 years ago

(other than that, memset(..., 0, ...) from arena_malloc means calloc is being called, and that it's the expected behavior)

Jean Gong :jgong

Updated

•

8 years ago

Whiteboard: [qf:p5]

Markus Stange [:mstange]

Assignee

Comment 9

•

8 years ago

I think this might be caused by https://github.com/devtools-html/Gecko-Profiler-Addon/issues/29 , i.e. we're using symbols from the wrong architecture (x86_64 instead of x86_64h). In bug 1329111 I'm going to add an arch field to the shared library information value that we expose to the profiler, so that it can pick the correct architecture.

Assignee: nobody → mstange

Status: NEW → ASSIGNED

Depends on: 1329111

Flags: needinfo?(mstange)

Chris Peterson [:cpeterson]

Updated

•

8 years ago

Comment 10

•

8 years ago

Markus are you still looking at this? It's still giving me grief in the "make jemalloc faster for stylo" bugs I'm looking into.

Flags: needinfo?(mstange)

Markus Stange [:mstange]

Assignee

Comment 11

•

8 years ago

This should be fixed. If it's still happening I'd like to look into it again. Do you have STR?

Flags: needinfo?(mstange)

Eric Rahm [:erahm]

Comment 12

•

8 years ago

(In reply to Markus Stange [:mstange] from comment #11) > This should be fixed. If it's still happening I'd like to look into it > again. Do you have STR? The STR are the same, record a profile, bzero is blamed on dalloc. For something more concrete: #1 ./mach run --diable-e10s #2 Start profiling #3 Load https://en.wikipedia.org/wiki/Barack_Obama #4 Stop pofiling #5 Invert the stack, note bzero shows up, note the step up is dalloc, note that's not possible

Markus Stange [:mstange]

Assignee

Comment 13

•

8 years ago

I can reproduce this. However, Instruments shows the same thing.

Markus Stange [:mstange]

Assignee

Comment 14

•

8 years ago

Attached image screenshot of __platform_memset$VARIANT$Haswell disassembly — Details

memset and bzero share a lot of code. memset has a different beginning but then jumps into the code that it shares with bzero. Frame addresses in the shared part are symbolicated as bzero because bzero is the closer symbol.

Markus Stange [:mstange]

Assignee

Comment 15

•

8 years ago

I don't think there's much we can or should do about this.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

Markus Stange [:mstange]

Assignee

Comment 16

•

8 years ago

Oh, and memset is being called in order to fill the freed memory with 0xe5.

Eric Rahm [:erahm]

Comment 17

•

8 years ago

(In reply to Markus Stange [:mstange] from comment #14) > Created attachment 8864269 [details] > screenshot of __platform_memset$VARIANT$Haswell disassembly > > memset and bzero share a lot of code. memset has a different beginning but > then jumps into the code that it shares with bzero. Frame addresses in the > shared part are symbolicated as bzero because bzero is the closer symbol. Thanks Markus, that makes a lot more sense now.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P3

Whiteboard: [qf:p5]

You need to log in before you can comment on or make changes to this bug.