Closed Bug 1356701 Opened 3 years ago Closed 3 years ago

High amount of je_malloc_usable_size

Categories

(Core :: Memory Allocator, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: erahm, Assigned: glandium)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Stylo])

Attachments

(3 files)

During profiling of stylo tests on mac we're seeing a fair amount of time spent in |je_malloc_usable_size|.

Initial investigation indicates osx's implementation of |free| and |realloc| use |je_malloc_usable_size| [1] to determine which zone the pointer is in, it then calls |zone_free_definite_size| [2] where we again use |malloc_usable_size_impl| to determine if the pointer is in our zone.

I propose removing the checks in our zone functions as osx is already doing this for use.

It's possible we could do a further optimization where we set |zone.size| to a function that just checks if the ptr is in one of our chunks, but doesn't lookup the actual size.
The 'zone_' prefixed allocation functions should only be called with the
proper zone. There's no need to double-check.
Assignee: nobody → erahm
Status: NEW → ASSIGNED
This adds a lighter weight method for checking if a pointer was allocated from
our zone. It will definitely break anything that expects |zone_size| to return
a legit value, but I don't think it's ever used that way.
Whiteboard: [Stylo]
Blocks: 1360881
Comment on attachment 8863562 [details] [diff] [review]
Part 1: Stop checking that the pointer is in our zone for zone_* malloc functions

This is a conservative patch that just removes the |malloc_usable_size| checks from the 'zone_' prefixed functions. I think we discussed maybe avoiding the check elsewhere, do you think we can be more aggressive?
Attachment #8863562 - Flags: review?(mh+mozilla)
Comment on attachment 8863564 [details] [diff] [review]
WIP Part 2: Add a lightweight zone_size impl

Review of attachment 8863564 [details] [diff] [review]:
-----------------------------------------------------------------

This was an attempt at adding a lighter-weight option to check if a ptr came from our allocator. It doesn't actually look up the size, but instead just returns 1 if the ptr is ours. Do you think this is reasonable or is it a bit too aggressive?
Attachment #8863564 - Flags: feedback?(mh+mozilla)
Comment on attachment 8863562 [details] [diff] [review]
Part 1: Stop checking that the pointer is in our zone for zone_* malloc functions

Review of attachment 8863562 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/zone.c
@@ -124,5 @@
>  static void *
>  zone_realloc(malloc_zone_t *zone, void *ptr, size_t size)
>  {
> -  if (malloc_usable_size_impl(ptr))
> -    return realloc_impl(ptr, size);

Is it worthwhile to keep malloc_usable_size(ptr) as a debug assertion?
Comment on attachment 8863562 [details] [diff] [review]
Part 1: Stop checking that the pointer is in our zone for zone_* malloc functions

Review of attachment 8863562 [details] [diff] [review]:
-----------------------------------------------------------------

I know I said on irc we might be able to get away with removing this, but see https://github.com/jemalloc/jemalloc/commit/5b3db098f73f467a03f87a2242c692268f796a56 . The behavior or CoreGraphics described is old and probably fixed, but it's still something that can happen in arbitrary code. In 10.12, malloc_default_zone() even always returns the same zone static zone, that (iirc) hard-codedly calls into the real default zone.
Attachment #8863562 - Flags: review?(mh+mozilla) → review-
Comment on attachment 8863564 [details] [diff] [review]
WIP Part 2: Add a lightweight zone_size impl

Review of attachment 8863564 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/zone.c
@@ +107,5 @@
>  zone_size(malloc_zone_t *zone, const void *ptr)
>  {
> +  // This hack returns 1 if the ptr is in one of our chunks, 0 otherwise.
> +  // It only works with mozjemalloc.
> +  return malloc_in_heap_impl(ptr);

zone_size is what's used under the hood for malloc_usable_size(ptr) on osx, so not returning the size here is going to break everything calling malloc_usable_size.
Attachment #8863564 - Flags: feedback?(mh+mozilla) → feedback-
We discussed this with Bobby this morning, and we're going to be more aggressive on this, by making gecko avoid going through the system malloc/free altogether, which will skip those je_malloc_usable_size calls.
Assignee: erahm → mh+mozilla
Flags: needinfo?(mh+mozilla)
Blocks: 1373430
Flags: needinfo?(mh+mozilla)
Comment on attachment 8883212 [details]
Bug 1356701 - Export unprefixed malloc and duplication functions on OSX.

https://reviewboard.mozilla.org/r/154152/#review159266

::: memory/build/mozmemory_wrap.h:56
(Diff revision 1)
>   *   prefixed malloc implementation and duplication functions are not
>   *   exported.
>   *
>   * - On MacOSX, the system libc has a zone allocator, which allows us to
>   *   hook custom malloc implementation functions without exporting them.
> - *   The malloc implementation functions are all prefixed with "je_" and used
> + *   However, since we want things in Firefox skipping the system zone

Nit: s/skipping/to skip/

::: memory/build/mozmemory_wrap.h:58
(Diff revision 1)
>   *
>   * - On MacOSX, the system libc has a zone allocator, which allows us to
>   *   hook custom malloc implementation functions without exporting them.
> - *   The malloc implementation functions are all prefixed with "je_" and used
> - *   this way from the custom zone allocator. They are not exported.
> - *   Duplication functions are not included, since they will call the custom
> + *   However, since we want things in Firefox skipping the system zone
> + *   allocator, the malloc implementation functions are all exported
> + *   unprefixed, as well as duplication functions.

It took me a while to work out what "duplication functions" are. Maybe just say "strdup and strndup"? Or maybe this reference can be removed, and they can be considered to fall under the umbrella of "malloc implementation functions"?
Attachment #8883212 - Flags: review?(n.nethercote) → review+
"duplication functions" are referenced several times in mozmemory_wrap.h, and are defined the first time they are mentioned: https://dxr.mozilla.org/mozilla-central/source/memory/build/mozmemory_wrap.h#30-33
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/261d61f4eeec
Export unprefixed malloc and duplication functions on OSX. r=njn
Comment on attachment 8883212 [details]
Bug 1356701 - Export unprefixed malloc and duplication functions on OSX.

https://reviewboard.mozilla.org/r/154152/#review159288

::: commit-message-e6a7a:14
(Diff revision 2)
> +The risk is that some things in Gecko try to realloc/free pointers it
> +got from system libraries, if those were allocated with a system zone
> +that is not jemalloc.

Note, for posterity: we have the same problem already on Windows, so this is not an entirely new risk, although the it comes from platform-specific code, so there might be some OSX specific problems that arise from this.
https://hg.mozilla.org/mozilla-central/rev/261d61f4eeec
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1378339
I think we should back this out from nightly - top crash for MacOS today.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #16)
> I think we should back this out from nightly - top crash for MacOS today.

That seems sensible.

That said, I measured, and this speeds up the stylo post-traversal by 17.5% on mac, which is very significant. So hopefully we can get the crash figured out (rr might help?) and then reland it.
Backed out for causing bug 1378339. Will respin nightlies shortly.

https://hg.mozilla.org/mozilla-central/rev/11755fd63168581e194258d04bb6a7337779ec78
Status: RESOLVED → REOPENED
Flags: needinfo?(mh+mozilla)
Resolution: FIXED → ---
Target Milestone: mozilla56 → ---
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #17)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #16)
> > I think we should back this out from nightly - top crash for MacOS today.
> 
> That seems sensible.
> 
> That said, I measured, and this speeds up the stylo post-traversal by 17.5%
> on mac, which is very significant. So hopefully we can get the crash figured
> out (rr might help?) and then reland it.

Sadly, no rr on mac... it's also "disappointing" that CI was just fine with it.
Flags: needinfo?(mh+mozilla)
All the crashes were on 10.12, maybe it's related to the version, so can't be caught on automation...
Depends on: 1378332
Depends on: 1378592
Comment on attachment 8883212 [details]
Bug 1356701 - Export unprefixed malloc and duplication functions on OSX.

Additional zone.c changes + bug 1378592 make both bug 1378332 and bug 
1378339 go away according to local testing.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e4b7914734ade194d51c223625d67f4e1c433fe
Attachment #8883212 - Flags: review+ → review?(n.nethercote)
Comment on attachment 8883212 [details]
Bug 1356701 - Export unprefixed malloc and duplication functions on OSX.

https://reviewboard.mozilla.org/r/154152/#review159774

rs=me
Attachment #8883212 - Flags: review?(n.nethercote) → review+
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/e84fd163bc20
Export unprefixed malloc and duplication functions on OSX. r=njn
Before the backout from mozilla-central, on July 5th, these improvements were noticed:

== Change summary for alert #7690 (as of July 04 2017 10:27 UTC) ==

Improvements:

 10%  TestStandardURL Perf osx-10-10 opt      53,688.42 -> 48,291.58
  4%  Stylo Gecko_nsCSSParser_ParseSheet_Bench osx-10-10 opt 73,531.67 -> 70,537.50

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7690
I confirm the backout with these expected regressions:

== Change summary for alert #7708 (as of July 05 2017 16:59 UTC) ==

Regressions:

 13%  TestStandardURL Perf osx-10-10 opt      48,840.62 -> 55,293.00
  3%  Stylo Gecko_nsCSSParser_ParseSheet_Bench osx-10-10 opt 70,545.61 -> 72,875.42

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7708
https://hg.mozilla.org/mozilla-central/rev/e84fd163bc20
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Your latest push brought in this improvement:

== Change summary for alert #7726 (as of July 06 2017 22:14 UTC) ==

Improvements:

  3%  tcanvasmark summary osx-cross opt e10s     6,100.08 -> 6,302.79

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7726
Another more relevant improvement:

== Change summary for alert #7725 (as of July 06 2017 22:14 UTC) ==

Improvements:

  4%  Stylo Gecko_nsCSSParser_ParseSheet_Bench osx-cross opt      73,141.33 -> 70,202.75

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7725
Depends on: 1396361
You need to log in before you can comment on or make changes to this bug.