1361258 - stylo: Use Thread-local jemalloc arenas

Reporter

Description

•

7 years ago

See the discussion surrounding bug 1291355 comment 35.

Bobby Holley (:bholley)

Reporter

Updated

•

7 years ago

Flags: needinfo?(bobbyholley)

Julian Seward [:jseward]

Comment 1

•

7 years ago

Attached patch bug1361258-1.cset (obsolete) — Details — Splinter Review

Patch for contemplation.  The switching is done in traverse_subtree
in servo/ports/geckolib/glue.rs.

Bobby Holley (:bholley)

Reporter

Comment 2

•

7 years ago

Nice! bug 1347525 is the benchmark to measure, according to smaug.

Assignee: bobbyholley → jseward

Flags: needinfo?(bobbyholley)

Julian Seward [:jseward]

Comment 3

•

7 years ago

Here are "[PERF],traversal_time_ms" numbers for the Obama testcase,
taking the best of 4 runs:

one arena, obama
  1  97 97 99 101 -> 97
  2  70 63 63 64  -> 63
  4  50 52 47 46  -> 46
  8  71 61 61 62  -> 61

multi arena, obama
  1  103 101 102 101 -> 101
  2  56 57 57 68     ->  56
  4  36 35 35 35     ->  35
  8  25 25 24 24     ->  24

adaptive, obama
  1  98 109 99 98 -> 98
  2  55 58 56 56  -> 55
  4  35 38 39 36  -> 35
  8  25 24 25 31  -> 24

Single-arena scales poorly but is marginally faster in the 1-thread case.
Multi arena scales much better.  Adaptive (meaning, with this patch) does
as well as Multi for 8 threads and as well as a single arena for the one
thread case.

<marginally OT>
Ignoring the one-arena case, we get a more or less 3 x speedup with 4 
threads but only a 4 x speedup with 8 threads.  It would be interesting
to know why that is:

* lack of available parallelism in the problem?

* low level resource contention of some kind (RFO misses)?

* the effects of hyperthreading? -- a core shared by two threads
  is going to be slower for each thread than if only one thread
  runs on it

* something else?
</marginally OT>

Julian Seward [:jseward]

Comment 4

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #1)
> Patch for contemplation.  The switching is done in traverse_subtree
> in servo/ports/geckolib/glue.rs.

One thing I don't like about this is the fact that we have to "manually"
change the allocation mode back to single-arena after the parallel
traversal.  It would be all too easy to forget to do that some time
in the future.  Also, what happens if the parallel traversal takes an
exception of some kind?

I would prefer to wrap this up in a C++ style RAII construct.  Is that
possible in Rust?  And would that help in the exception case?

For reference, the relevant code fragment (post patch) is:

if traversal_driver.is_parallel() {
    unsafe { bindings::Gecko_SetMultipleJemallocArenasEnabled(true); }
    parallel::traverse_dom(&traversal, element, token,
                           global_style_data.style_thread_pool.as_ref().unwrap());
    unsafe { bindings::Gecko_SetMultipleJemallocArenasEnabled(false); }
} else {
    sequential::traverse_dom(&traversal, element, token);
}

Flags: needinfo?(bobbyholley)

Julian Seward [:jseward]

Comment 5

•

7 years ago

Here are some numbers for smaug's microbenchmark as discussed in bug 1347525,
specifically using the test case at bug 1347525 comment 13.

This is on Linux x86_64, with some effort made to run on quiet machine.
The following numbers are time in ms:

STYLO_THREADS=1    single arena     78
STYLO_THREADS=8    single arena     78

STYLO_THREADS=1    multi arena      75
STYLO_THREADS=8    multi arena      76

STYLO_THREADS=1    adaptive         76
STYLO_THREADS=8    adaptive         77

m-c, -Og, non stylo, ~ a week old   64

So there's no evidence here that multi arenas slows down this test, nor that
this patch is helping.  I'll try on OSX, since one thing that might be different
there is the TLS costs.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

7 years ago

Stylo overhead in general is horrible there. That must get fixed.

Comparing opt m-i (no stylo), 
MOZ_MEMORY_NARENAS_DEFAULT_ONE enabled (the default setup) I get best value 47
MOZ_MEMORY_NARENAS_DEFAULT_ONE disabled (MOZ_STYLO case), I get best value 49.
So the difference is smaller than it used to be, which makes sense since malloc usage has been reduced in HTML parser, and elsewhere in DOM.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Comment 7

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #4)
> One thing I don't like about this is the fact that we have to "manually"
> change the allocation mode back to single-arena after the parallel
> traversal.  It would be all too easy to forget to do that some time
> in the future.  Also, what happens if the parallel traversal takes an
> exception of some kind?
> 
> I would prefer to wrap this up in a C++ style RAII construct.  Is that
> possible in Rust?

It would look something like:

  pub struct AutoMultipleJemallocArenas;

  impl AutoMultipleJemallocArenas {
      pub fn new() -> AutoMultipleJemallocArenas {
          unsafe {
              bindings::Gecko_SetMultipleJemallocArenasEnabled(true);
          }
      }
  }

  impl Drop for AutoMultipleJemallocArenas {
      fn drop(&mut self) {
          unsafe {
              bindings::Gecko_SetMultipleJemallocArenasEnabled(false);
          }
      }
  }

And then usage would look something like:

  let mut maybe_arenas = None;
  if traversal_driver.is_parallel() {
      maybe_arenas = Some(AutoMultipleJemallocArenas::new());
  }

  // do traversal...
      

> if traversal_driver.is_parallel() {
>     unsafe { bindings::Gecko_SetMultipleJemallocArenasEnabled(true); }
>     parallel::traverse_dom(&traversal, element, token,
>                           
> global_style_data.style_thread_pool.as_ref().unwrap());
>     unsafe { bindings::Gecko_SetMultipleJemallocArenasEnabled(false); }
> } else {
>     sequential::traverse_dom(&traversal, element, token);
> }

Bobby Holley (:bholley)

Reporter

Comment 8

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> Stylo overhead in general is horrible there. That must get fixed.

On this testcase specifically, with stylo enabled? Can you file a bug with a profile?

> 
> Comparing opt m-i (no stylo), 
> MOZ_MEMORY_NARENAS_DEFAULT_ONE enabled (the default setup) I get best value
> 47
> MOZ_MEMORY_NARENAS_DEFAULT_ONE disabled (MOZ_STYLO case), I get best value
> 49.
> So the difference is smaller than it used to be, which makes sense since
> malloc usage has been reduced in HTML parser, and elsewhere in DOM.

Ok, I think the next things to do are:
(1) Remeasure comment 5 with a checkout of m-c from about 21 days ago, which is when the discussion around bug 1291355 comment 6 happened. We should be able to reproduce the 10% overhead he talked about, and then see if adaptive arenas fix it.
(2) Push both multiple arenas and adapative arenas to try with -t all to get talos results.

Comment 7 is spot-on for doing RAII in Rust. I also think we should do it in parallel.rs rather than in the caller in case we call traverse_dom from multiple places.

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Reporter

Comment 9

•

7 years ago

(We can make it compile-time conditional by doing if cfg!(feature = "gecko") { }).

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 10

•

7 years ago

FWIW, talos won't give too useful numbers.

Also, if we focus on the innerHTML case, bug 1355441 will reduce malloc usage there some more.


(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> (In reply to Olli Pettay [:smaug] from comment #6)
> > Stylo overhead in general is horrible there. That must get fixed.
> 
> On this testcase specifically, with stylo enabled? Can you file a bug with a
> profile?
Julian, could you file that? I don't have stylo builds around to get Gecko profiler profile.

Flags: needinfo?(jseward)

Bobby Holley (:bholley)

Reporter

Comment 11

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #10)
> FWIW, talos won't give too useful numbers.
> 
> Also, if we focus on the innerHTML case, bug 1355441 will reduce malloc
> usage there some more.

Yes, but the point is to figure out the impact of adaptive arenas on malloc-bound workloads. I want to make sure it fixes the problem you measured before so that we can be sure there's no overhead in other areas when we turn it on.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

7 years ago

Sure, we need to find some new microbenchmarks.
Perhaps some tescase which is creating lots of new elements.
Bug 1232023 might be such?

Bobby Holley (:bholley)

Reporter

Comment 13

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #12)
> Sure, we need to find some new microbenchmarks.

In comment 8, I proposed just checking out a copy of m-c from last month when you measured it and using that. Is there any reason that wouldn't work?

Julian Seward [:jseward]

Comment 14

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #10)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> > (In reply to Olli Pettay [:smaug] from comment #6)
> > > Stylo overhead in general is horrible there. That must get fixed.
> > 
> > On this testcase specifically, with stylo enabled? Can you file a bug with a
> > profile?
> Julian, could you file that? I don't have stylo builds around to get Gecko
> profiler profile.

Can do, but I need to know: What sampling rate?  Which threads?

Flags: needinfo?(jseward) → needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

7 years ago

Whatever is the default rate in the profiler. And main thread should be enough, since that is where the slow down is. I assume the issue is that stylo does something slow whenever an element is added to DOM, 
or the issue is that stylo increases the sizeof Element by 8 bytes, and that causes elements to no fit so well into cpu caches or something.

Flags: needinfo?(bugs)

Julian Seward [:jseward]

Comment 16

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> (1) Remeasure comment 5 with a checkout of m-c from about 21 days ago, which
> is when the discussion around bug 1291355 comment 6 happened. We should be
> able to reproduce the 10% overhead he talked about, and then see if adaptive
> arenas fix it.

Here's what I've got.  The tl;dr is that I can't repro it.

I compared m-c, optimised, non-debug, non-stylo, from a few hours ago
("current") and for 21 days ago ("previous"), using the following changeset
chosen more or less arbitrarily:

  changeset:   353289:02bf71459f40
  user:        Florian Queze <florian@queze.net>
  date:        Fri Apr 14 21:51:40 2017 +0200
  summary:     Bug 1356569 - fix test_SyncedTabsDeckComponent.js, r=jaws.

With results (milliseconds, smaller is better)

  current, single arena   69
  current, multi arena    69

  previous, single arena  66
  previous, multi arena   67

So there's not much to see here.  I *think* I did this right.

I also tried current-vs-previous with optimised, non-debug, stylo enabled:

  current: see comment 5, but copying in for convenience:

    STYLO_THREADS=1    single arena     78
    STYLO_THREADS=8    single arena     78

    STYLO_THREADS=1    multi arena      75
    STYLO_THREADS=8    multi arena      76

    STYLO_THREADS=1    adaptive         76
    STYLO_THREADS=8    adaptive         77

  previous:

    STYLO_THREADS=1    single arena     82
    STYLO_THREADS=8    single arena     81

    STYLO_THREADS=1    multi arena      79
    STYLO_THREADS=8    multi arena      81

    STYLO_THREADS=1    adaptive         82
    STYLO_THREADS=8    adaptive         81

This is on Linux x86_64, Fedora 25, Intel(R) Xeon(R) CPU E3-1505M v5, 2.6 GHz.
(4 cores, 8 threads, 8 MB L3).

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 17

•

7 years ago

ok, thanks. Perhaps I should have then re-run the test whenever I did it.
(I did re-run jemalloc4 test more often)

Might be worth to try also https://bugzilla.mozilla.org/show_bug.cgi?id=1232023

Stylo regression is anyhow very bad.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

7 years ago

(in bug 1232023, malloc seems to take 25% of cloneNode)

Julian Seward [:jseward]

Comment 19

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #16)

Ah, wait.  I made a stupid mistake with the following measurements
and they should be ignored:

> With results (milliseconds, smaller is better)
> 
>   current, single arena   69
>   current, multi arena    69
> 
>   previous, single arena  66
>   previous, multi arena   67

I built with --disable-jemalloc.  So these numbers are irrelevant.

The values for the optimised, non-debug, stylo enabled build are
AFAIK still valid.

Julian Seward [:jseward]

Comment 20

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #19)
> (In reply to Julian Seward [:jseward] from comment #16)
> Ah, wait.  I made a stupid mistake with the following measurements
> and they should be ignored:

Sorry for the noise.  Here are new measurements.  They use the mozconfig
shown below, and the following configurations for single vs multi arena:

single arena:
 memory/mozjemalloc/jemalloc.c:151:
  #ifndef MOZ_STYLO
  #define MOZ_MEMORY_NARENAS_DEFAULT_ONE
  #endif

multi arena:
  #ifndef MOZ_STYLO
  //#define MOZ_MEMORY_NARENAS_DEFAULT_ONE
  #endif

Results are: (without stylo, m-c curr)

  single arena  53 54
  multi arena   53 54

(14 Apr) 353289:02bf71459f40

  single arena  59 57 58
  multi arena   57 60 56

If we take the minimum time as most accurately representing the actual cost,
then I still can't demonstrate any slowdown for the multi-arena case.

smaug, can you sanity check this config?  Am I doing something different or
wrong here, relative to what you measured?

--- mozconfig -----------------------------------------------------

. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/gcc-O2-nondebug-plain
ac_add_options --enable-tests

ac_add_options --enable-optimize="-g -O2"
ac_add_options --enable-debug-symbols

ac_add_options --enable-profiling

ac_add_options --disable-crashreporter

mk_add_options MOZ_MAKE_FLAGS="-j8"
mk_add_options AUTOCLOBBER=1

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 21

•

7 years ago

Well, that sounds like there isn't regression, as I hinted in comment 17.
But I'll try again.

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

•

7 years ago

I don't know why I got the slower performance a month ago.

Anyhow, what is still clear that there is some major stylo specific regression, based on comment 16.
Has the bug filed for that?

Julian Seward [:jseward]

Comment 23

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #22)
> Anyhow, what is still clear that there is some major stylo specific
> regression, based on comment 16.  Has the bug filed for that?

Yes, bug 1362982.

Julian Seward [:jseward]

Comment 24

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> (2) Push both multiple arenas and adapative arenas to try with -t all to get
> talos results.

Here are results for single and multiple arenas (not adaptive).  They are
generated with
./mach try -b do -p all -u all -t all --rebuild-talos 5

Single   is changeset 5a190b..
Multiple is changeset dcf903..

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5a190bf0eeb8d55404cd84062c573cbed092a64e&newProject=try&newRevision=dcf903ecb43d590a983a467c339dd588ee6fe3a3&framework=1&showOnlyImportant=0

They show no (important) perf changes, but there are RSS regressions for
windows7-32, up to 15.80%.  No other platforms seem to space-regress, though.
There are also very large changes to "tp5o XRes opt" and "tp5o XRes opt e10s".
What are those?

Bobby Holley (:bholley)

Reporter

Comment 25

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #24)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> > (2) Push both multiple arenas and adapative arenas to try with -t all to get
> > talos results.
> 
> Here are results for single and multiple arenas (not adaptive).  They are
> generated with
> ./mach try -b do -p all -u all -t all --rebuild-talos 5
> 
> Single   is changeset 5a190b..
> Multiple is changeset dcf903..
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=5a190bf0eeb8d55404cd84062c573cbe
> d092a64e&newProject=try&newRevision=dcf903ecb43d590a983a467c339dd588ee6fe3a3&
> framework=1&showOnlyImportant=0
> 
> They show no (important) perf changes, but there are RSS regressions for
> windows7-32, up to 15.80%.  No other platforms seem to space-regress, though.

Oh hm, that's worth looking into. How does it look with adaptive? I'm curious as to whether it's a sunk cost or depends on the amount of multiple arena usage.

> There are also very large changes to "tp5o XRes opt" and "tp5o XRes opt
> e10s".
> What are those?

https://wiki.mozilla.org/Buildbot/Talos/Tests#Xres_.28X_Resource_Monitoring.29

Bobby Holley (:bholley)

Reporter

Comment 26

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #24)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> > (2) Push both multiple arenas and adapative arenas to try with -t all to get
> > talos results.
> 
> Here are results for single and multiple arenas (not adaptive).  They are
> generated with
> ./mach try -b do -p all -u all -t all --rebuild-talos 5
> 
> Single   is changeset 5a190b..
> Multiple is changeset dcf903..
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=5a190bf0eeb8d55404cd84062c573cbe
> d092a64e&newProject=try&newRevision=dcf903ecb43d590a983a467c339dd588ee6fe3a3&
> framework=1&showOnlyImportant=0
> 
> They show no (important) perf changes, but there are RSS regressions for
> windows7-32, up to 15.80%.  No other platforms seem to space-regress, though.

Really? If I uncheck 'show only important changes', I see a 12% linux regression too.

These regressions are on the order of 25MB, which is very large, much higher than it should be, at least for adaptive. Can you test adaptive? I would expect a maximum overhead of chunksize * num_extra_arenas, which IIUC should be something on the order of 4MB.

Julian Seward [:jseward]

Comment 27

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #26)
> Really? If I uncheck 'show only important changes', I see a 12% linux
> regression too.

Ah, right, I didn't see that at first.  It's marked as low confidence,
but I'm not sure what that means.

> Can you test adaptive?

Sure.  I also re-tested multi, so as to test all 3 variants from exactly
the same sources.

Single vs Multi:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=22028266be5e4485a959d44b1619c7e3d3f80dfa&newProject=try&newRevision=3dbb7af4687d3ee209016ce9ce01c23d1a1eebd2&framework=1

Single vs Adaptive:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=22028266be5e4485a959d44b1619c7e3d3f80dfa&newProject=try&newRevision=64247742801704909670ff22d798413f871e09ee&framework=1

Single vs Adaptive doesn't seem to show much by way of high-confidence
differences apart from a 6% improvement on linux64-stylo.  I assume that's
because most tests are done on non-Stylo-enabled builds, so they behave
the same as single-arena.

Is that what you wanted me to measure?  I'm somewhat confused.

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Reporter

Comment 28

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #27)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #26)
> > Really? If I uncheck 'show only important changes', I see a 12% linux
> > regression too.
> 
> Ah, right, I didn't see that at first.  It's marked as low confidence,
> but I'm not sure what that means.
> 
> > Can you test adaptive?
> 
> Sure.  I also re-tested multi, so as to test all 3 variants from exactly
> the same sources.
> 
> Single vs Multi:
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=22028266be5e4485a959d44b1619c7e3
> d3f80dfa&newProject=try&newRevision=3dbb7af4687d3ee209016ce9ce01c23d1a1eebd2&
> framework=1
> 
> Single vs Adaptive:
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=22028266be5e4485a959d44b1619c7e3
> d3f80dfa&newProject=try&newRevision=64247742801704909670ff22d798413f871e09ee&
> framework=1
> 
> Single vs Adaptive doesn't seem to show much by way of high-confidence
> differences apart from a 6% improvement on linux64-stylo.  I assume that's
> because most tests are done on non-Stylo-enabled builds, so they behave
> the same as single-arena.

The linux64-stylo talos tests run everything with stylo, so I don't think that's accurate. The 6% improvement on linux64-stylo is presumably because stylo builds currently force multiple arenas, and so enabling adapative wins a lot of that back.

From what I see in your try pushes, linux64-stylo still uses about 15MB more RSS than vanilla linux64. We should investigate how much of that is due to multiple arenas and how much of it is due to other memory overhead in stylo. If the overhead of multiple arenas is more than a few MB, we should investigate why.



> 
> Is that what you wanted me to measure?  I'm somewhat confused.

Yes. The key point I wanted to establish is that adaptive doesn't regress other random stuff when stylo isn't turned on. The near-term goal is to get stylo built by default, which means getting rid of the special compile-time stylo behavior in mozjemalloc.c.

So in summary, we should do two things:
(1) Get adaptive arenas reviewed and landed. Ideally this would be on m-c before the end of the week, because that's our target date for building stylo by default. Do you think this is realistic?
(2) File a separate bug to measure the RSS hit of adaptive arenas on stylo, as described above. Once (1) is landed, this can by done by simply pushing a linux64-stylo try push with adaptive arenas disabled, and comparing that to the baseline results.

Let me know if that makes sense and what you think timelines for the above items are. This stuff is important. :-)

Flags: needinfo?(bobbyholley) → needinfo?(jseward)

Julian Seward [:jseward]

Comment 29

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #28)
> Let me know if that makes sense and what you think timelines for the above
> items are. This stuff is important. :-)

Yes that makes sense.  The patch needs tidying up before review -- tidying
at the jemalloc.c end and RAIIification at the Rust end.  I can prepare a
version for review by end of Weds.  I'll also file the followup, RSS
investigation, bug.

Flags: needinfo?(jseward)

Eric Rahm [:erahm]

Comment 30

•

7 years ago

Julian is it possible to get a summary of where we're at with this? I think there was a lot of talk about single-vs-multi and adaptive-vs-multi but not single-vs-adaptive and really that's all we care about for this bug. There was talk of regressions on microbenchmarks but it's not clear whether that was a non-issue or still persists (again we only care about single-vs-adaptive for this bug).

I'd also like to see some numbers comparing a non-stylo base build w/o the adaptive patches with a --enable-stylo build w/ the adaptive patches, this will make it easier to compare in perfherder. For this we want multiple talos runs and awsy runs as well.

Flags: needinfo?(jseward)

Eric Rahm [:erahm]

Comment 31

•

7 years ago

RE: not getting high confidence results, apparently the try flag is '--rebuild 5' not '--rebuild-talos 5'.

Eric Rahm [:erahm]

Comment 32

•

7 years ago

(In reply to Eric Rahm [:erahm] from comment #31)
> RE: not getting high confidence results, apparently the try flag is
> '--rebuild 5' not '--rebuild-talos 5'.

It might be more nuanced than that, it seems like taskcluster might not handle either rebuild flag (hence having good results on win/mac but not linux).

Bobby Holley (:bholley)

Reporter

Comment 33

•

7 years ago

(In reply to Eric Rahm [:erahm] from comment #30)
> Julian is it possible to get a summary of where we're at with this?

comment 28 summarizes my understanding of the current state of things.

> I think
> there was a lot of talk about single-vs-multi and adaptive-vs-multi but not
> single-vs-adaptive and really that's all we care about for this bug.

This is covered in comment 27, no?

> There
> was talk of regressions on microbenchmarks but it's not clear whether that
> was a non-issue or still persists (again we only care about
> single-vs-adaptive for this bug).

My impression is that there is no regression from enabling adaptie.

> 
> I'd also like to see some numbers comparing a non-stylo base build w/o the
> adaptive patches with a --enable-stylo build w/ the adaptive patches, this
> will make it easier to compare in perfherder.

I think that would conflate two variables. We know that adaptive + nostylo has no RSS regression. The next interesting thing to compare is stylo w/adaptive vs stylo w/o adaptive to measure the overhead of the arena choice (vs the general overhead of other stylo data structures).

> For this we want multiple
> talos runs and awsy runs as well.

FWIW, you don't need --rebuild to trigger multiple talos runs. Just click the rebuild button for the job of interest on treeherder.

Julian Seward [:jseward]

Comment 34

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #33)

I agree with Bobby's summary in comment 33.  From my point of view:

(1) Stylo (parallel) scales poorly with single arenas, for reasons we can guess
    at but don't yet know.

(2) With multiple arenas it scales much better.  There had been concern that
    multiple arenas would cause perf loss for malloc-intensive serial code, but
    that hasn't been repeatably observed.

(3) Multiple arenas do however cause some significant RSS regressions on Talos,
    for reasons to be investigated in a followup bug.

(4) Bobby suggested an adaptive approach, where we enable multiple arenas
    exactly during the parallel styling pass(es) but nowhere else.  The original
    motivation was to avoid the problems expected in (2), but that turned out to
    be a moot point.  Instead, we discovered we need the adaptive approach to
    avoid the problems observed in (3).

(5) For a non-stylo-enabled run, the adaptive approach has the same perf
    characteristics as single arena, because multiple arenas are never enabled.

(6) For a stylo-enabled run, the adaptive approach gets us the parallel
    scalability of (2) with less of the space penalty of (3).

(7) For the moment this seems the least-worst approach, so we're going with it.

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Comment 35

•

7 years ago

Attached patch Mike's proposal, part 1 of 2 (obsolete) — Details — Splinter Review

Attachment #8864146 - Attachment is obsolete: true

Julian Seward [:jseward]

Comment 36

•

7 years ago

Attached patch Mike's proposal, part 2 of 2 (obsolete) — Details — Splinter Review

These two patches (from Mike) implement the scheme discussed at
bug 1291355 comment 69 through bug 1291355 comment 74.

Julian Seward [:jseward]

Comment 37

•

7 years ago

Some numbers for the patches in comment 35 and comment 36:

Obama parallel traversal times, milliseconds

single

  STYLO_THREADS=1 84 83 83  (best 83)  1.00 x
  STYLO_THREADS=2 58 55 55  (best 55)  1.51 x
  STYLO_THREADS=3 43 42 43  (best 42)  1.97 x
  STYLO_THREADS=4 43 43 50  (best 43)  1.93 x
  STYLO_THREADS=6 45 47 60  (best 45)  1.84 x
  STYLO_THREADS=8 56 59 59  (best 56)  1.48 x

adaptive-mh

  STYLO_THREADS=1 86 85 83  (best 83)  1.00 x
  STYLO_THREADS=2 49 48 49  (best 48)  1.73 x
  STYLO_THREADS=3 34 40 34  (best 34)  2.44 x
  STYLO_THREADS=4 32 31 30  (best 30)  2.77 x
  STYLO_THREADS=6 26 24 24  (best 24)  3.45 x
  STYLO_THREADS=8 22 23 21  (best 21)  3.95 x


"innerHTML" test case from
https://bugzilla.mozilla.org/show_bug.cgi?id=1347525#c13, milliseconds

single

  73 72 75  (best 72)

adaptive-mh

  75 74 72  (best 72)


So -- it looks good from both the parallel and single-thread aspects.
Talos measurements to come.


--- command for Obama testcase -------------------------------

STYLO_THREADS=8 DISPLAY=:2.0 DUMP_STYLE_STATISTICS=1 ./mach run --disable-e10s file:///home/sewardj/MOZ/STYLO/Barack_Obama_Wikipedia.html

--- command for innerHTML testcase ---------------------------

STYLO_THREADS=1 DISPLAY=:2.0 QUMP_STYLE_STATISTICS=1 ./gcc-Og-nondebug-stylo/dist/bin/firefox-bin -P dev -no-remote file:///home/sewardj/MOZ/STYLO/bug1347525testcase_looping2.html

--- config for adaptive-mh -----------------------------------

is with the patches in comment 35 and comment 36 applied

--- config for single ----------------------------------------

is with the patches in comment 35 and comment 36 applied, plus
the following

diff --git a/memory/mozjemalloc/jemalloc.c b/memory/mozjemalloc/jemalloc.c
--- a/memory/mozjemalloc/jemalloc.c
+++ b/memory/mozjemalloc/jemalloc.c
@@ -3048,17 +3048,17 @@ chunk_dealloc(void *chunk, size_t size)
  * Begin arena.
  */
 
 MOZ_JEMALLOC_API void
 jemalloc_set_multiple_arenas_enabled_impl(bool multi_enabled)
 {
        arena_t *arena;
 
-       if (multi_enabled) {
+       if (0 && multi_enabled) {
                arena = choose_arena_hard();
        } else {
                arena = arenas[0];
        }
 #ifdef MOZ_MEMORY_WINDOWS
        TlsSetValue(tlsIndex, arena);
 #else
        arenas_map = arena;

--- mozconfig ------------------------------------------------

sewardj@nienna[C39]:/space/MOZ/MC-STYLO$ cat $MOZCONFIG
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/gcc-Og-nondebug-stylo
ac_add_options --enable-tests

ac_add_options --enable-optimize="-g -Og"
ac_add_options --enable-debug-symbols

ac_add_options --enable-stylo

ac_add_options --enable-valgrind

ac_add_options --enable-profiling
ac_add_options --enable-elf-hack

ac_add_options --disable-crashreporter

mk_add_options MOZ_MAKE_FLAGS="-j8"
mk_add_options AUTOCLOBBER=1

Eric Rahm [:erahm]

Comment 38

•

7 years ago

Thanks for the thorough breakdown Julian. One question: if we're comparing perf should we be doing more aggressively optimized builds, ie: -O3 vs -Og?

Mike Hommey [:glandium]

Assignee

Comment 39

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #36)
> Created attachment 8866354 [details] [diff] [review]
> Mike's proposal, part 2 of 2
> 
> These two patches (from Mike) implement the scheme discussed at
> bug 1291355 comment 69 through bug 1291355 comment 74.

Credit goes where credit due: the first part comes entirely from Julian's previous patch.

(In reply to Eric Rahm [:erahm] from comment #38)
> Thanks for the thorough breakdown Julian. One question: if we're comparing
> perf should we be doing more aggressively optimized builds, ie: -O3 vs -Og?

Note that --enable-optimize flags are not used for rust code.

Mike Hommey [:glandium]

Assignee

Comment 40

•

7 years ago

Attached patch Add a jemalloc_thread_local_arena API with a binding for rust — Details — Splinter Review

This is extracted from Julian's original patch, with a change to the API name.

Assignee: jseward → mh+mozilla

Attachment #8866351 - Attachment is obsolete: true

Attachment #8866606 - Flags: review+

Mike Hommey [:glandium]

Assignee

Comment 41

•

7 years ago

Attached patch Initial implementation for jemalloc_thread_local_arena — Details — Splinter Review

See message in the patch for the tradeoffs made in order to make this landable quickly.

Bobby, please check that you're okay with the second item for the short term (I intend to fix it next week along with the last item, but I didn't want to block enabling stylo on this)

Attachment #8866354 - Attachment is obsolete: true

Attachment #8866607 - Flags: review?(erahm)

Attachment #8866607 - Flags: feedback?(bobbyholley)

Mike Hommey [:glandium]

Assignee

Comment 42

•

7 years ago

Attached patch Opt-in to thread-local jemalloc arenas for stylo rayon threads (obsolete) — Details — Splinter Review

Attachment #8866608 - Flags: review?(bobbyholley)

Mike Hommey [:glandium]

Assignee

Comment 43

•

7 years ago

Attached patch Backout 2214b3c57c9c (bug 1291356), disabling multiple jemalloc arenas — Details — Splinter Review

Attachment #8866609 - Flags: review+

Julian Seward [:jseward]

Comment 44

•

7 years ago

(In reply to Eric Rahm [:erahm] from comment #38)
-O2 would probably be wise in future.  -Og is somewhat less aggressive
than -O2.  The main thing though is that opt levels between the two 
compared versions are the same, and that is of course true.

Julian Seward [:jseward]

Comment 45

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #37)
> So -- it looks good from both the parallel and single-thread aspects.
> Talos measurements to come.

Talos results are at [1].  MacOS failed to build, so the results will
presumably lack numbers for MacOS.

[1]
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=70a83a6491aa7e1d5293df57763f4815027c8773&newProject=try&newRevision=4a8ef276b769fcbdf600513b9113bc1e692dc242&framework=1&showOnlyImportant=0

Mike Hommey [:glandium]

Assignee

Comment 46

•

7 years ago

Attached patch Add a hazard exception for Gecko_SetJemallocThreadLocalArena — Details — Splinter Review

Attachment #8866640 - Flags: review?(sphink)

Mike Hommey [:glandium]

Assignee

Comment 47

•

7 years ago

Attached patch Use Thread Local Storage in mozjemalloc on mac (obsolete) — Details — Splinter Review

Attachment #8866645 - Flags: review?(erahm)

Mike Hommey [:glandium]

Assignee

Comment 48

•

7 years ago

Comment on attachment 8866645 [details] [diff] [review]
Use Thread Local Storage in mozjemalloc on mac

Review of attachment 8866645 [details] [diff] [review]:
-----------------------------------------------------------------

Somehow this is busted.

Attachment #8866645 - Flags: review?(erahm)

Mike Hommey [:glandium]

Assignee

Comment 49

•

7 years ago

Attached patch Use Thread Local Storage in mozjemalloc on mac (obsolete) — Details — Splinter Review

Attachment #8866645 - Attachment is obsolete: true

Attachment #8866664 - Flags: review?(erahm)

Mike Hommey [:glandium]

Assignee

Comment 50

•

7 years ago

Comment on attachment 8866664 [details] [diff] [review]
Use Thread Local Storage in mozjemalloc on mac

This is busted as well.

Attachment #8866664 - Flags: review?(erahm)

Bobby Holley (:bholley)

Reporter

Comment 51

•

7 years ago

Comment on attachment 8866607 [details] [diff] [review]
Initial implementation for jemalloc_thread_local_arena

Review of attachment 8866607 [details] [diff] [review]:
-----------------------------------------------------------------

Under-reporting memory usage for a week sounds like a fine price to pay for moving ahead on this. Thanks for working on this!

Attachment #8866607 - Flags: feedback?(bobbyholley) → feedback+

Bobby Holley (:bholley)

Reporter

Comment 52

•

7 years ago

Comment on attachment 8866608 [details] [diff] [review]
Opt-in to thread-local jemalloc arenas for stylo rayon threads

Review of attachment 8866608 [details] [diff] [review]:
-----------------------------------------------------------------

This is much nicer than twiddling a bit whenever we use the thread pool - thank you!

Note that this will need to be rebased on top of https://github.com/servo/servo/pull/16786

Attachment #8866608 - Flags: review?(bobbyholley) → review+

Mike Hommey [:glandium]

Assignee

Comment 53

•

7 years ago

Attached patch Use Thread Local Storage in mozjemalloc on mac — Details — Splinter Review

Talos agrees that this essentially has no additional overhead (comparing with and without this patch, but with all the others).

Attachment #8866664 - Attachment is obsolete: true

Attachment #8866721 - Flags: review?(erahm)

Mike Hommey [:glandium]

Assignee

Comment 54

•

7 years ago

This compares unpatched vs. patched (without the TLS change on mac). Note that without the TLS change on mac, this patch queue is a noop, and the differences that appear on perfherder for glterrain and bloom are just due to those tests being noisy or bimodal:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8e8da236d6c0&newProject=try&newRevision=f0338f956b40&framework=1&showOnlyImportant=0

(note this means that for linux-stylo, this is multi-arena vs. thread-local arena, and there's a nice drop in memory usage)

This compares unpacked vs. fully patched on mac (although, without the initialization ordering change, which doesn't have an impact on performance):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8e8da236d6c0&newProject=try&newRevision=1868d1ad74ea&framework=1&showOnlyImportant=0

And I just pushed mozilla-central with forced single-arena for stylo, to compare against thread-local arena (mostly for memory usage):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=703266e25919&newProject=try&newRevision=f0338f956b40&framework=1&showOnlyImportant=0

Mike Hommey [:glandium]

Assignee

Comment 55

•

7 years ago

Attached patch Opt-in to thread-local jemalloc arenas for stylo rayon threads — Details — Splinter Review

Rebased ; does this need to go through github?

Attachment #8866608 - Attachment is obsolete: true

Attachment #8866960 - Flags: review?(bobbyholley)

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Summary: stylo: Investigate enabling multiple jemalloc arenas only during the parallel traversal → stylo: Use Thread-local jemalloc arenas

Eric Rahm [:erahm]

Updated

•

7 years ago

Attachment #8866721 - Flags: review?(erahm) → review+

Eric Rahm [:erahm]

Comment 56

•

7 years ago

Comment on attachment 8866607 [details] [diff] [review]
Initial implementation for jemalloc_thread_local_arena

Review of attachment 8866607 [details] [diff] [review]:
-----------------------------------------------------------------

So just to be clear: every time the stylo thread pool gets created and kicks in we leak an arena right? Do you know if we reuse the same pool each time (so we only leak thread-pool-size arenas total) or if it gets recreated for each styling (so we leak number_of_styloings * thread-pool-size). I guess as long as you promise to fix it and stylo is disabled by default I don't care that much.

::: memory/mozjemalloc/jemalloc.c
@@ +3063,5 @@
> +		arena = arenas_extend(NO_INDEX);
> +	} else {
> +		arena = arenas[0];
> +	}
> +#ifdef MOZ_MEMORY_WINDOWS

I guess 'thread_local' (available in msvc) is C++11 only, whereas '__thread' is available for C (in gcc/clang but not msvc)? I wonder if it would make sense to just compile this code as C++ instead...regardless that's all follow up worthey, the status quo is fine here.

Attachment #8866607 - Flags: review?(erahm) → review+

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Blocks: stylo-nightly-build

Mike Hommey [:glandium]

Assignee

Comment 57

•

7 years ago

(In reply to Eric Rahm [:erahm] from comment #56)
> Comment on attachment 8866607 [details] [diff] [review]
> Initial implementation for jemalloc_thread_local_arena
> 
> Review of attachment 8866607 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So just to be clear: every time the stylo thread pool gets created and kicks
> in we leak an arena right? Do you know if we reuse the same pool each time
> (so we only leak thread-pool-size arenas total) or if it gets recreated for
> each styling (so we leak number_of_styloings * thread-pool-size). I guess as
> long as you promise to fix it and stylo is disabled by default I don't care
> that much.

Maybe the commit message wasn't clear enough, but the thread pool for stylo only starts threads once, and they live until the end of the process (and in fact, the threads didn't even call the exit handler when I tried ; they're probably just killed).

> ::: memory/mozjemalloc/jemalloc.c
> @@ +3063,5 @@
> > +		arena = arenas_extend(NO_INDEX);
> > +	} else {
> > +		arena = arenas[0];
> > +	}
> > +#ifdef MOZ_MEMORY_WINDOWS
> 
> I guess 'thread_local' (available in msvc) is C++11 only, whereas '__thread'
> is available for C (in gcc/clang but not msvc)? I wonder if it would make
> sense to just compile this code as C++ instead...regardless that's all
> follow up worthey, the status quo is fine here.

I'm actually considering it, because there's a lot of boilerplate in this code that could use RAII. And some awfully large macros that could use templates. That wouldn't necessarily allow to use thread_local, though, because of e.g. what's in the commit message for the mac TLS thing. But yeah, followup.

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Blocks: 1364358

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Blocks: 1364359

Bobby Holley (:bholley)

Reporter

Comment 58

•

7 years ago

Comment on attachment 8866960 [details] [diff] [review]
Opt-in to thread-local jemalloc arenas for stylo rayon threads

Review of attachment 8866960 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. This should be PR-ed against github.com/servo/servo when the relevant APIs have hit autoland. Ping somebody in #servo to trigger the landing.

When you write that patch, you'll also need to copy over the generated bindings.rs file from your objdir into components/style/gecko/generated/bindings.rs . We check in a copy of the bindings so that the servo CI can at least compile libgeckoservo independently. The bindings don't need to be perfectly up to date, so long as things compile.

Attachment #8866960 - Flags: review?(bobbyholley) → review+

Mike Hommey [:glandium]

Assignee

Comment 59

•

7 years ago

Ok, so since mozreview won't allow to autoland this because of the mixed authorship and some of the patches being r=glandium, I'll go ahead and land on inbound now, ahead of sfink's review of the trivial one-liner exception for hazard builds. Eventually, this will make it to autoland through merges, at which point I'll be able to get servo landed. And then I'll disable multiple-arenas.

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Keywords: leave-open

Pulsebot

Comment 60

•

7 years ago

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/182823a741b4
Add a jemalloc_thread_local_arena API with a binding for rust. r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/1290ff350272
Add a hazard exception for Gecko_SetJemallocThreadLocalArena. r=me
https://hg.mozilla.org/integration/mozilla-inbound/rev/751e131baa5d
Initial implementation for jemalloc_thread_local_arena. r=erahm
https://hg.mozilla.org/integration/mozilla-inbound/rev/bacbf98fc818
Use Thread Local Storage in mozjemalloc on mac. r=erahm

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Attachment #8866606 - Flags: checkin+

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Attachment #8866607 - Flags: checkin+

Mike Hommey [:glandium]

Assignee

Comment 61

•

7 years ago

Comment on attachment 8866640 [details] [diff] [review]
Add a hazard exception for Gecko_SetJemallocThreadLocalArena

Still interested in the review, after the fact.

Attachment #8866640 - Flags: checkin+

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Attachment #8866721 - Flags: checkin+

Carsten Book [:Tomcat]

Comment 62

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/182823a741b4
https://hg.mozilla.org/mozilla-central/rev/1290ff350272
https://hg.mozilla.org/mozilla-central/rev/751e131baa5d
https://hg.mozilla.org/mozilla-central/rev/bacbf98fc818

Steve Fink [:sfink] [:s:]

Comment 63

•

7 years ago

Comment on attachment 8866640 [details] [diff] [review]
Add a hazard exception for Gecko_SetJemallocThreadLocalArena

Review of attachment 8866640 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/devtools/rootAnalysis/analyzeHeapWrites.js
@@ +384,5 @@
>          "Gecko_EnsureMozBorderColors",
>          "Gecko_ClearMozBorderColors",
>          "Gecko_AppendMozBorderColors",
>          "Gecko_CopyMozBorderColors",
> +        "Gecko_SetJemallocThreadLocalArena",

At some point, I'd like to go through all these and try to remove as many as possible, but this one seems more reasonable to have there than most.

Attachment #8866640 - Flags: review?(sphink) → review+

Mike Hommey [:glandium]

Assignee

Comment 64

•

7 years ago

https://github.com/servo/servo/pull/16847 is landing the servo part in the servo repo. After that only the backout of 2214b3c57c9c is left, and that will need to wait for the autolanded servo commit to reach inbound.

Bobby Holley (:bholley)

Reporter

Comment 65

•

7 years ago

(In reply to Mike Hommey [:glandium] from comment #64)
> https://github.com/servo/servo/pull/16847 is landing the servo part in the
> servo repo. After that only the backout of 2214b3c57c9c is left, and that
> will need to wait for the autolanded servo commit to reach inbound.

I can just push this directly to autoland to avoid round-tripping through central, which will be especially slow on a weekend.

Bobby Holley (:bholley)

Reporter

Updated

•

7 years ago

Keywords: leave-open

Pulsebot

Comment 66

•

7 years ago

Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/503e83d0d7ab
Backout 2214b3c57c9c (bug 1291356), disabling multiple jemalloc arenas. r=glandium

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 67

•

7 years ago

Merge of backout https://hg.mozilla.org/mozilla-central/rev/503e83d0d7ab

Bobby Holley (:bholley)

Reporter

Updated

•

7 years ago

Blocks: 1364838

Mike Hommey [:glandium]

Assignee

Comment 68

•

7 years ago

Everything landed on central.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox55: --- → fixed

Target Milestone: --- → mozilla55

Mike Hommey [:glandium]

Assignee

Updated

•

7 years ago

Blocks: 1403766

bug1361258-1.cset 7 years ago Julian Seward [:jseward] 15.60 KB, patch		Details \| Diff \| Splinter Review
Mike's proposal, part 1 of 2 7 years ago Julian Seward [:jseward] 10.28 KB, patch		Details \| Diff \| Splinter Review
Mike's proposal, part 2 of 2 7 years ago Julian Seward [:jseward] 8.10 KB, patch		Details \| Diff \| Splinter Review
Add a jemalloc_thread_local_arena API with a binding for rust 7 years ago Mike Hommey [:glandium] 11.28 KB, patch	glandium : review+ glandium : checkin+	Details \| Diff \| Splinter Review
Initial implementation for jemalloc_thread_local_arena 7 years ago Mike Hommey [:glandium] 4.09 KB, patch	erahm : review+ bholley : feedback+ glandium : checkin+	Details \| Diff \| Splinter Review
Opt-in to thread-local jemalloc arenas for stylo rayon threads 7 years ago Mike Hommey [:glandium] 1.94 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
Backout 2214b3c57c9c (bug 1291356), disabling multiple jemalloc arenas 7 years ago Mike Hommey [:glandium] 1.43 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Add a hazard exception for Gecko_SetJemallocThreadLocalArena 7 years ago Mike Hommey [:glandium] 1.34 KB, patch	sfink : review+ glandium : checkin+	Details \| Diff \| Splinter Review
Use Thread Local Storage in mozjemalloc on mac 7 years ago Mike Hommey [:glandium] 1.85 KB, patch		Details \| Diff \| Splinter Review
Use Thread Local Storage in mozjemalloc on mac 7 years ago Mike Hommey [:glandium] 3.67 KB, patch		Details \| Diff \| Splinter Review
Use Thread Local Storage in mozjemalloc on mac 7 years ago Mike Hommey [:glandium] 7.10 KB, patch	erahm : review+ glandium : checkin+	Details \| Diff \| Splinter Review
Opt-in to thread-local jemalloc arenas for stylo rayon threads 7 years ago Mike Hommey [:glandium] 2.16 KB, patch	bholley : review+	Details \| Diff \| Splinter Review