Stylo: Remove atomic refcount bumping from style::rule_tree::StrongRuleNode::ensure_child

RESOLVED FIXED

Status

()

P1
normal
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: jseward, Assigned: emilio)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

a year ago
MESI profiling of the Stylo pass only, when running the Obama testcase, with
4 worker threads, shows that style::rule_tree::StrongRuleNode::ensure_child
is the biggest single source of RFOs, generating 1/3 of all RFOs in the
styling pass.  Examining the annotated assembly shows they all come from an
atomic add instructtion in ensure_child (post inlining).  The relevant
source code contains the following interesting comment:

  // TODO(emilio): We could avoid all the refcount churn here.

Emilio kindly hacked up the change he had in mind, which is now at
https://github.com/servo/servo/pull/17294

This seems to be a win.  Numbers are:

BEFORE                       |    best  |         | PAR SPEEDUP
1p  115.16  113.96  115.81   |  113.96  |         | 1.00 x
2    60.21   60.68   60.44   |   60.21  |         | 1.89 x
3    42.32   41.94   41.85   |   41.85  |         | 2.72 x
4    33.17   33.23   33.31   |   33.17  |         | 3.43 x
6    29.53   30.13   29.70   |   29.53  |         | 3.85 x
8    27.17   26.78   27.11   |   26.78  |         | 4.25 x

AFTER                             best  | %BEFORE |
1p  111.75  111.45  112.57   |  111.45  |   97.8% | 1.00 x
2    59.02   58.44   58.52   |   58.44  |   97.1% | 1.90 x
3    40.33   40.16   40.80   |   40.16  |   96.0% | 2.78 x
4    31.17   31.10   31.14   |   31.10  |   93.7% | 3.58 x
6    28.23   28.27   28.23   |   28.23  |   95.6% | 3.94 x
8    25.66   25.70   25.81   |   25.66  |   95.8% | 4.34 x

Which I interpret to mean:

* an absolute speedup of between 2% (1 core) and 6% (4 cores)

* parallel speedup seems modestly improved, at 4 cores from around 3.5 x to
  3.6 x.  Given that the 6 and 8 thread numbers necessarily involve
  hyperthreading, I would not pay much attention to them.

The average number of instructions per RFO increased from 1196 to 1871 (viz,
less interference in the memory system).

Setup: Quiet F25 machine, Skylake Xeon, 2.6 GHz, runs of the form:


STYLO_ITERS=500 STYLO_ITER_THRESHOLD=10 STYLO_THREADS={1,2,3,4,6,8} \
  FORCE_STYLO_THREAD_POOL=1 DISPLAY=:2.0 \
   ./gcc-Og-nondebug-stylo/dist/bin/firefox-bin -P dev -no-remote \
     /home/sewardj/MOZ/STYLO/Barack_Obama_Wikipedia.html
Duplicate of this bug: 1356166
Nice find + fix! This is exactly the kind of stuff I was hoping we'd find with your investigations. Something we may have known about, but didn't know was hot.
Assignee: nobody → emilio+bugs
Priority: -- → P1
(Assignee)

Comment 3

a year ago
This landed: https://hg.mozilla.org/mozilla-central/rev/b74375c2dcee
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.