1366347 - stylo: Rewrite parallel.rs

Assignee

Description

•

7 years ago

I went in to the Berlin office today to meet with jseward, and we had a chat with nmatsakis over vidyo. I was explaining to him how our parallel traversal works, and then we started looking at the code, and I realized that the code doesn't do anything close to what I thought it was doing. Instead, it does various things that are terrible for performance.

The biggest offense, by far, is that children of siblings are coalesced into the same work unit (I thought they were partitioned per-parent). In practice, this means there is zero parallelism until we hit a level in the DOM with more than 64 same-level cousins. This explains why stylo has been performing so much better on wikipedia than myspace: wikipedia's DOM is very wide starting pretty early on, whereas myspace is narrower.

There are also various other dumb things going on in terms of doing unnecessary allocations and context switches. I spent the afternoon fixing all this stuff up and have a result that I'm pretty happy with.

Sequential stylo can style myspace's DOM in 17ms. Before this patch, parallel stylo styled in in 12ms. After this patch, parallel styles myspace in 3.5ms.

Julian Seward [:jseward]

Comment 1

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> Sequential stylo can style myspace's DOM in 17ms. Before this patch,
> parallel stylo styled in in 12ms. After this patch, parallel styles myspace
> in 3.5ms.

Wow.  That is an excellent result.

Bobby Holley (:bholley)

Assignee

Comment 2

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #1)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> > Sequential stylo can style myspace's DOM in 17ms. Before this patch,
> > parallel stylo styled in in 12ms. After this patch, parallel styles myspace
> > in 3.5ms.
> 
> Wow.  That is an excellent result.

Yeah, though I don't want to jinx it until I get the Talos numbers back. There's always a chance that I'm screwing up the measurements.

Also, the win on bloom-basic is significant but less pronounced. We're now looking at a 3.4x speedup from parallelism (up from 3x), but that's still a bit short of what I'd expect.

Bobby Holley (:bholley)

Assignee

Comment 3

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=156e5a6f852b367e74bc50d0c50f21212c2d033b

Fingers crossed. In the mean time I'm going to go find some falafel. :-)

Manish Goregaokar [:manishearth]

Comment 4

•

7 years ago

O - [Child 1064] WARNING: stylo: cannot get ServoStyleSheets from XBL bindings yet. See bug 1290276.: file /home/worker/workspace/build/src/layout/base/nsCSSFrameConstructor.cpp, line 2752
1813
[task 2017-05-19T19:55:06.491083Z] 19:55:06     INFO - thread '<unnamed>' panicked at 'assertion failed: !traversal.is_parallel()', /home/worker/workspace/build/src/servo/components/style/sequential.rs:27
1814
[task 2017-05-19T19:55:06.491761Z] 19:55:06     INFO - stack backtrace:
1815
[task 2017-05-19T19:55:06.815927Z] 19:55:06     INFO -    0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1816
[task 2017-05-19T19:55:06.816665Z] 19:55:06     INFO -    1: std::sys_common::backtrace::_print
1817
[task 2017-05-19T19:55:06.816763Z] 19:55:06     INFO -    2: std::panicking::default_hook::{{closure}}
1818
[task 2017-05-19T19:55:06.816871Z] 19:55:06     INFO -    3: std::panicking::default_hook
1819
[task 2017-05-19T19:55:06.817292Z] 19:55:06     INFO -    4: std::panicking::rust_panic_with_hook
1820
[task 2017-05-19T19:55:06.817387Z] 19:55:06     INFO -    5: std::panicking::begin_panic
1821
[task 2017-05-19T19:55:06.817489Z] 19:55:06     INFO -    6: style::sequential::traverse_dom
1822
[task 2017-05-19T19:55:06.817877Z] 19:55:06     INFO -    7: geckoservo::glue::traverse_subtree
1823
[task 2017-05-19T19:55:06.817965Z] 19:55:06     INFO -    8: Servo_TraverseSubtree
1824
[task 2017-05-19T19:55:06.818103Z] 19:55:06     INFO -    9: _ZN7mozilla13ServoStyleSet25PrepareAndTraverseSubtreeEPKNS_3dom7ElementENS_21TraversalRootBehaviorENS_24TraversalRestyleBehaviorE
1825

Bobby Holley (:bholley)

Assignee

Comment 5

•

7 years ago

Yeah that's just because I had some other stale stuff in the branch. The perf results should still be valid, re-pushing the debug tests here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7c96701bfa6f4c471534dac3fdaa0fda697586d

Bobby Holley (:bholley)

Assignee

Comment 6

•

7 years ago

Attached patch Fix the parallel traversal. v1 (obsolete) — Details — Splinter Review

Attachment #8869590 - Flags: review?(emilio+bugs)

Bobby Holley (:bholley)

Assignee

Comment 7

•

7 years ago

Here's another Talos try push based on a more recent revision: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb98394e86539475aef2ed400965fd4b2f525088

Results look like quite an improvement. We should get this into the tree.

Emilio Cobos Álvarez (:emilio)

Comment 8

•

7 years ago

Comment on attachment 8869590 [details] [diff] [review]
Fix the parallel traversal. v1

Review of attachment 8869590 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/style/parallel.rs
@@ +95,5 @@
>  
>      queue.install(|| {
>          rayon::scope(|scope| {
> +            traverse_nodes(nodes,
> +                           /* is_tail_call = */ true,

Can we convert is_tail_call into an enum? I don't think it's too much of a hassle, and it's worth for readability I believe, specially if you document the variants well :P

@@ +233,5 @@
> +                top_down_dom(&nodes, root, traversal_data, scope, traversal, tls);
> +            });
> +        }
> +    } else {
> +        let mut first_chunk: Option<Box<[SendNode<E::ConcreteNode>]>> = None;

Why using a boxed slice here instead of an Option<NodeList<E>>? If there's a good reason, leave a comment explaining why.

@@ +250,5 @@
> +        }
> +
> +        // If this is a tail call, bypass rayon for the first chunk and invoke top_down_dom
> +        // directly.
> +        debug_assert!(first_chunk.is_some() == is_tail_call);

nit: debug_assert_eq!.

Attachment #8869590 - Flags: review?(emilio+bugs) → review+

Bobby Holley (:bholley)

Assignee

Comment 9

•

7 years ago

Attached patch Fix the parallel traversal. r=emilio — Details — Splinter Review

Attachment #8869612 - Flags: review+

Bobby Holley (:bholley)

Assignee

Updated

•

7 years ago

Attachment #8869590 - Attachment is obsolete: true

Bobby Holley (:bholley)

Assignee

Comment 10

•

7 years ago

For the crashtest failure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4a94f8d02eeee9e6a5fab51d0cbd70c064a4641

Bobby Holley (:bholley)

Assignee

Comment 11

•

7 years ago

Assuming that the crashtest failure is fixed, the only remaining issue is the font crash, presumably because more stuff is executing in parallel now and we're uncovering latent bugs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7c96701bfa6f4c471534dac3fdaa0fda697586d

I'm basically out of time for the week. Manish, can you take that? It would be good to get this patch into the tree as soon as possible.

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Comment 12

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1eea5fdc5802aa521d8261593e8362c61817328

The fix is just

diff --git a/layout/style/ServoBindings.cpp b/layout/style/ServoBindings.cpp
index 4f7e1ed..da9774a 100644
--- a/layout/style/ServoBindings.cpp
+++ b/layout/style/ServoBindings.cpp
@@ -1916,6 +1916,7 @@ void
 Gecko_nsStyleFont_FixupNoneGeneric(nsStyleFont* aFont,
                                    RawGeckoPresContextBorrowed aPresContext)
 {
+  MutexAutoLock lock(*sServoFontMetricsLock);
   const nsFont* defaultVariableFont =
     aPresContext->GetDefaultFont(kPresContext_DefaultVariableFont_ID,
                                  aFont->mLanguage);


However, before landing, I would like to get numbers on whether or not this mutex affected perf (so that we can verify that we return to the original situation when we add a finer-grained rwlock).

(Or, I can land this first, provided this analysis is done at some point)

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Comment 13

•

7 years ago

Try is good. Should we land this?

Manish Goregaokar [:manishearth]

Comment 14

•

7 years ago

Attached patch gecko-side threadsafety patch (obsolete) — Details — Splinter Review

Attachment #8869670 - Flags: review?(emilio+bugs)

Bobby Holley (:bholley)

Assignee

Comment 15

•

7 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #12)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=e1eea5fdc5802aa521d8261593e8362c61817328
> 
> The fix is just
> 
> diff --git a/layout/style/ServoBindings.cpp b/layout/style/ServoBindings.cpp
> index 4f7e1ed..da9774a 100644
> --- a/layout/style/ServoBindings.cpp
> +++ b/layout/style/ServoBindings.cpp
> @@ -1916,6 +1916,7 @@ void
>  Gecko_nsStyleFont_FixupNoneGeneric(nsStyleFont* aFont,
>                                     RawGeckoPresContextBorrowed aPresContext)
>  {
> +  MutexAutoLock lock(*sServoFontMetricsLock);
>    const nsFont* defaultVariableFont =
>      aPresContext->GetDefaultFont(kPresContext_DefaultVariableFont_ID,
>                                   aFont->mLanguage);
> 
> 
> However, before landing, I would like to get numbers on whether or not this
> mutex affected perf (so that we can verify that we return to the original
> situation when we add a finer-grained rwlock).

I just measured with a microbenchmark which I created by applying the patch at [1] to bloom-basic.html in [2].

The results weren't pretty. Before this patch, each worker thread spends less than 1ms in Gecko_nsStyleFont_FixupNoneGeneric. With this patch, each thread spends about 200ms thrashing for the lock, ruining overall performance.

So I'm not really comfortable landing this. I'll file a dependent bug.

[1] https://gist.github.com/bholley/e839eecacc8c56f80fec47b4356e212e
[2] https://github.com/heycam/style-perf-tests

https://perfht.ml/2qBgzqr

Bobby Holley (:bholley)

Assignee

Updated

•

7 years ago

Depends on: 1366468

Bobby Holley (:bholley)

Assignee

Updated

•

7 years ago

Attachment #8869670 - Flags: review?(emilio+bugs) → review-

Manish Goregaokar [:manishearth]

Comment 16

•

7 years ago

Okay, we can then try the existing fix on the bug but that is not thread safe in the presence of XML:lang and also is affected by a bug in the gfx caching code.

I'll push up a try push.

Manish Goregaokar [:manishearth]

Comment 17

•

7 years ago

Trying the WIP fix from bug 1362599

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6bdc63353e62e176a9b10d412b34e7d84bd9e27

This is 100% thread safe, but will panic if you use xml:lang to use new languages. Don't do that (our tests don't, so far). But that makes this a temporary fix. Feel free to land, however, it's already r=heycam.

It contains hacks because the caching code is broken. It should not affect performance aside from a miniscule blip right before the first traversal.

The right solution here is to use a read-write lock. I'm not yet sure if the RW lock should be used in conjunction with the precaching, or if it should be used in isolation. It can be used in isolation, but precaching prevents a lot of contention (OTOH it adds to the pretraverse cost, though it's mostly code that would have to run anyway at some point during traversal).

Either way, those patches can be landed. They may make one or two test fail because of the gfx bug mentioned in bug 1362599 comment 12 (I have inserted a hack fix to prevent that, but it may not always work)

Manish Goregaokar [:manishearth]

Comment 18

•

7 years ago

Ah, the naming scheme for the unicode atom is different, try https://treeherder.mozilla.org/#/jobs?repo=try&revision=68d0ccaedcbaf8f997d2cd3d837fc0daa2cd3e76

Manish Goregaokar [:manishearth]

Comment 19

•

7 years ago

Okay, that passes. Feel free to land!

(do a quick measurement first though)

Bobby Holley (:bholley)

Assignee

Comment 20

•

7 years ago

Can you please post the patch(es) you intend to be landed in bug 1366468? It looks like there were fixups in comment 18, and I'm not comfortable just scraping the try push and pushing it.

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Comment 21

•

7 years ago

They've been posted on bug 1362599.

I'll open a new bug for the missing XML:lang stuff later (not on laptop right now)

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Updated

•

7 years ago

Attachment #8869670 - Attachment is obsolete: true

Bobby Holley (:bholley)

Assignee

Comment 22

•

7 years ago

Ok, just tested them, and the overhead is gone. Thanks!

So all I need to do is hit 'land commits' on bug 1366347?

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Assignee

Comment 23

•

7 years ago

I guess that won't work, because there's a whitespace change to servo code. So I'll presumably need to strip that. Anything else?

Manish Goregaokar [:manishearth]

Comment 24

•

7 years ago

Yeah, sorry, that snuck in there when I was pushing. Land those with the servo part stripped, yes.

Flags: needinfo?(manishearth)

Bobby Holley (:bholley)

Assignee

Comment 25

•

7 years ago

https://github.com/servo/servo/pull/16971

Bobby Holley (:bholley)

Assignee

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Hiroyuki Ikezoe (:hiro)

Comment 26

•

7 years ago

I've started using a revision that including this change. It's incredibly faster than before! Amazing!

Fix the parallel traversal. v1 7 years ago Bobby Holley (:bholley) 13.12 KB, patch	emilio : review+	Details \| Diff \| Splinter Review
Fix the parallel traversal. r=emilio 7 years ago Bobby Holley (:bholley) 13.67 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
gecko-side threadsafety patch 7 years ago Manish Goregaokar [:manishearth] 734 bytes, patch	bholley : review-	Details \| Diff \| Splinter Review