Closed
Bug 1366347
Opened 8 years ago
Closed 8 years ago
stylo: Rewrite parallel.rs
Categories
(Core :: CSS Parsing and Computation, enhancement, P1)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
People
(Reporter: bholley, Assigned: bholley)
References
Details
Attachments
(1 file, 2 obsolete files)
13.67 KB,
patch
|
bholley
:
review+
|
Details | Diff | Splinter Review |
I went in to the Berlin office today to meet with jseward, and we had a chat with nmatsakis over vidyo. I was explaining to him how our parallel traversal works, and then we started looking at the code, and I realized that the code doesn't do anything close to what I thought it was doing. Instead, it does various things that are terrible for performance.
The biggest offense, by far, is that children of siblings are coalesced into the same work unit (I thought they were partitioned per-parent). In practice, this means there is zero parallelism until we hit a level in the DOM with more than 64 same-level cousins. This explains why stylo has been performing so much better on wikipedia than myspace: wikipedia's DOM is very wide starting pretty early on, whereas myspace is narrower.
There are also various other dumb things going on in terms of doing unnecessary allocations and context switches. I spent the afternoon fixing all this stuff up and have a result that I'm pretty happy with.
Sequential stylo can style myspace's DOM in 17ms. Before this patch, parallel stylo styled in in 12ms. After this patch, parallel styles myspace in 3.5ms.
Comment 1•8 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> Sequential stylo can style myspace's DOM in 17ms. Before this patch,
> parallel stylo styled in in 12ms. After this patch, parallel styles myspace
> in 3.5ms.
Wow. That is an excellent result.
Assignee | ||
Comment 2•8 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #1)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> > Sequential stylo can style myspace's DOM in 17ms. Before this patch,
> > parallel stylo styled in in 12ms. After this patch, parallel styles myspace
> > in 3.5ms.
>
> Wow. That is an excellent result.
Yeah, though I don't want to jinx it until I get the Talos numbers back. There's always a chance that I'm screwing up the measurements.
Also, the win on bloom-basic is significant but less pronounced. We're now looking at a 3.4x speedup from parallelism (up from 3x), but that's still a bit short of what I'd expect.
Assignee | ||
Comment 3•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=156e5a6f852b367e74bc50d0c50f21212c2d033b
Fingers crossed. In the mean time I'm going to go find some falafel. :-)
Comment 4•8 years ago
|
||
O - [Child 1064] WARNING: stylo: cannot get ServoStyleSheets from XBL bindings yet. See bug 1290276.: file /home/worker/workspace/build/src/layout/base/nsCSSFrameConstructor.cpp, line 2752
1813
[task 2017-05-19T19:55:06.491083Z] 19:55:06 INFO - thread '<unnamed>' panicked at 'assertion failed: !traversal.is_parallel()', /home/worker/workspace/build/src/servo/components/style/sequential.rs:27
1814
[task 2017-05-19T19:55:06.491761Z] 19:55:06 INFO - stack backtrace:
1815
[task 2017-05-19T19:55:06.815927Z] 19:55:06 INFO - 0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1816
[task 2017-05-19T19:55:06.816665Z] 19:55:06 INFO - 1: std::sys_common::backtrace::_print
1817
[task 2017-05-19T19:55:06.816763Z] 19:55:06 INFO - 2: std::panicking::default_hook::{{closure}}
1818
[task 2017-05-19T19:55:06.816871Z] 19:55:06 INFO - 3: std::panicking::default_hook
1819
[task 2017-05-19T19:55:06.817292Z] 19:55:06 INFO - 4: std::panicking::rust_panic_with_hook
1820
[task 2017-05-19T19:55:06.817387Z] 19:55:06 INFO - 5: std::panicking::begin_panic
1821
[task 2017-05-19T19:55:06.817489Z] 19:55:06 INFO - 6: style::sequential::traverse_dom
1822
[task 2017-05-19T19:55:06.817877Z] 19:55:06 INFO - 7: geckoservo::glue::traverse_subtree
1823
[task 2017-05-19T19:55:06.817965Z] 19:55:06 INFO - 8: Servo_TraverseSubtree
1824
[task 2017-05-19T19:55:06.818103Z] 19:55:06 INFO - 9: _ZN7mozilla13ServoStyleSet25PrepareAndTraverseSubtreeEPKNS_3dom7ElementENS_21TraversalRootBehaviorENS_24TraversalRestyleBehaviorE
1825
Assignee | ||
Comment 5•8 years ago
|
||
Yeah that's just because I had some other stale stuff in the branch. The perf results should still be valid, re-pushing the debug tests here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7c96701bfa6f4c471534dac3fdaa0fda697586d
Assignee | ||
Comment 6•8 years ago
|
||
Attachment #8869590 -
Flags: review?(emilio+bugs)
Assignee | ||
Comment 7•8 years ago
|
||
Here's another Talos try push based on a more recent revision: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb98394e86539475aef2ed400965fd4b2f525088
Results look like quite an improvement. We should get this into the tree.
Comment 8•8 years ago
|
||
Comment on attachment 8869590 [details] [diff] [review]
Fix the parallel traversal. v1
Review of attachment 8869590 [details] [diff] [review]:
-----------------------------------------------------------------
::: servo/components/style/parallel.rs
@@ +95,5 @@
>
> queue.install(|| {
> rayon::scope(|scope| {
> + traverse_nodes(nodes,
> + /* is_tail_call = */ true,
Can we convert is_tail_call into an enum? I don't think it's too much of a hassle, and it's worth for readability I believe, specially if you document the variants well :P
@@ +233,5 @@
> + top_down_dom(&nodes, root, traversal_data, scope, traversal, tls);
> + });
> + }
> + } else {
> + let mut first_chunk: Option<Box<[SendNode<E::ConcreteNode>]>> = None;
Why using a boxed slice here instead of an Option<NodeList<E>>? If there's a good reason, leave a comment explaining why.
@@ +250,5 @@
> + }
> +
> + // If this is a tail call, bypass rayon for the first chunk and invoke top_down_dom
> + // directly.
> + debug_assert!(first_chunk.is_some() == is_tail_call);
nit: debug_assert_eq!.
Attachment #8869590 -
Flags: review?(emilio+bugs) → review+
Assignee | ||
Comment 9•8 years ago
|
||
Attachment #8869612 -
Flags: review+
Assignee | ||
Updated•8 years ago
|
Attachment #8869590 -
Attachment is obsolete: true
Assignee | ||
Comment 10•8 years ago
|
||
For the crashtest failure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4a94f8d02eeee9e6a5fab51d0cbd70c064a4641
Assignee | ||
Comment 11•8 years ago
|
||
Assuming that the crashtest failure is fixed, the only remaining issue is the font crash, presumably because more stuff is executing in parallel now and we're uncovering latent bugs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7c96701bfa6f4c471534dac3fdaa0fda697586d
I'm basically out of time for the week. Manish, can you take that? It would be good to get this patch into the tree as soon as possible.
Flags: needinfo?(manishearth)
Comment 12•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1eea5fdc5802aa521d8261593e8362c61817328
The fix is just
diff --git a/layout/style/ServoBindings.cpp b/layout/style/ServoBindings.cpp
index 4f7e1ed..da9774a 100644
--- a/layout/style/ServoBindings.cpp
+++ b/layout/style/ServoBindings.cpp
@@ -1916,6 +1916,7 @@ void
Gecko_nsStyleFont_FixupNoneGeneric(nsStyleFont* aFont,
RawGeckoPresContextBorrowed aPresContext)
{
+ MutexAutoLock lock(*sServoFontMetricsLock);
const nsFont* defaultVariableFont =
aPresContext->GetDefaultFont(kPresContext_DefaultVariableFont_ID,
aFont->mLanguage);
However, before landing, I would like to get numbers on whether or not this mutex affected perf (so that we can verify that we return to the original situation when we add a finer-grained rwlock).
(Or, I can land this first, provided this analysis is done at some point)
Flags: needinfo?(manishearth)
Comment 13•8 years ago
|
||
Try is good. Should we land this?
Comment 14•8 years ago
|
||
Attachment #8869670 -
Flags: review?(emilio+bugs)
Assignee | ||
Comment 15•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #12)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=e1eea5fdc5802aa521d8261593e8362c61817328
>
> The fix is just
>
> diff --git a/layout/style/ServoBindings.cpp b/layout/style/ServoBindings.cpp
> index 4f7e1ed..da9774a 100644
> --- a/layout/style/ServoBindings.cpp
> +++ b/layout/style/ServoBindings.cpp
> @@ -1916,6 +1916,7 @@ void
> Gecko_nsStyleFont_FixupNoneGeneric(nsStyleFont* aFont,
> RawGeckoPresContextBorrowed aPresContext)
> {
> + MutexAutoLock lock(*sServoFontMetricsLock);
> const nsFont* defaultVariableFont =
> aPresContext->GetDefaultFont(kPresContext_DefaultVariableFont_ID,
> aFont->mLanguage);
>
>
> However, before landing, I would like to get numbers on whether or not this
> mutex affected perf (so that we can verify that we return to the original
> situation when we add a finer-grained rwlock).
I just measured with a microbenchmark which I created by applying the patch at [1] to bloom-basic.html in [2].
The results weren't pretty. Before this patch, each worker thread spends less than 1ms in Gecko_nsStyleFont_FixupNoneGeneric. With this patch, each thread spends about 200ms thrashing for the lock, ruining overall performance.
So I'm not really comfortable landing this. I'll file a dependent bug.
[1] https://gist.github.com/bholley/e839eecacc8c56f80fec47b4356e212e
[2] https://github.com/heycam/style-perf-tests
https://perfht.ml/2qBgzqr
Assignee | ||
Updated•8 years ago
|
Attachment #8869670 -
Flags: review?(emilio+bugs) → review-
Comment 16•8 years ago
|
||
Okay, we can then try the existing fix on the bug but that is not thread safe in the presence of XML:lang and also is affected by a bug in the gfx caching code.
I'll push up a try push.
Comment 17•8 years ago
|
||
Trying the WIP fix from bug 1362599
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6bdc63353e62e176a9b10d412b34e7d84bd9e27
This is 100% thread safe, but will panic if you use xml:lang to use new languages. Don't do that (our tests don't, so far). But that makes this a temporary fix. Feel free to land, however, it's already r=heycam.
It contains hacks because the caching code is broken. It should not affect performance aside from a miniscule blip right before the first traversal.
The right solution here is to use a read-write lock. I'm not yet sure if the RW lock should be used in conjunction with the precaching, or if it should be used in isolation. It can be used in isolation, but precaching prevents a lot of contention (OTOH it adds to the pretraverse cost, though it's mostly code that would have to run anyway at some point during traversal).
Either way, those patches can be landed. They may make one or two test fail because of the gfx bug mentioned in bug 1362599 comment 12 (I have inserted a hack fix to prevent that, but it may not always work)
Comment 18•8 years ago
|
||
Ah, the naming scheme for the unicode atom is different, try https://treeherder.mozilla.org/#/jobs?repo=try&revision=68d0ccaedcbaf8f997d2cd3d837fc0daa2cd3e76
Comment 19•8 years ago
|
||
Okay, that passes. Feel free to land!
(do a quick measurement first though)
Assignee | ||
Comment 20•8 years ago
|
||
Can you please post the patch(es) you intend to be landed in bug 1366468? It looks like there were fixups in comment 18, and I'm not comfortable just scraping the try push and pushing it.
Flags: needinfo?(manishearth)
Comment 21•8 years ago
|
||
They've been posted on bug 1362599.
I'll open a new bug for the missing XML:lang stuff later (not on laptop right now)
Flags: needinfo?(manishearth)
Updated•8 years ago
|
Attachment #8869670 -
Attachment is obsolete: true
Assignee | ||
Comment 22•8 years ago
|
||
Ok, just tested them, and the overhead is gone. Thanks!
So all I need to do is hit 'land commits' on bug 1366347?
Flags: needinfo?(manishearth)
Assignee | ||
Comment 23•8 years ago
|
||
I guess that won't work, because there's a whitespace change to servo code. So I'll presumably need to strip that. Anything else?
Comment 24•8 years ago
|
||
Yeah, sorry, that snuck in there when I was pushing. Land those with the servo part stripped, yes.
Flags: needinfo?(manishearth)
Assignee | ||
Comment 25•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 26•8 years ago
|
||
I've started using a revision that including this change. It's incredibly faster than before! Amazing!
You need to log in
before you can comment on or make changes to this bug.
Description
•