The default bug view has changed. See this FAQ.

stylo: RwLock overhead dominates selector matching

RESOLVED FIXED in Firefox 55

Status

()

Core
CSS Parsing and Computation
P1
normal
RESOLVED FIXED
2 months ago
17 hours ago

People

(Reporter: bholley, Assigned: SimonSapin)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

The two RwLock acquires at [1] each take as much time as all selector matching. Combined, they make the computation of applicable declarations 3x slower. This is the major thing that's killing us in bug 1331848.

This was added in [2] to support CSSOM, and then we doubled the overhead in [3] when we made the locking more granular to support the rule tree.

I've verified that each operation only costs us ~6ns on my machine, which is the expected overhead of an atomic operation. But this loop is hot, and so that's still way to much.

We're going to need to change this somehow. 

[1] https://github.com/servo/servo/blob/a80725c91b972577fcb888a75ab225305f3098e8/components/style/stylist.rs#L996
[2] https://github.com/servo/servo/pull/13459
[3] https://github.com/servo/servo/pull/13202
Some strawman ideas:
* Can we rearrange the hierarchy to avoid locking in the tight loop? Does the locking really need to be this granular?
* Do we actually have concurrent access here? If so, from what? If not, can we unsafely bypass the lock?
Flags: needinfo?(simon.sapin)
CC Jack, since this ruins style system perf, which is one of the metrics that we brag about in Servo PR.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #1)
> * Do we actually have concurrent access here? If so, from what? If not, can
> we unsafely bypass the lock?

I guess it is possible that we have concurrent read access here, but I don't think there could be concurrent write access, since CSSOM mutation should happen only when we run script on the main thread. But as far as I know, we block the main thread when we do styling.
The only info we actually use here is whether or not the rule contains important rules / unimportant rules.

I wonder if we can cache this info on Rule on construction? It seems like we recreate the Rule map on each update(), so this should be okay?
(Assignee)

Comment 5

2 months ago
I think it’s not too hard to move things around to have one lock instead of two. But to get to zero we’d have to rethink our thread-safety strategy for style rules. And "just do unsafe unsynchronized access because 'we know' script is not running at the same time" does not inspire me much confidence.

Manish, IIRC Stylist::update is not called for modifications within CSSStyleDeclaration block.
Flags: needinfo?(simon.sapin)
(In reply to Simon Sapin (:SimonSapin) from comment #5)
> I think it’s not too hard to move things around to have one lock instead of
> two. But to get to zero we’d have to rethink our thread-safety strategy for
> style rules.

Unless we want to spend half of our time in the style system acquiring that lock, I think we need to get to zero.
(Assignee)

Comment 7

2 months ago
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #6)
> Unless we want to spend half of our time in the style system acquiring that
> lock, I think we need to get to zero.

Right. I’m just saying that if we’re gonna do unsafe unsynchronized access, I’d like at least a more detailed analysis of the code (if not something the compiler or a lint can help us enforce) to make sure there’s no data race, not just hand-waving "no script thread" based on what is theoretically supposed to happen.
I did have a plan for zero-cost type-system-based threadsafe atomicless rwlocks, where reads from off main thread are regular loads and R/W on main thread are the same cost as a nonatomic refcell.

The idea is that you have two kinds of zero-sized "context marker" (main thread and parallel traversal). It is not Send or Sync. It cannot be safely constructed. You construct one at the start of parallel traversal and pass it down through whatever methods. The rwlock has an `off_main_thread_read()` function that gives an &T (but needs a off-main-thread marker to work). On the main thread, we have one hanging off of something or the other, and pass it down through the methods and use `main_thread_write()/main_thread_read()` (which need main thread markers). The only trickiness here is that there may be readonly methods called in both situations (which will be revealed by the typechecker as you try to refactor these changes, which is nice). You can make them generic over the marker -- and have them use the appropriate read function depending on the marker (one with overhead, one without), which is code bloaty but otherwise ok. To avoid the code bloat you could just make reads not need markers and always use the refcell -- there are some things wrt non-atomic writes that are iffy here, but I think that can be worked out.


Alternatively, a panicky RwLock (AtomicRefCell) might be all we need to make this fast. Are we sure that it's the atomics themselves (rather than all the other code in RwLock::read()) causing the problem? These RwLocks are ones that should never have any R-W contention, in the absence of bugs.
Would be acceptable to switch from RwLock to AtomicRefCell on debug builds, and to a dumb wrapper on release?
(Assignee)

Comment 10

2 months ago
Alright, the problem is not locking at all, it’s locking in the inner loop. Maybe we can have rules share a per-document lock. Untested idea:

* Replace Arc<RwLock<T>> with struct LockedArc<T> { data: Arc<T>, shared_lock: Arc<RwLock<()>> }
* Have opaque wrapper types for RwLockReadGuard and RwLockWriteGuard
* Have LockedArc::as_mut(&self, &mut SharedWriteGuard) -> Result<&T, …> that checks that the guard holds the correct lock by comparing pointer addresses of something.
(Assignee)

Comment 11

2 months ago
(In reply to Manish Goregaokar [:manishearth] from comment #8)
> The idea is that you have two kinds of zero-sized "context marker" (main
> thread and parallel traversal). It is not Send or Sync. It cannot be safely
> constructed. You construct one at the start of parallel traversal and pass
> it down through whatever methods.

I did try something very similar in September. The first thing is that adding parameters to so many functions and methods to pass tokens around is a pain.

But more importantly I ended up in `tick_all_animations` calling code that needs shared read-only access to style data, but AFAICT there’s nothing stopping the script thread from running at the same time.

----

I pushed my old branch to https://github.com/servo/servo/compare/attic/stylerefcell

Two types of "tokens" that are unsafe to construct. They have the same relationship as `&T` / `&mut T`, std::cell::Ref / std::cell::RefMut, etc.

* Creating a SingleThreadToken asserts that no other thread is gonna access in any way the same cells as the ones accessed with that token at the same time. When it’s used we can temporarily pretend that `StyleRefCell<T>: !Sync` and have it behave like `RefCell<T>`.
* Conversely, creating a ReadOnlyToken asserts relevant cells are only accessed with ReadOnlyToken (though that can be on multiple threads). In that case StyleRefCell<T> behaves like Arc<T> and gives out &T.
(In reply to Simon Sapin (:SimonSapin) from comment #11)
> (In reply to Manish Goregaokar [:manishearth] from comment #8)
> > The idea is that you have two kinds of zero-sized "context marker" (main
> > thread and parallel traversal). It is not Send or Sync. It cannot be safely
> > constructed. You construct one at the start of parallel traversal and pass
> > it down through whatever methods.
> 
> I did try something very similar in September. The first thing is that
> adding parameters to so many functions and methods to pass tokens around is
> a pain.
> 
> But more importantly I ended up in `tick_all_animations` calling code that
> needs shared read-only access to style data, but AFAICT there’s nothing
> stopping the script thread from running at the same time.

So it's true that there's nothing stopping script to run while tick_all_animations is running. I believe the only thing that code needs is read-only access to the keyframes map that the stylist has though, so we can probably make it work?

It'd be wrong, even with synchronized mutation of keyframe rules, doing two animation ticks with different keyframes in them.

All that code needs to be rewritten sooner or later, but in the meantime keeping an immutable view of the state of the keyframes rule in the animations code instead of just the name could be ok?
> Would be acceptable to switch from RwLock to AtomicRefCell on debug builds, and to a dumb wrapper on release?

I'm not particularly fond of unsafe stuff in release builds.


> [...] that checks that the guard holds the correct lock by comparing pointer addresses of something.

I like this idea.

> I pushed my old branch to https://github.com/servo/servo/compare/attic/stylerefcell

Yeah, basically what I was talking about, with some minor differences: I would have cached the token on the window instead of unsafely constructing it in the CSSOM. I would also use Token instead of &Token since &Token is not completely zero-cost (there's an extra load involved).

I personally feel that the mental overhead of passing tokens around isn't a major one. But I understand if we don't want that.
(Assignee)

Comment 14

2 months ago
(In reply to Manish Goregaokar [:manishearth] from comment #13)
> I personally feel that the mental overhead of passing tokens around isn't a
> major one.

Yeah, I don’t mind too much once the work is done. I’m just saying it was a lot of tedious work to get to the point where I even found out about the tick_all_animations issue.
My main concern with "annotate the world with tokens" is flexibility. When we were doing all the lifetime stuff with the dom.rs traits, the practical concern of passing the lifetime to all the right places became a real drag on development. So if we do something like that, I'd hope it would be something we could stick in SharedStyleContext or whatever so that we don't have to do anything fancy.

Keeping locks and just acquiring them in smart places also sounds fine.
Assignee: nobody → simon.sapin
Priority: -- → P1
(Assignee)

Updated

4 days ago
See Also: → bug 1348587
This is fixed. Thanks Simon!
Status: NEW → RESOLVED
Last Resolved: 2 days ago
Resolution: --- → FIXED
Duplicate of this bug: 1311469

Updated

22 hours ago
Depends on: 1349149

Updated

17 hours ago
status-firefox55: --- → fixed
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.