stylo: Investigate the performance impact of the CSSOM RwLocks

RESOLVED DUPLICATE of bug 1335941

Status

()

Core
CSS Parsing and Computation
P4
normal
RESOLVED DUPLICATE of bug 1335941
a year ago
7 months ago

People

(Reporter: Bobby Holley (parental leave - send mail for anything urgent), Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Simon has been adding a bunch of parking_lot::RwLocks around style data structures to enable more mutations via CSSOM.

parking_lot::RwLock borrows are cheap, but not free. In my testing, a mutable borrow/release pair is about 10ns, and an immutable one is 20. For comparison, an Arc borrow/release pair is about 13ns.

So we should investigate how many of these borrows actually happen in our hot paths, and sum up whether they're significant or not.
IIRC, our original plan was just to use borrow_for_layout on DOMRefCell, which would basically assert that the data structures weren't being mutated. IIRC Simon later determined that there was a place where we actually needed concurrent modification, but it may have been Servo-only (i.e. not affecting Stylo).

Simon, can you clarify whether we might be able to compile these RwLocks out in Stylo release builds?
Flags: needinfo?(simon.sapin)
The changes I’ve been making were based on discussion in bug 1305141 and assuming we would use always RwLock for this, not compile them out.

One alternative is DOMRefCell<T>, but it inherently introduces a risk of data race. In my opinion this risk is too high to make DOMRefCell viable for stylesheets.

It has three methods relevant here:

* borrow() and borrow_mut() are the same as RefCell<T>. They give access to &T and &mut T respectively, and track at runtime a flag + counter to make sure any mutable borrow is the only borrow at any point in time. Access to the flag + counter is *unsynchronized*, making these methods not thread-safe. (They’re should only be used from the script/main thread.)

* borrow_for_layout() is designed to be used from multiple threads at the same time. It **does not check anything** and just returns &T, not touching the borrow flag/counter. This method is `unsafe`, it is up to the user to make sure that no thread is using borrow_mut() at the same time.

Maintaining that invariant turns out not to be easy. Even if "we know" that restyling only happens while the script (main) thread is paused, there is no easy to ensure that some of the same code is not also used at other times or from other threads.
Flags: needinfo?(simon.sapin)
(In reply to Simon Sapin (:SimonSapin) from comment #2)
> The changes I’ve been making were based on discussion in bug 1305141 and
> assuming we would use always RwLock for this, not compile them out.

Right, I agree that was the plan. I just want to understand what our options are if it turns out to be expensive.
 
> One alternative is DOMRefCell<T>, but it inherently introduces a risk of
> data race. In my opinion this risk is too high to make DOMRefCell viable for
> stylesheets.

I'm not proposing switching to DOMRefCell here. My question was simply whether, in Stylo builds, we actually depend on the blocking behavior of the RwLock, or whether it exists purely to convert runtime data races into safe panics. If the latter, then we have the _option_ to compile them out in release stylo builds if we think our debug test coverage is good enough (CI debug coverage for Gecko is significantly more thorough than Servo).

That's obviously not a great option, since we lose the release-mode safety guarantee. The ideal cases is that the overhead is negligible, and next-best is that there's some other way to optimize it. But if we could make restyle 50% faster and the only way to do it was compiling out the RwLocks, I would certainly consider it.

Answering my original question from comment 1, it sounds like we may or may not depend the blocking behavior of RwLock depending on what animation architecture we settle on with Stylo.
The test coverage that would be relevant to data races is not the percentage of lines of code executed, but rather the number of combinations of thread scheduling. It seems to me very difficult to get good test coverage for that. One way to approach it might might be to run every test many times under rr’s chaos mode, but I imagine that multiplying CI cycle times by a large factor is not acceptable.

I would not be confident in disabling (panic-if-already-locked) locks in release mode.
Priority: -- → P4
This should be fixed by bug 1335941.
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1335941
You need to log in before you can comment on or make changes to this bug.