Closed Bug 1360889 Opened 8 years ago Closed 8 years ago

stylo: Add a custom Arc to use for style structs

Categories

(Core :: CSS Parsing and Computation, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(4 files)

As discussed in bug 1360883, there are a few disadvantages of Arc for us: * The weak refcount adds extra memory overhead (certainly on 32-bit), and we don't use it. * We do several extra atomic RMU operations to handle the weak refcount. * It doesn't have a free get_mut()-like API (i.e. one that's not RMU). * We can't use a custom allocator, which makes it inappropriate for arenas, which we may want here. I took a few minutes to prototype a forked version of Arc and it looks like it'll work fine. Need some more time to finish it up, but I should have patches here soon.
Do we really want a custom Arc? I mean, we only use make/get_mut dufing the cascade, right? Seems to me that we could do something like: ``` enum StyleBuilderStruct<'a, T> { Borrowed(&'a Arc<T>), Owned(T), } impl<'a, T> StyleBuilderStruct<'a, T> { pub fn make_mut(&mut self) -> &mut T { if let StyleBuilderStruct::Borrowed(arc) = *self { *self = StyleBuilderStruct::Owned((**arc).clone()); } match *self { StyleBuilderStruct::Owned(ref mut v) => v, StyleBuilderStruct::Borrowed(..) => debug_unreachable!(), } } pub fn build(self) -> Arc<T> { match self { StyleBuilderStruct::Borrowed(s) => s.clone(), StyleBuilderStruct::Owned(s) => Arc::new(s), } } } ``` That would avoid the initial cloning of the parent/default style structs, and make subsequent make_mut calls basically free.
But that would make StyleBuilder enormous (because it needs inline storage for 20 style structs), and trigger a bunch of extra memmoving, right? That seems really suboptimal. I don't see what's so bad about forking arc. With the weak stuff stripped out, it's not much code at all.
> that would make StyleBuilder enormous Is that bad, if it’s on the stack and never moves?
(In reply to Simon Sapin (:SimonSapin) from comment #3) > > that would make StyleBuilder enormous > > Is that bad, if it’s on the stack and never moves? It's certainly likely to cause a few more cache misses. Maybe that's ok and not as bad as I was imagining, but we can definitely do better.
Update: We decided to land emilio's approach in comment 1 over the weekend and build on that, which landed in [1]. We also landed [2] and [3] to significantly reduce allocations. I just profiled obama: * before: https://perfht.ml/2oU1fRY * after: https://perfht.ml/2oYGJzC The upshot is that we significantly reduced allocations, but significantly increased memmov traffic, which is roughly what I'd expect. The exact overhead of the cache misses is hard to measure, but to me the memmov traffic is enough justification to proceed with the custom Arc (also, it sounds like we will indeed need to do arenas). I'll try to get that finished up today. [1] https://github.com/servo/servo/pull/16663 [2] https://github.com/servo/servo/pull/16661 [3] https://github.com/servo/servo/pull/16664
I think the biggest offender there now is the font-size stuff. I think we should fix that too. Manish, you wrote in a comment that you need to cascade font-size regardless of whether it was specified to handle mathml's script min size and language changes. Both of those are super-rare, is there a reason that shouldn't be guarded by the seen bitfield? Seems quite wasteful to cascade all the time unnecessarily.
Flags: needinfo?(manishearth)
(I believe that accounts for most of the memmove traffic, given the font struct is specially large)
Let's move the font-size discussion to bug 1361126.
Flags: needinfo?(manishearth)
This is a verbatim copy of the source at [1]. It won't compile on stable rust, but I'm including it here to make the changes clear. [1] https://doc.rust-lang.org/src/alloc/arc.rs.html MozReview-Commit-ID: XEbOK4fXQX
Attachment #8863881 - Flags: review?(emilio+bugs)
We remove most of the doc comments to minimize the number of lines of forked code. MozReview-Commit-ID: LehEisKxkJW
Attachment #8863882 - Flags: review?(emilio+bugs)
MozReview-Commit-ID: flF0fv9E9M
Attachment #8863883 - Flags: review?(emilio+bugs)
MozReview-Commit-ID: 8wSsYPEmYE4
Attachment #8863884 - Flags: review?(emilio+bugs)
These patches improve cascade performance by about 25%. It appears to be mostly patch 4. StyleArc itself improves release and drop performance for arcs by a couple of nanoseconds each.
here is a profile of the cascade on the obama testcase with these patches: https://perfht.ml/2p2K9RU . It's a bit noisy (a lot of which seems to depend on malloc performance), but this profile shows us a 22ms for the cascade, which is a lot better than the 35ms we were at last week.
Blocks: 1360878
Attachment #8863881 - Flags: review?(emilio+bugs) → review+
Comment on attachment 8863882 [details] [diff] [review] Part 2 - Strip down StyleArc to what we need. v1 Review of attachment 8863882 [details] [diff] [review]: ----------------------------------------------------------------- It's worth perhaps to put it into a standalone crate? ::: servo/components/style/stylearc.rs @@ +158,5 @@ > + // Note: in stable rust we can't use intrinsics::abort. > + // Panic is good enough in practice here (it will trigger > + // an abort at least in Gecko, and this case is degenerate > + // enough that Servo shouldn't have code that triggers it). > + panic!(); If you keep the panic! there's no need for unsafe {}. We can also call at ::std::process::abort, which is both stable and safe, as you prefer. @@ -583,5 @@ > - // Use Acquire to ensure that we see any writes to `weak` that happen > - // before release writes (i.e., decrements) to `strong`. Since we hold a > - // weak count, there's no chance the ArcInner itself could be > - // deallocated. > - if this.inner().strong.compare_exchange(1, 0, Acquire, Relaxed).is_err() { ugh, yeah, I see now why this would be way more slower than needed. @@ +206,5 @@ > +impl<T: ?Sized> Arc<T> { > + #[inline] > + pub fn get_mut(this: &mut Self) -> Option<&mut T> { > + // See make_mut() for documentation of memory ordering and threadsafety. > + if this.inner().count.load(Relaxed) == 1 { Can we keep this in is_unique()? Seemed a bit clear, even if it desugars to the same. @@ +327,5 @@ > } > } > > #[cfg(test)] > mod tests { Do we really want to keep the tests? We don't run them except on travis, perhaps it's a good time to run them from test-unit instead? Otherwise, perhaps move the tests to tests/unit, assuming they don't rely on internals?
Attachment #8863882 - Flags: review?(emilio+bugs) → review+
Attachment #8863883 - Flags: review?(emilio+bugs) → review+
Comment on attachment 8863884 [details] [diff] [review] Part 4 - Make StyleBuilder more efficient using stylearc. v1 Review of attachment 8863884 [details] [diff] [review]: ----------------------------------------------------------------- I told you the StyleBuilder refactoring was going to be worth it regardless of this, hope this patch wasn't too unpleasant to write ;-) ::: servo/components/style/properties/properties.mako.rs @@ +2024,5 @@ > pub enum StyleStructRef<'a, T: 'a> { > /// A borrowed struct from the parent, for example, for inheriting style. > Borrowed(&'a Arc<T>), > /// An owned struct, that we've already mutated. > + Owned(Arc<T>), It'd be nice to compare this with and without StyleArc (that is, reverting part 3), to see what's the impact of the weak count on perf.
Attachment #8863884 - Flags: review?(emilio+bugs) → review+
(In reply to Emilio Cobos Álvarez [:emilio] from comment #16) > Comment on attachment 8863882 [details] [diff] [review] > Part 2 - Strip down StyleArc to what we need. v1 > > Review of attachment 8863882 [details] [diff] [review]: > ----------------------------------------------------------------- > > It's worth perhaps to put it into a standalone crate? We can eventually, but given that we might do arena stuff, I want to leave it all in the style creates for simplicity. Once things settle down, we can break out what makes sense. > > ::: servo/components/style/stylearc.rs > @@ +158,5 @@ > > + // Note: in stable rust we can't use intrinsics::abort. > > + // Panic is good enough in practice here (it will trigger > > + // an abort at least in Gecko, and this case is degenerate > > + // enough that Servo shouldn't have code that triggers it). > > + panic!(); > > If you keep the panic! there's no need for unsafe {}. We can also call at > ::std::process::abort, which is both stable and safe, as you prefer. It's only stable as of 1.17, which we don't require yet. I'll add a comment. > > @@ -583,5 @@ > > - // Use Acquire to ensure that we see any writes to `weak` that happen > > - // before release writes (i.e., decrements) to `strong`. Since we hold a > > - // weak count, there's no chance the ArcInner itself could be > > - // deallocated. > > - if this.inner().strong.compare_exchange(1, 0, Acquire, Relaxed).is_err() { > > ugh, yeah, I see now why this would be way more slower than needed. > > @@ +206,5 @@ > > +impl<T: ?Sized> Arc<T> { > > + #[inline] > > + pub fn get_mut(this: &mut Self) -> Option<&mut T> { > > + // See make_mut() for documentation of memory ordering and threadsafety. > > + if this.inner().count.load(Relaxed) == 1 { > > Can we keep this in is_unique()? Seemed a bit clear, even if it desugars to > the same. Done. > > @@ +327,5 @@ > > } > > } > > > > #[cfg(test)] > > mod tests { > > Do we really want to keep the tests? We don't run them except on travis, > perhaps it's a good time to run them from test-unit instead? Yeah, good catch. I'll just remove them. > > Otherwise, perhaps move the tests to tests/unit, assuming they don't rely on > internals?
(In reply to Emilio Cobos Álvarez [:emilio] from comment #17) > Comment on attachment 8863884 [details] [diff] [review] > Part 4 - Make StyleBuilder more efficient using stylearc. v1 > > Review of attachment 8863884 [details] [diff] [review]: > ----------------------------------------------------------------- > > I told you the StyleBuilder refactoring was going to be worth it regardless > of this, hope this patch wasn't too unpleasant to write ;-) Yeah totally. :-) > > ::: servo/components/style/properties/properties.mako.rs > @@ +2024,5 @@ > > pub enum StyleStructRef<'a, T: 'a> { > > /// A borrowed struct from the parent, for example, for inheriting style. > > Borrowed(&'a Arc<T>), > > /// An owned struct, that we've already mutated. > > + Owned(Arc<T>), > > It'd be nice to compare this with and without StyleArc (that is, reverting > part 3), to see what's the impact of the weak count on perf. I did that. The numbers were definitely worse than with part 4, but it was a bit noisy and hard to tell how much they improved over std::sync::arc. Given my microbenchmarks (a few ns improvement on release and drop), my guess is that it isn't a huge difference. A more interesting question is how much the performance would regress if we used the CAS get_mut() on std::sync::arc rather than the free get_mut() on stylearc::Arc. If we end up not doing arena allocation, and we at some point want to consider switching back to std::Arc, we could measure that.
Summary: stylo: Add a StyleArc to use for style structs → stylo: Add a custom Arc to use for style structs
I think this can be marked as fixed now, please reopen if it's not the case.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: