NoteDirtyElement is either called too often or is too slow or both

RESOLVED FIXED in Firefox 59

Status

()

P2
normal
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: smaug, Assigned: bholley)

Tracking

({perf})

unspecified
mozilla59
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox59 fixed)

Details

(Whiteboard: [qf:f60][qf:p1])

Attachments

(4 attachments, 3 obsolete attachments)

(Reporter)

Description

a year ago
NoteDirtyElement is 4.6% of innerHTML set time in a profile I'm looking at.
(Reporter)

Comment 2

a year ago
innerHTML speed is rather important for Speedometer.
Depends on: 1383332
Whiteboard: [qf]
(Assignee)

Comment 3

a year ago
So 3/4 samples are in MaybeConstructLazily, so the work involved here is basically propagating up the lazy frame construction bits, which Gecko does too. Stylo's code here is slightly more complex than Gecko's, but would be faster in a lot of cases (the first time we set the bit we don't need to walk up the tree, whereas Gecko always walks up to the root).

How does the time under MaybeConstructLazily compare between stylo and non-stylo?
(Assignee)

Updated

a year ago
Flags: needinfo?(bugs)
I took a look, and we're mostly taking the fast paths there, so unless the `parent` checks are hurting us (in which case we could at least move the aElement == existingRoot up), I'm not sure there's much we can do here.

In my profile NoteDirtyElement was also incorrectly blamed with some refcount traffic, that really cam from BindToTree / UnbindFromTree, for some reason.

The other case where we may pay more than Gecko is in propagating the HAS_DIRTY_DESCENDANTS bits, which we could try to coalesce in MaybeConstructLazily.

I'll post a patch with the first idea, please take a profile if you wish, because mine was kinda busted because of the refcount traffic that was incorrectly reported.

I can also try to poke at (2), I guess...
Comment hidden (mozreview-request)
(Reporter)

Comment 6

a year ago
stylo vs non-stylo isn't too interesting question here. We are spending too much time in a _very_ hot code path (and doesn't matter whether it is stylo or non-stylo, but in this case it happens to be stylo).
nsIMutationObserver implementations should be as fast as possible.
Can we do something to optimize the method, or call it less often, or call it asynchronously?
Flags: needinfo?(bugs)
(Assignee)

Comment 7

a year ago
Realistically, assuming we aren't taking a regression with the switch to stylo, I probably don't have time to think about this until after the 57 merge.
(Reporter)

Comment 8

a year ago
Oh, sure, I don't think I indicated anywhere this _must_ go into FF57.
It is just that we have a slow method in a very hot code path.

I may take a look at this.
(Reporter)

Comment 9

a year ago
Let me profile without stylo
(Reporter)

Comment 10

a year ago
The stuff under nsCSSFrameConstructor::ContentAppended seems to take more time when Stylo is enabled, 
but doesn't affect bug 1347525 too badly.
Note, this seems to be a case where Gecko profiler may not give low-level enough information, so some native profiler may be better.

Updated

a year ago
Whiteboard: [qf] → [qf:p2]
Priority: -- → P2
Now that 57 is shipped, let's re-evaluate if we can do work here. I'm marking this qf:p1 but am willing to hear objections :)
Whiteboard: [qf:p2] → [qf:p1]

Updated

11 months ago
Whiteboard: [qf:p1] → [qf:i60][qf:p1]

Updated

10 months ago
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
smaug said he'd profile this again to see where things are these days.
Flags: needinfo?(bugs)
(Reporter)

Comment 13

10 months ago
Ok, I was reprofiling this using Nightly, and I can't see any high numbers for NoteDirtyElement anymore.
Marking this WFM. Please reopen if needed.
(FWIW, I'm getting quite a bit faster innerHTML numbers for Nightly than for Chrome, on Linux.)
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Flags: needinfo?(bugs)
Resolution: --- → WORKSFORME
(Reporter)

Comment 14

9 months ago
Reopening. Using a different testcase the method is there.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(Reporter)

Comment 16

9 months ago
https://perfht.ml/2D1L6os might be a bit better link
(Assignee)

Comment 17

9 months ago
I can have a look at this testcase.
Assignee: nobody → bobbyholley
(Assignee)

Comment 19

9 months ago
Created attachment 8941992 [details] [diff] [review]
Part 1 - Add machinery to assert single-threadedness from geckolib. v1

MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8941992 - Flags: review?(emilio)
(Assignee)

Comment 20

9 months ago
Created attachment 8941993 [details] [diff] [review]
Part 2 - Avoid atomic overhead in Servo_Element_IsDisplayNone. v1

This reduces time spent in NoteDirtyElement by about 15% in the testcase. Even
though the subsequent patch will cause us to call Servo_Element_IsDisplayNone less
for this particular testcase, it could still be hot on other testcases, and so
it's worth optimizing.

MozReview-Commit-ID: 3F3Zfp48dDW
Attachment #8941993 - Flags: review?(emilio)
(Assignee)

Comment 21

9 months ago
Created attachment 8941994 [details] [diff] [review]
Part 3 - Rearrange NoteDirtyElement for faster bailouts on the same restyle root. v1

This eliminates ~90% of the time spent in NoteDirtyElement on this testcase.

MozReview-Commit-ID: Lm5hf7QRiOK
Attachment #8941994 - Flags: review?(emilio)
(Assignee)

Updated

9 months ago
Attachment #8906035 - Attachment is obsolete: true
(Assignee)

Comment 22

9 months ago
(Note that I bumped the iterations on the testcase to get better resolution of the effects of these patches)
(Assignee)

Comment 23

9 months ago
Created attachment 8942007 [details] [diff] [review]
Part 1 - Add machinery to assert single-threadedness from geckolib. v2

MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8942007 - Flags: review?(emilio)
(Assignee)

Updated

9 months ago
Attachment #8941992 - Attachment is obsolete: true
Attachment #8941992 - Flags: review?(emilio)
(Assignee)

Comment 26

9 months ago
Created attachment 8942022 [details] [diff] [review]
Part 1 - Add machinery to assert single-threadedness from geckolib. v3

MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8942022 - Flags: review?(emilio)
(Assignee)

Updated

9 months ago
Attachment #8942007 - Attachment is obsolete: true
Attachment #8942007 - Flags: review?(emilio)
Comment on attachment 8942022 [details] [diff] [review]
Part 1 - Add machinery to assert single-threadedness from geckolib. v3

Review of attachment 8942022 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with those allows removed.

::: servo/ports/geckolib/glue.rs
@@ +215,5 @@
>  unsafe fn dummy_url_data() -> &'static RefPtr<URLExtraData> {
>      RefPtr::from_ptr_ref(&DUMMY_URL_DATA)
>  }
>  
> +#[allow(dead_code)]

Why is this needed? This shouldn't warn even in release builds.
Attachment #8942022 - Flags: review?(emilio) → review+
Comment on attachment 8941993 [details] [diff] [review]
Part 2 - Avoid atomic overhead in Servo_Element_IsDisplayNone. v1

Review of attachment 8941993 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather not take it if it happens to not be needed with the next patch tbh. But looks good if it is indeed needed...
Attachment #8941993 - Flags: review?(emilio) → review+
Comment on attachment 8941994 [details] [diff] [review]
Part 3 - Rearrange NoteDirtyElement for faster bailouts on the same restyle root. v1

Review of attachment 8941994 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsIDocumentInlines.h
@@ +97,5 @@
> +nsIDocument::SetServoRestyleRootDirtyBits(uint32_t aDirtyBits)
> +{
> +  MOZ_ASSERT(aDirtyBits);
> +  MOZ_ASSERT((aDirtyBits & ~Element::kAllServoDescendantBits) == 0);
> +  MOZ_ASSERT((aDirtyBits & mServoRestyleRootDirtyBits) == mServoRestyleRootDirtyBits);

Let's assert mServoRestyleRoot here?
Attachment #8941994 - Flags: review?(emilio) → review+
(Assignee)

Comment 30

9 months ago
(In reply to Emilio Cobos Álvarez [:emilio] from comment #27)
> Comment on attachment 8942022 [details] [diff] [review]
> Part 1 - Add machinery to assert single-threadedness from geckolib. v3
> 
> Review of attachment 8942022 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with those allows removed.
> 
> ::: servo/ports/geckolib/glue.rs
> @@ +215,5 @@
> >  unsafe fn dummy_url_data() -> &'static RefPtr<URLExtraData> {
> >      RefPtr::from_ptr_ref(&DUMMY_URL_DATA)
> >  }
> >  
> > +#[allow(dead_code)]
> 
> Why is this needed? This shouldn't warn even in release builds.

It does: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd55825e8b0ab26262e2dbe59962dbb55d375caf&selectedJob=155779839

This is because is_in_servo_traversal is currently only used in a #[cfg(not(debug_asssertions))] block. I could remove it for is_main_thread, but that would be asymmetric, and also likely to bite people again in the future if code gets rearranged. The advantage of deny(dead_code) is that it makes sure that stuff gets cleaned up when it's not used, but I don't think that's a particularly high priority for these functions.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #28)
> I'd rather not take it if it happens to not be needed with the next patch
> tbh. But looks good if it is indeed needed...

Per the commit message, it depends on the testcase. Part 3 further optimizes this particular testcase so that Servo_Element_IsDisplayNone will never be reached, but it will on other testcases.

In general I think it's fine to bypass locks in main-thread-only glue.rs FFI hooks because:
* The locks are designed to protect us from racing with rayon, but the assertions ensure that rayon can't be running at these callsites.
* We're one frame away from C++ code, so we don't get deep transitive benefits of type-safety. The only reason we're calling into Rust at all is that the data representation is opaque to C++, and we we wouldn't dream of paying this kind of runtime cost in C++.

That said, I also don't think we should sprinkle safety bypasses willy-nilly, even in glue.rs. So I think a good compromise is "we can bypass if we can measure a clear speedup in a profiler".

(In reply to Emilio Cobos Álvarez [:emilio] from comment #29)
> Let's assert mServoRestyleRoot here?

Good idea.

Thanks for the reviews!

Comment 32

9 months ago
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/af26f7cb29ca
Add machinery to assert single-threadedness from geckolib. r=emilio
https://hg.mozilla.org/integration/autoland/rev/dae099622091
Rearrange NoteDirtyElement for faster bailouts on the same restyle root. r=emilio

Comment 33

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/af26f7cb29ca
https://hg.mozilla.org/mozilla-central/rev/dae099622091
Status: REOPENED → RESOLVED
Last Resolved: 10 months ago9 months ago
status-firefox59: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.