Closed Bug 1398119 Opened 7 years ago Closed 6 years ago

NoteDirtyElement is either called too often or is too slow or both

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla59
Performance Impact high
Tracking Status
firefox59 --- fixed

People

(Reporter: smaug, Assigned: bholley)

References

Details

(Keywords: perf)

Attachments

(4 files, 3 obsolete files)

NoteDirtyElement is 4.6% of innerHTML set time in a profile I'm looking at.
innerHTML speed is rather important for Speedometer.
Depends on: 1383332
Whiteboard: [qf]
So 3/4 samples are in MaybeConstructLazily, so the work involved here is basically propagating up the lazy frame construction bits, which Gecko does too. Stylo's code here is slightly more complex than Gecko's, but would be faster in a lot of cases (the first time we set the bit we don't need to walk up the tree, whereas Gecko always walks up to the root).

How does the time under MaybeConstructLazily compare between stylo and non-stylo?
Flags: needinfo?(bugs)
I took a look, and we're mostly taking the fast paths there, so unless the `parent` checks are hurting us (in which case we could at least move the aElement == existingRoot up), I'm not sure there's much we can do here.

In my profile NoteDirtyElement was also incorrectly blamed with some refcount traffic, that really cam from BindToTree / UnbindFromTree, for some reason.

The other case where we may pay more than Gecko is in propagating the HAS_DIRTY_DESCENDANTS bits, which we could try to coalesce in MaybeConstructLazily.

I'll post a patch with the first idea, please take a profile if you wish, because mine was kinda busted because of the refcount traffic that was incorrectly reported.

I can also try to poke at (2), I guess...
stylo vs non-stylo isn't too interesting question here. We are spending too much time in a _very_ hot code path (and doesn't matter whether it is stylo or non-stylo, but in this case it happens to be stylo).
nsIMutationObserver implementations should be as fast as possible.
Can we do something to optimize the method, or call it less often, or call it asynchronously?
Flags: needinfo?(bugs)
Realistically, assuming we aren't taking a regression with the switch to stylo, I probably don't have time to think about this until after the 57 merge.
Oh, sure, I don't think I indicated anywhere this _must_ go into FF57.
It is just that we have a slow method in a very hot code path.

I may take a look at this.
Let me profile without stylo
The stuff under nsCSSFrameConstructor::ContentAppended seems to take more time when Stylo is enabled, 
but doesn't affect bug 1347525 too badly.
Note, this seems to be a case where Gecko profiler may not give low-level enough information, so some native profiler may be better.
Whiteboard: [qf] → [qf:p2]
Priority: -- → P2
Keywords: perf
Now that 57 is shipped, let's re-evaluate if we can do work here. I'm marking this qf:p1 but am willing to hear objections :)
Whiteboard: [qf:p2] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
smaug said he'd profile this again to see where things are these days.
Flags: needinfo?(bugs)
Ok, I was reprofiling this using Nightly, and I can't see any high numbers for NoteDirtyElement anymore.
Marking this WFM. Please reopen if needed.
(FWIW, I'm getting quite a bit faster innerHTML numbers for Nightly than for Chrome, on Linux.)
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(bugs)
Resolution: --- → WORKSFORME
Reopening. Using a different testcase the method is there.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
https://perfht.ml/2D1L6os might be a bit better link
I can have a look at this testcase.
Assignee: nobody → bobbyholley
MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8941992 - Flags: review?(emilio)
This reduces time spent in NoteDirtyElement by about 15% in the testcase. Even
though the subsequent patch will cause us to call Servo_Element_IsDisplayNone less
for this particular testcase, it could still be hot on other testcases, and so
it's worth optimizing.

MozReview-Commit-ID: 3F3Zfp48dDW
Attachment #8941993 - Flags: review?(emilio)
This eliminates ~90% of the time spent in NoteDirtyElement on this testcase.

MozReview-Commit-ID: Lm5hf7QRiOK
Attachment #8941994 - Flags: review?(emilio)
Attachment #8906035 - Attachment is obsolete: true
(Note that I bumped the iterations on the testcase to get better resolution of the effects of these patches)
MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8942007 - Flags: review?(emilio)
Attachment #8941992 - Attachment is obsolete: true
Attachment #8941992 - Flags: review?(emilio)
MozReview-Commit-ID: 9LBNm2h5Ct3
Attachment #8942022 - Flags: review?(emilio)
Attachment #8942007 - Attachment is obsolete: true
Attachment #8942007 - Flags: review?(emilio)
Comment on attachment 8942022 [details] [diff] [review]
Part 1 - Add machinery to assert single-threadedness from geckolib. v3

Review of attachment 8942022 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with those allows removed.

::: servo/ports/geckolib/glue.rs
@@ +215,5 @@
>  unsafe fn dummy_url_data() -> &'static RefPtr<URLExtraData> {
>      RefPtr::from_ptr_ref(&DUMMY_URL_DATA)
>  }
>  
> +#[allow(dead_code)]

Why is this needed? This shouldn't warn even in release builds.
Attachment #8942022 - Flags: review?(emilio) → review+
Comment on attachment 8941993 [details] [diff] [review]
Part 2 - Avoid atomic overhead in Servo_Element_IsDisplayNone. v1

Review of attachment 8941993 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather not take it if it happens to not be needed with the next patch tbh. But looks good if it is indeed needed...
Attachment #8941993 - Flags: review?(emilio) → review+
Comment on attachment 8941994 [details] [diff] [review]
Part 3 - Rearrange NoteDirtyElement for faster bailouts on the same restyle root. v1

Review of attachment 8941994 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsIDocumentInlines.h
@@ +97,5 @@
> +nsIDocument::SetServoRestyleRootDirtyBits(uint32_t aDirtyBits)
> +{
> +  MOZ_ASSERT(aDirtyBits);
> +  MOZ_ASSERT((aDirtyBits & ~Element::kAllServoDescendantBits) == 0);
> +  MOZ_ASSERT((aDirtyBits & mServoRestyleRootDirtyBits) == mServoRestyleRootDirtyBits);

Let's assert mServoRestyleRoot here?
Attachment #8941994 - Flags: review?(emilio) → review+
(In reply to Emilio Cobos Álvarez [:emilio] from comment #27)
> Comment on attachment 8942022 [details] [diff] [review]
> Part 1 - Add machinery to assert single-threadedness from geckolib. v3
> 
> Review of attachment 8942022 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with those allows removed.
> 
> ::: servo/ports/geckolib/glue.rs
> @@ +215,5 @@
> >  unsafe fn dummy_url_data() -> &'static RefPtr<URLExtraData> {
> >      RefPtr::from_ptr_ref(&DUMMY_URL_DATA)
> >  }
> >  
> > +#[allow(dead_code)]
> 
> Why is this needed? This shouldn't warn even in release builds.

It does: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd55825e8b0ab26262e2dbe59962dbb55d375caf&selectedJob=155779839

This is because is_in_servo_traversal is currently only used in a #[cfg(not(debug_asssertions))] block. I could remove it for is_main_thread, but that would be asymmetric, and also likely to bite people again in the future if code gets rearranged. The advantage of deny(dead_code) is that it makes sure that stuff gets cleaned up when it's not used, but I don't think that's a particularly high priority for these functions.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #28)
> I'd rather not take it if it happens to not be needed with the next patch
> tbh. But looks good if it is indeed needed...

Per the commit message, it depends on the testcase. Part 3 further optimizes this particular testcase so that Servo_Element_IsDisplayNone will never be reached, but it will on other testcases.

In general I think it's fine to bypass locks in main-thread-only glue.rs FFI hooks because:
* The locks are designed to protect us from racing with rayon, but the assertions ensure that rayon can't be running at these callsites.
* We're one frame away from C++ code, so we don't get deep transitive benefits of type-safety. The only reason we're calling into Rust at all is that the data representation is opaque to C++, and we we wouldn't dream of paying this kind of runtime cost in C++.

That said, I also don't think we should sprinkle safety bypasses willy-nilly, even in glue.rs. So I think a good compromise is "we can bypass if we can measure a clear speedup in a profiler".

(In reply to Emilio Cobos Álvarez [:emilio] from comment #29)
> Let's assert mServoRestyleRoot here?

Good idea.

Thanks for the reviews!
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/af26f7cb29ca
Add machinery to assert single-threadedness from geckolib. r=emilio
https://hg.mozilla.org/integration/autoland/rev/dae099622091
Rearrange NoteDirtyElement for faster bailouts on the same restyle root. r=emilio
https://hg.mozilla.org/mozilla-central/rev/af26f7cb29ca
https://hg.mozilla.org/mozilla-central/rev/dae099622091
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Component: DOM → DOM: Core & HTML
Performance Impact: --- → P1
Whiteboard: [qf:f60][qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: