The default bug view has changed. See this FAQ.

stylo: handle URLs more efficiently

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P1
normal
8 days ago
3 hours ago

People

(Reporter: bholley, Assigned: bholley)

Tracking

(Blocks: 3 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

In the current setup, cssparser parses urls using rust-url. Then, during the cascade, we extract the string of the URL value and pass it over to Gecko, which lazily resolves the URLs on the main thread.

This is terribly inefficient, not the least because rust-url is very slow. Right now, we're burning about 10% of total CSS parsing time parsing URLs, which is silly.

My memory on this is a bit fuzzy, so hopefully Cameron can remind me. Why aren't we creating nsStandardURLs from the parser?
Flags: needinfo?(cam)
Another inefficiency is that, we resolve url after servo hands url to gecko's style struct, rather than resolving them during parsing.

It is inefficient because a single url can be applied to many elements across various places of the document tree. We may be unable to share them efficiently.

(After typing this out, I actually guess it is probably not that bad, since most elements with same url value applied should usually be siblings or cousins, which tends to share the same style context? Not sure whether it is still true for Stylo, though.)

But if we can resolve urls into nsStandardURLs directly during parsing, I think we can get rid of lots of complexity we currently have for resolving urls.

I don't recall why we cannot do that from the beginning.
One thing I suppose is that not all URLs are represented with nsStandardURLs.  But anyway, the CSS parser needs to store specified URL values, which in Gecko are represented by css::URLValue objects, doesn't it?  And it's only at computed value time that we need to resolve the specified URL string into an actual URL (be it an nsStandardURL, nsSimpleURI, etc.).

Can we make Servo's SpecifiedUrl different in stylo builds, and have it just store the value from inside the url() token (along with the current extra data)?
Flags: needinfo?(cam)
Oh, hmmm, so we store URLValue object from nsCSSParser, and only change it to ImageValue during computing. So Gecko actually resolves URL during computation-time, not parsing-time.

I wonder whether we should make it resolved at parsing-time. Is there any concern to do that? It seems to me resolving url eagerly should give better memory sharing and less time spent on resolving urls.
We need to keep the potentially-relative URL, for serialization purposes.  And not all specified URLs will need to be computed into absolute URLs in the end (just due to overriding rules in the cascade).

The current stylo behaviour we have is kind of wrong -- we're using the post-resolved rust Url value to feed into URLValue, but that is really expecting to see the specified potentially-relative URL.
> In the current setup, cssparser parses urls using rust-url.

(The cssparser crate does not, it only resolves CSS backslash-escapes. The style crate does.)

As mentioned on IRC, it seems like for Stylo we can skip rust-url entirely and use instead the Gecko code that does the equivalent normalizing (percent-encoding, ".." and "." path components, etc.) a resolving a relative URL from a base URL.
> Is there any concern to do that?

One concern: in practice many (data needed!) URLs never have to get resolved at all because sites have site-wide stylesheets with only a small fraction of rules matching anything on any given page.  This is the same reason we don't convert to an ImageValue at parse-time too...
Assignee: nobody → bobbyholley
> for Stylo we can skip rust-url entirely

There’s already a `ServoUrl` type in servo/components/url that we can conditionally compile with a cargo "feature" to wrap different types on Stylo and non-Stylo.
Created attachment 8848398 [details] [diff] [review]
Part 1 - Use a wrapper class to maintain the mBinding invariant and stop resolving during the cascade. v1

MozReview-Commit-ID: ozcxOA9pt1
So, the patch in comment 8 handles the mBinding case. I've got another patch which mostly refactors things so that SpecifiedUrl just stores the string serialization (at least in stylo). Resolution gets done after the cascade by our lazy resolution machinery.

However, I've found two other places where we depend on whether the URL was successfully resolved:
http://searchfox.org/mozilla-central/rev/006005beff40d377cfd2f69d3400633c5ff09127/servo/components/style/stylesheets.rs#858
http://searchfox.org/mozilla-central/rev/006005beff40d377cfd2f69d3400633c5ff09127/servo/components/style/properties/longhand/svg.mako.rs#260

These were added by emilio and Manish respectively. Could you guys weigh on on whether we can remove this usage?
Flags: needinfo?(manishearth)
Flags: needinfo?(emilio+bugs)
mask-image had this weird duplication between entire-image masks specified by URLs without fragments and (often local) paintserver masks that have a fragment. We explicitly distinguish at parse time based on whether or not they have a fragment, and store them separately.

bug 1301245 exists to simplify this situation a bit, but it doesn't get rid of the weirdness.
Flags: needinfo?(manishearth)
https://drafts.csswg.org/css-values/#local-urls was added recently to the spec to treat specially fragment-only URL tokens. Is that related?
Yes, that's what the fragment-only stuff in Gecko implements (from bug 1291283).
(Bug 652991, rather.)
The other place is at @import. We can definitely change the semantics of the loader to accept invalid URLS and handle them inside.

Make sure to update `components/script/stylesheet_loader.rs` in Servo to bail out if there's no URL if you do that.
Flags: needinfo?(emilio+bugs)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=883376b819b534b11bd8a28a84a17cd028822c59
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1dd382ce2e2ecabd4399e1dafbe72ee84877a60
Attachment #8848398 - Flags: review?(emilio+bugs)
Created attachment 8848789 [details] [diff] [review]
Part 2 - Don't resolve URLs at parse time for Stylo. v1

It's a bit unfortunate the use separate implementations of SpecifiedUrl for Servo
and Gecko, but they're different enough at this point that I don't think it really
makes sense to try to share everything. Splitting them out has some nice
simplifications as well.

I recognize that there's still some potential correctness issues for Servo using
the resolved URI in various places where the original URI may be the right thing,
but I've got too much on my plate to look into that for now.

MozReview-Commit-ID: BeDu93TQ4Ow
Attachment #8848789 - Flags: review?(emilio+bugs)
Comment on attachment 8848398 [details] [diff] [review]
Part 1 - Use a wrapper class to maintain the mBinding invariant and stop resolving during the cascade. v1

Review of attachment 8848398 [details] [diff] [review]:
-----------------------------------------------------------------

I suggest a different approach, changing the _callers_ to check both mBinding and GetURI in order to do work. But let me know if I missed something below.

::: layout/style/nsStyleStruct.cpp
@@ +3391,5 @@
>  nsStyleDisplay::CalcDifference(const nsStyleDisplay& aNewData) const
>  {
>    nsChangeHint hint = nsChangeHint(0);
>  
> +  if (!DefinitelyEqualURIsAndPrincipal(mBinding.Get(), aNewData.mBinding.Get())

How is this right? calling Get() will call GetURI() now, which is main-thread only (and will assert). Am I missing something obvious here?
Attachment #8848398 - Flags: review?(emilio+bugs) → review-
Comment on attachment 8848789 [details] [diff] [review]
Part 2 - Don't resolve URLs at parse time for Stylo. v1

Review of attachment 8848789 [details] [diff] [review]:
-----------------------------------------------------------------

This part looks reasonable. I wonder, is there a reason we can't get rid of rust-url as a dependency of stylo now?

Not a big deal, since it'll probably involve a bit more work, but let's file a bug on it if it's doable.

::: layout/style/ServoBindings.cpp
@@ +1472,5 @@
>  void
>  Gecko_LoadStyleSheet(css::Loader* aLoader,
>                       ServoStyleSheet* aParent,
>                       RawServoImportRuleBorrowed aImportRule,
> +                     nsIURI* aBaseURI,

I don't think we need to get this for @import, do we? The loader is main-thread only, and has access to the document.

Fine if you want to keep it, but please assert aBaseURI is the expected one.

::: servo/components/style/values/specified/url.rs
@@ +9,5 @@
>  use gecko_bindings::structs::ServoBundledURI;
>  #[cfg(feature = "gecko")]
>  use gecko_bindings::sugar::refptr::{GeckoArcPrincipal, GeckoArcURI};
>  use parser::{Parse, ParserContext};
> +#[cfg(feature = "servo")]

Let's move each different implementation to the two different files in servo/ and gecko/ directories please.
Attachment #8848789 - Flags: review?(emilio+bugs) → review+
Blocks: 1347412
(In reply to Emilio Cobos Álvarez [:emilio] from comment #18)
> Comment on attachment 8848398 [details] [diff] [review]
> Part 1 - Use a wrapper class to maintain the mBinding invariant and stop
> resolving during the cascade. v1
> 
> Review of attachment 8848398 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I suggest a different approach, changing the _callers_ to check both
> mBinding and GetURI in order to do work. But let me know if I missed
> something below.

Why? There are at least 5 callers that would need to do the checking. That means that we'd probably want to wrap it up into a nice helper abstraction, which is exactly what this patch does. There's minimal overhead because GetURI returns the cached result if URI has already been resolved, so I think this more or less does what we want.

> 
> ::: layout/style/nsStyleStruct.cpp
> @@ +3391,5 @@
> >  nsStyleDisplay::CalcDifference(const nsStyleDisplay& aNewData) const
> >  {
> >    nsChangeHint hint = nsChangeHint(0);
> >  
> > +  if (!DefinitelyEqualURIsAndPrincipal(mBinding.Get(), aNewData.mBinding.Get())
> 
> How is this right? calling Get() will call GetURI() now, which is
> main-thread only (and will assert). Am I missing something obvious here?

Good catch (the static analysis probably would have caught it too if it were running by default). I'll add a .ForceGet().
Created attachment 8849846 [details] [diff] [review]
Part 1 - Use a wrapper class to maintain the mBinding invariant and stop resolving during the cascade. v2

MozReview-Commit-ID: ozcxOA9pt1
Attachment #8849846 - Flags: review?(emilio+bugs)
Attachment #8848398 - Attachment is obsolete: true
Blocks: 1349438
(In reply to Emilio Cobos Álvarez [:emilio] from comment #19)
> Comment on attachment 8848789 [details] [diff] [review]
> Part 2 - Don't resolve URLs at parse time for Stylo. v1
> 
> Review of attachment 8848789 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This part looks reasonable. I wonder, is there a reason we can't get rid of
> rust-url as a dependency of stylo now?

I think we still use it for Base URI handling for CSSOM (though that should probably be fixed). See bug 1343964.
 
> Not a big deal, since it'll probably involve a bit more work, but let's file
> a bug on it if it's doable.

Bug 1349438.

> 
> ::: layout/style/ServoBindings.cpp
> @@ +1472,5 @@
> >  void
> >  Gecko_LoadStyleSheet(css::Loader* aLoader,
> >                       ServoStyleSheet* aParent,
> >                       RawServoImportRuleBorrowed aImportRule,
> > +                     nsIURI* aBaseURI,
> 
> I don't think we need to get this for @import, do we? The loader is
> main-thread only, and has access to the document.
> 
> Fine if you want to keep it, but please assert aBaseURI is the expected one.

You mean asserting that it equals aLoader->GetDocument()->GetDocBaseURI()? I can do that, but the code to pass around base URIs is a huge jungle, so it's hard to verify from the code that this property will definitely hold. If the assert trips, I really don't want to get sucked into debugging it.

Anyway, try will tell.

> 
> ::: servo/components/style/values/specified/url.rs
> @@ +9,5 @@
> >  use gecko_bindings::structs::ServoBundledURI;
> >  #[cfg(feature = "gecko")]
> >  use gecko_bindings::sugar::refptr::{GeckoArcPrincipal, GeckoArcURI};
> >  use parser::{Parse, ParserContext};
> > +#[cfg(feature = "servo")]
> 
> Let's move each different implementation to the two different files in
> servo/ and gecko/ directories please.

Ok.
Created attachment 8849855 [details] [diff] [review]
Part 2 - Don't resolve URLs at parse time for Stylo. r=emilio
Attachment #8849855 - Flags: review+
Attachment #8848789 - Attachment is obsolete: true
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0a3b85f4c7b03c86f2c2ac8bcdc10b2c0d87487
Comment on attachment 8849846 [details] [diff] [review]
Part 1 - Use a wrapper class to maintain the mBinding invariant and stop resolving during the cascade. v2

Review of attachment 8849846 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, sorry for the lag here :)

::: layout/style/nsStyleStruct.h
@@ +2740,5 @@
> +// behavior dynamically.
> +class BindingHolder {
> +public:
> +  BindingHolder() {}
> +  BindingHolder(mozilla::css::URLValue* aPtr) : mPtr(aPtr) {}

nit: explicit.
Attachment #8849846 - Flags: review?(emilio+bugs) → review+
So, the baseURI assertion doesn't hold. When starting the browser, we get a load for a stylesheet at chrome://global/skin/in-content/common.css with a base URI of chrome://global/skin/in-content/info-pages.css and a document URI of about:sessionrestore.

I'm going to remove the assertion.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=32de65c3fedd7f9205f303cf8a01c1a7c869c860
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b765aca3c0295990a34bbcc157bbcf428c0abbfe
You need to log in before you can comment on or make changes to this bug.