Closed Bug 1354989 Opened 3 years ago Closed 2 years ago

stylo: Don’t go through UTF-16 for external stylesheets

Categories

(Core :: CSS Parsing and Computation, enhancement, P4)

enhancement

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: SimonSapin, Assigned: hsivonen)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

ServoStyleSheet::ParseSheet currently takes a `const nsAString& aInput` parameter and immediately calls NS_ConvertUTF16toUTF8 on it: 

https://searchfox.org/mozilla-central/rev/eace920a0372051a11d8d275bd9b8f14f3024ecd/layout/style/ServoStyleSheet.cpp#93

In the case of external stylesheets this UTF-16 string is the result decoding bytes from the network in some character encoding. If that decoding/conversion could be done directly to UTF-8, we could eliminate the second conversion (and the time spent doing it).
Henri, I know that encoding_rs supports UTF-8 as the memory representation for Unicode (the output of decoders). Can uconv do the same?
Flags: needinfo?(hsivonen)
(In reply to Simon Sapin (:SimonSapin) from comment #1)
> Henri, I know that encoding_rs supports UTF-8 as the memory representation
> for Unicode (the output of decoders). Can uconv do the same?

No. uconv always converts to or from UTF-16. Allowing conversions to and from UTF-8 is one of the key motivators of encoding_rs.

(As status report, encoding_rs-enabled Gecko now builds for me locally with clang only. I need to fix some HTML parser Quantum Flow P1 bugs this week, but encoding_rs is indeed becoming less and less vaporware.)
Flags: needinfo?(hsivonen)
Priority: -- → P4
Henri, now that encoding_rs has landed, do you perchance have any cycles to help us eliminate the conversion overhead here?
Flags: needinfo?(hsivonen)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #3)
> Henri, now that encoding_rs has landed, do you perchance have any cycles to
> help us eliminate the conversion overhead here?

Not immediately, but I'll check back here once I've landed by currently in-flight encoding_rs polish that wasn't part of the minimal viable landing.
Flags: needinfo?(hsivonen)
Taking this now that encoding_rs 0.7.0 is on autoland.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Considering that Rust Vec doesn't do fallible allocation, I think I'm going to leave the placement of the boundary between C++ and Rust unchanged.
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> Considering that Rust Vec doesn't do fallible allocation, I think I'm going
> to leave the placement of the boundary between C++ and Rust unchanged.

Note that we're adding vec fallibility in bug 1389009.
Blocks: 1395114
WPT already has the necessary black-box tests. Checked with gdb that the buffer non-copying works as designed.
Attachment #8902660 - Flags: review?(simon.sapin)
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review179890

I read through the diff and don’t see anything wrong, but I’m not at all familiar with that code so I don’t feel confident…

::: layout/style/StreamLoader.cpp:37
(Diff revision 2)
> +/* nsIRequestObserver implementation */
> +NS_IMETHODIMP
> +StreamLoader::OnStartRequest(nsIRequest* aRequest, nsISupports*)
> +{
> +  // It's kinda bad to let Web content send a number that results
> +  // in a potentially large allocation directly, but efficiency of

Would it make sense to cap the initial allocation at much less than MaxValue? (Perhaps a few MBs?)

::: layout/style/StreamLoader.cpp:74
(Diff revision 2)
> +  nsresult rv = NS_OK;
> +
> +  {
> +    // Hold the nsStringBuffer for the bytes from the stack to ensure release
> +    // no matter which return branch is taken.
> +    nsCString bytes(mBytes);

Is this a copy?

::: layout/style/nsLayoutStylesheetCache.cpp:998
(Diff revision 2)
>      sheet->AsGecko()->ReparseSheet(sheetText);
>    } else {
>      ServoStyleSheet* servoSheet = sheet->AsServo();
>      // NB: The pref sheet never has @import rules.
> -    nsresult rv =
> -      servoSheet->ParseSheet(nullptr, sheetText, uri, uri, nullptr, 0,
> +    nsresult rv = servoSheet->ParseSheet(nullptr,
> +                                         NS_ConvertUTF16toUTF8(sheetText),

Could this conversion be made unnecessary (or replaced by its reverse in the non-Stylo case) by changing `nsString sheetText;` to `nsCString sheetText;` above?

::: layout/style/test/gtest/StyloParsingBench.cpp:29
(Diff revision 2)
>  
>  static void ServoParsingBench() {
>    NS_NAMED_LITERAL_CSTRING(css_, EXAMPLE_STYLESHEET);
>    const nsACString& css = css_;
>    ASSERT_TRUE(IsUTF8(css));
> +  Span<const uint8_t> span(css);

Perhaps building a Span from a literal string requires fewer steps than nsACString* ?
Yeah, we may need someone more familiar with the Necko code to do the full review here. Josh, is that you? If not, could you suggest someone?
Flags: needinfo?(josh)
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review180120

The necko-related pieces look fine to me.

::: layout/style/Loader.h:562
(Diff revision 2)
> -  // Set aCompleted to true if the parse finished, false otherwise (e.g. if the
> +  // UTF-16 and from aUTF8 if UTF-8.
> +  // Sets aCompleted to true if the parse finished, false otherwise (e.g. if the
>    // sheet had an @import).  If aCompleted is true when this returns, then
>    // ParseSheet also called SheetComplete on aLoadData.
> -  nsresult ParseSheet(const nsAString& aInput,
> +  nsresult ParseSheet(const nsAString& aUTF16,
> +                      Span<const uint8_t> aUTF8,

Would it make sense to use a single Variant argument instead?

::: layout/style/Loader.cpp:632
(Diff revision 2)
> + * Here we need to check that the load did not give us an http error
> + * page and check the mimetype on the channel to make sure we're not
> + * loading non-text/css data in standards mode.
> + */
> +nsresult
> +SheetLoadData::OnStreamCompleteImpl(nsresult aStatus,

Let's call this something more meaningful - VerifySheetReadyToParse, perhaps?

::: layout/style/StreamLoader.cpp:14
(Diff revision 2)
> +#include "mozilla/Encoding.h"
> +#include "nsIChannel.h"
> +#include "nsIInputStream.h"
> +
> +using namespace mozilla;
> +;

nit: remove this.
Attachment #8902660 - Flags: review+
Flags: needinfo?(josh)
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review179890

> Would it make sense to cap the initial allocation at much less than MaxValue? (Perhaps a few MBs?)

Do we have a good way to come up with a magic number? OOMing 32-bit Firefox with a gzip bomb is so easy that I figured that legitimate large sizes work better without realloc and malicious large sizes are trivial to achieve anyway.

> Could this conversion be made unnecessary (or replaced by its reverse in the non-Stylo case) by changing `nsString sheetText;` to `nsCString sheetText;` above?

Reversed.

> Perhaps building a Span from a literal string requires fewer steps than nsACString* ?

Used fewer steps.
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review180120

> Would it make sense to use a single Variant argument instead?

According to the [docs](https://searchfox.org/mozilla-central/source/mfbt/Variant.h#484), `Variant` doesn't work with an argument type except by pointer or reference, and that seems unnecessary complexity. Moreover, there are no `Variant`s wrapping an XPCOM string in the codebase, so I'm a bit nervous I'd do something subtly wrong.

> Let's call this something more meaningful - VerifySheetReadyToParse, perhaps?

Renamed to `VerifySheetReadyToParse`.

> nit: remove this.

Removed. Thanks.
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review179890

> Do we have a good way to come up with a magic number? OOMing 32-bit Firefox with a gzip bomb is so easy that I figured that legitimate large sizes work better without realloc and malicious large sizes are trivial to achieve anyway.

Not really. Feel free to ignore this idea.

> Is this a copy?

I’m not familiar with C++ move semantics. Is this line allocating and copying the entire stylesheet? If not how? If so can that be avoided?
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review179890

> I’m not familiar with C++ move semantics. Is this line allocating and copying the entire stylesheet? If not how? If so can that be avoided?

Hmm. Somehow my reply from last time didn't get saved.

No; this is a ref count increment. nsCString holds a refcounted nsStringBuffer. (Truncate() on the line that follows decrements the ref count back. As far as I can tell, we don't have a way to transfer an nsStringBuffer from one nsCString to another without touching the ref count. I expect the reason to be that there would be complications with the subclasses of nsCString in the general case. In this case, though, we know that we are dealing with two nsCStrings and not instances of subclasses.)

The various decoding methods used later in this method also only increment the ref count instead of copying when possible.
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review179890

> Hmm. Somehow my reply from last time didn't get saved.
> 
> No; this is a ref count increment. nsCString holds a refcounted nsStringBuffer. (Truncate() on the line that follows decrements the ref count back. As far as I can tell, we don't have a way to transfer an nsStringBuffer from one nsCString to another without touching the ref count. I expect the reason to be that there would be complications with the subclasses of nsCString in the general case. In this case, though, we know that we are dealing with two nsCStrings and not instances of subclasses.)
> 
> The various decoding methods used later in this method also only increment the ref count instead of copying when possible.

Alright, thanks.
> Alright, thanks.

Did you intentionally leave the patch still without your r+ despite the issue list going to zero?
Flags: needinfo?(simon.sapin)
Comment on attachment 8902660 [details]
Bug 1354989 - Avoid pivoting via UTF-16 when loading CSS in the Stylo mode.

https://reviewboard.mozilla.org/r/174334/#review180478

I can click the r+ button, but as I said I don’t feel confident as I hadn’t looked at any of this code before.
Attachment #8902660 - Flags: review?(simon.sapin) → review+
(In reply to Simon Sapin (:SimonSapin) from comment #22)
> I can click the r+ button, but as I said I don’t feel confident as I hadn’t
> looked at any of this code before.

Thanks. It appears that ReviewBoard remembered jdm's r+, too, even though it wasn't synced to Bugzilla.
Flags: needinfo?(simon.sapin)
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 25 changes to 25 files
remote: (95608a1cbd14 modifies servo/components/style/gecko/generated/bindings.rs from Servo; not enforcing peer review)
remote: (95608a1cbd14 modifies servo/ports/geckolib/glue.rs from Servo; not enforcing peer review)
remote: (1 changesets contain changes to protected servo/ directory: 95608a1cbd14)
remote: ************************************************************************
remote: you do not have permissions to modify files under servo/
remote: 
remote: the servo/ directory is kept in sync with the canonical upstream
remote: repository at https://github.com/servo/servo
remote: 
remote: changes to servo/ are only allowed by the syncing tool and by sheriffs
remote: performing cross-repository "merges"
remote: 
remote: to make changes to servo/, submit a Pull Request against the servo/servo
remote: GitHub project
remote: ************************************************************************
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.e_prevent_vendored hook failed
abort: push failed on remote
AFAICT, if I created a GitHub PR and the code under servo/ got synced to m-c without the rest of this patch, m-c would burn.

What's the right way to land this atomically?
Flags: needinfo?(bobbyholley)
You need to land the servo PR, and right after land the patch via autoland. We know it sucks :(
Flags: needinfo?(bobbyholley)
Removed the Servo bits from ReviewBoard changeset.
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/32fb897ebe85
Avoid pivoting via UTF-16 when loading CSS in the Stylo mode. r=jdm,SimonSapin
Thanks for fixing this Henri!
Hmm. Did this turn bug 1346041 from intermittent to perma-orange?
Unfortunately, the now perma-orange test actually looks related to this change. (I overlooked it as intermittent on try.)

It's too late on my time zone to debug it now, but given the complications with the GitHub landing coordination, my preference would be disabling the test and debugging off-treeherder instead of backing this out.
Pushed by ecoal95@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/85ba48e32d74
Mark layout/reftests/bugs/485012-1.html as failing in stylo pending investigation in bug 1396093. r=hsivonen
https://hg.mozilla.org/mozilla-central/rev/32fb897ebe85
https://hg.mozilla.org/mozilla-central/rev/85ba48e32d74
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.