Closed Bug 1349528 Opened 3 years ago Closed Last year

NormalizeUSVStringInternal looks inefficient

Categories

(Core :: DOM: Core & HTML, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

Attachments

(2 files)

NormalizeUSVStringInternal uses UTF16CharEnumerator::NextChar for the sole purpose of finding unpaired surrogates. The latter does math that's unnecessary for this goal.

It would seem prudent to inline mere surrogateness checks without the math to compute the code point value from surrogate pairs.

(Since NextChar() is defined in a header, I guess the compiler has the chance to optimize things the right way, but can we trust it to?)
Should this be a higher priority than P3 (~"backlog"), Henri?
Flags: needinfo?(hsivonen)
Priority: -- → P3
(In reply to Andrew Overholt [:overholt] from comment #1)
> Should this be a higher priority than P3 (~"backlog"), Henri?

As the use of USVString in Web APIs increases, possibly yes. Hard to say.

The SSE2 solution to this isn't particularly complicated.
Flags: needinfo?(hsivonen)
I'm trying to develop code that could address this as part of bug 1402247.
Depends on: 1402247
encoding_rs::mem now has SIMD-accelerated code for this.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
MozReview-Commit-ID: 9uG6j8UdfKR
Comment on attachment 9001244 [details]
Bug 1349528 - Use encoding_rs for normalizing USVString.

Olli Pettay [:smaug] has approved the revision.
Attachment #9001244 - Flags: review+
Microbenchmarks with this patch plus changing NormalizeUSVString to inline:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=36ecf703a4c5fccc95b6002deca9177bbeb6a188
So, unsurprisingly, the new code shows a bit more function call overhead, which can be be seen with strings of length one. Also, unsurprisingly, the new code is slower in the contrived case of the input being 100% emoji (+44% in the worst case).

However, as expected, in the common case of the string being valid, within the Basic Multilingual Plane and longer than one, the new code is faster (-80% in the best case).
Now that I've written microbenchmarks, should I land them (in a different bug)? I.e. is it worthwhile to run them on every commit. Changes to this code aren't expected.
Flags: needinfo?(bugs)
Nah, I don't think so. But when landing some patch affecting performance, better to actually know how it affects to performance.



But perhaps upload a test here?
oh, it is binary a test. I would have written some web page, since that after all tests what web pages would see. oh, well.
Flags: needinfo?(bugs)
Another try run for the added assertion, this time without other patches in my queue interfering:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=341b71f1e6767c9dd96e88d2928ba4eaf89063b5
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/42d6f6784fc2
Use encoding_rs for normalizing USVString. r=smaug
https://hg.mozilla.org/mozilla-central/rev/42d6f6784fc2
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.