Closed Bug 1349528 Opened 3 years ago Closed Last year
USVString Internal looks inefficient
NormalizeUSVStringInternal uses UTF16CharEnumerator::NextChar for the sole purpose of finding unpaired surrogates. The latter does math that's unnecessary for this goal. It would seem prudent to inline mere surrogateness checks without the math to compute the code point value from surrogate pairs. (Since NextChar() is defined in a header, I guess the compiler has the chance to optimize things the right way, but can we trust it to?)
Should this be a higher priority than P3 (~"backlog"), Henri?
Priority: -- → P3
(In reply to Andrew Overholt [:overholt] from comment #1) > Should this be a higher priority than P3 (~"backlog"), Henri? As the use of USVString in Web APIs increases, possibly yes. Hard to say. The SSE2 solution to this isn't particularly complicated.
encoding_rs::mem now has SIMD-accelerated code for this.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Comment on attachment 9001244 [details] Bug 1349528 - Use encoding_rs for normalizing USVString. Olli Pettay [:smaug] has approved the revision.
Attachment #9001244 - Flags: review+
Microbenchmarks without this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6165920841cdb3bfb45b07b8828f61a9aebb423 Microbenchmarks with this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5159aa9986405b5ee81df6cb323922b5f4ec438a Debug test suite run with the newly-added assertion: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3a1cdcb4bf32c175a2c0badb29d66f74d368d3e
Microbenchmarks with this patch plus changing NormalizeUSVString to inline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=36ecf703a4c5fccc95b6002deca9177bbeb6a188
So, unsurprisingly, the new code shows a bit more function call overhead, which can be be seen with strings of length one. Also, unsurprisingly, the new code is slower in the contrived case of the input being 100% emoji (+44% in the worst case). However, as expected, in the common case of the string being valid, within the Basic Multilingual Plane and longer than one, the new code is faster (-80% in the best case).
Now that I've written microbenchmarks, should I land them (in a different bug)? I.e. is it worthwhile to run them on every commit. Changes to this code aren't expected.
Nah, I don't think so. But when landing some patch affecting performance, better to actually know how it affects to performance. But perhaps upload a test here? oh, it is binary a test. I would have written some web page, since that after all tests what web pages would see. oh, well.
Another try run for the added assertion, this time without other patches in my queue interfering: https://treeherder.mozilla.org/#/jobs?repo=try&revision=341b71f1e6767c9dd96e88d2928ba4eaf89063b5
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/42d6f6784fc2 Use encoding_rs for normalizing USVString. r=smaug
You need to log in before you can comment on or make changes to this bug.