Closed
Bug 1301742
Opened 8 years ago
Closed 8 years ago
Improve performance of UTF16 -> UTF8 conversion
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: erahm, Assigned: erahm)
References
Details
Attachments
(1 file, 1 obsolete file)
17.16 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
In bug 1287292 we saw significant overhead in the UTF16 -> UTF8 conversion for a XHR post using strings.
I propose we add an optimized path for strings that can be represented in UTF8 without conversion (all characters < 0x80).
One option is to update |AppendUTF16toUTF8| [1] by using a modified version of the existing SSE2 optimized path in nsTextFragment [2].
[1] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/xpcom/string/nsReadableUtils.cpp#180
[2] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/dom/base/nsTextFragmentSSE2.cpp#27
Assignee | ||
Comment 1•8 years ago
|
||
This adds an SSE2 optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0X7F). It has a
speedup for the case where many of the leading characters are < 0X80 and
should have the same performance in the worst-case scenario where the first
character is > 0X7F.
Nathan let me know what you think of this approach. The best I can tell it's
no slower than the current implementation and much faster for the trivial
conversion case. I'll follow up with stats from the gtest.
Attachment #8789870 -
Flags: feedback?(nfroyd)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•8 years ago
|
||
Test results on my Ubuntu 16.04 64-bit machine. The 'bail' tests are with a character > 0x7F in the middle. There's a slight slowdown for the 5 char test, we might want to only take the optimized path for strings > X length.
> Huge == 50M char Large == 5K char Med == 512 char Small == 5 char
>
> TEST Non-Patch With Patch Speedup
> ==== ========= ========== =======
> UTFPerfTest.AppendUTF16to8Huge 26163 ms 5014 ms 5.22X
> UTFPerfTest.AppendUTF16to8Large 2443 ms 190 ms 12.86X
> UTFPerfTest.AppendUTF16to8Med 2639 ms 482 ms 5.47X
> UTFPerfTest.AppendUTF16to8Small 2568 ms 2709 ms 0.94X
> UTFPerfTest.AppendUTF16to8HugeBail 26071 ms 18465 ms 1.42X
> UTFPerfTest.AppendUTF16to8LargeBail 2446 ms 1694 ms 1.44X
> UTFPerfTest.AppendUTF16to8MedBail 2654 ms 1944 ms 1.36X
> UTFPerfTest.AppendUTF16to8SmallBail 2598 ms 2721 ms 0.95X
Assignee | ||
Comment 3•8 years ago
|
||
Doing a test run of startup, loading techcrunch.com, signing in to inbox.gmail.com and clicking a few emails showed 50,580 uses of the fast path and 1 use of the slow path. Almost all were < 32K characters (a few outliers in the 300K range). 98% were less than 302 chars.
I get the feeling the sweet spot for improved perf is somewhere around 16-32 characters. I'll drill down on smaller size classes.
Comment 4•8 years ago
|
||
Comment on attachment 8789870 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed
Review of attachment 8789870 [details] [diff] [review]:
-----------------------------------------------------------------
Looks reasonable. A few comments here and there.
::: xpcom/string/nsReadableUtils.cpp
@@ +19,5 @@
> +#include <algorithm>
> +
> +#include <stdint.h>
> +
> +template<size_t size> struct Non8BitParameters;
This should probably be named something like NonUTF8BitParameters?
@@ +43,5 @@
> +
> +#ifdef MOZILLA_MAY_SUPPORT_SSE2
> +namespace mozilla {
> + namespace SSE2 {
> + int32_t FirstNon8Bit(const char16_t *str, const char16_t *end);
This declaration and whole block seems dead.
@@ +48,5 @@
> + } // namespace SSE2
> +} // namespace mozilla
> +#endif
> +
> +//#if 0
Dead #if.
@@ +333,5 @@
> AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
> const mozilla::fallible_t& aFallible)
> {
> + int32_t firstNon8bit = FirstNonUTF8Bit(aSource.BeginReading(), aSource.EndReading());
> + //int32_t firstNon8bit = 0;
Commented out things should go away.
@@ +341,5 @@
> + if (!aDest.SetCapacity(aDest.Length() + aSource.Length(), aFallible)) {
> + return false;
> + }
> +
> + LossyAppendUTF16toASCII(aSource, aDest);
Cool, this is already SSE2-optimized. I think we could almost stand to re-reroll the inner loop for LossyConvertEncoding16to8::write_sse2 to walk 16 16-bit characters at a time rather than 32, or even just 8 characters, as small strings are overwhelmingly common. Or maybe 8 characters at a time for strings < 64 characters and 32 characters otherwise.
::: xpcom/tests/gtest/TestUTF.cpp
@@ +25,5 @@
> {
> for (unsigned int i = 0; i < ArrayLength(ValidStrings); ++i) {
> nsDependentCString str8(ValidStrings[i].m8);
> nsDependentString str16(ValidStrings[i].m16);
> + printf("Testing string %d: %s\n", (int)i, str8.get());
Probably want to get rid of this.
@@ +122,5 @@
> }
> #endif
> }
>
> +class UTFPerfTest : public ::testing::Test {
It's hard to tell where the performance testing in all of this is; I think we should rename this.
@@ +175,5 @@
> +TEST_F(UTFPerfTest, AppendUTF16to8Huge)
> +{
> + for (size_t i = 0; i < 200; i++) {
> + nsCString test;
> + AppendUTF16toUTF8(huge_string, test);
We should not only be testing actual appending, we should also ensure that the newly-created UTF8 string has the expected contents. A more structured test might also check what happens when the non-8bit-UTF8 character is at the end of the string, or all possible positions of such a charcter in an SSE register. So I'm not sure if size--ever if that's convenient for performance tests--is really the right thing to be differentiating our tests on here.
Attachment #8789870 -
Flags: feedback?(nfroyd) → feedback+
Comment 5•8 years ago
|
||
Testing on my i7-4790K, Debian 7.0, I see roughly the same speedups that Eric sees in comment 2. The notable exceptions are:
* the *Small cases are much closer, on the order of 1% difference;
* the Huge tests is more like ~4.2X faster, not 5X+.
Comment 6•8 years ago
|
||
(Oh, GCC 4.9.2, if that makes any difference.)
Assignee | ||
Comment 7•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #6)
> (Oh, GCC 4.9.2, if that makes any difference.)
I tested with a debug build produced by clang 3.8.1 on Ubuntu 16.04 w/ a Core i7-4770 CPU.
Assignee | ||
Comment 8•8 years ago
|
||
With further testing it seems that 16 characters is where we consistently get win in both the fast path and early bailout cases. With that in mind my proposal is to only attempt the fast path if |aSource.Length() >= 16|. We could go as low as 10-12 characters where the performance of the fast path was still greater.
Testing against the top 15 sites in alexa (google, youtube, amazon, baidu, etc) I found that > 50% of calls (~100K) in my tests were with strings >= 16 chars. 204,974 of the conversions took the fast path, 168 took the slow path. Given that stat we might consider a lower character threshold.
We also see about 10% of conversion calls only have 1 character. It might be worth pursuing optimization for that path as well, but I'll defer that to a follow-up.
Comment 9•8 years ago
|
||
Here's something else for you to benchmark, too:
https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17
Flags: needinfo?(erahm)
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #9)
> Here's something else for you to benchmark, too:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17
Tested with just that optimization (aSource.Length() == count). It's a bit faster than without, but nowhere near the speedup we see.
> Huge == 50M char Large == 5K char Med == 512 char Small == 5 char
>
> TEST Non-Patch With Patch Speedup
> ==== ========= ========== =======
> UTFPerfTest.AppendUTF16to8Huge 26163 ms 20078 ms 1.30X
> UTFPerfTest.AppendUTF16to8Large 2443 ms 1728 ms 1.41X
> UTFPerfTest.AppendUTF16to8Med 2639 ms 2004 ms 1.32X
> UTFPerfTest.AppendUTF16to8Small 2568 ms 2972 ms 0.86X
> UTFPerfTest.AppendUTF16to8HugeBail 26071 ms 26116 ms 0.99X
> UTFPerfTest.AppendUTF16to8LargeBail 2446 ms 2465 ms 0.99X
> UTFPerfTest.AppendUTF16to8MedBail 2654 ms 2691 ms 0.98X
> UTFPerfTest.AppendUTF16to8SmallBail 2598 ms 2598 ms 1.00X
Flags: needinfo?(erahm)
Assignee | ||
Comment 11•8 years ago
|
||
I added another optimization in the slow path inspired by the |aSource.Length() == count| optimization. We can still do a lossy convert up to the first non-ascii character even in the slow path. That shows a nice speedup as well:
> Huge == 50M char Large == 5K char Med == 512 char Small == 5 char
>
> TEST Non-Patch With Patch Speedup
> ==== ========= ========== =======
> UTFPerfTest.AppendUTF16to8HugeBail 26071 ms 15577 ms 1.67X
> UTFPerfTest.AppendUTF16to8LargeBail 2446 ms 1332 ms 1.86X
> UTFPerfTest.AppendUTF16to8MedBail 2654 ms 1648 ms 1.61X
> UTFPerfTest.AppendUTF16to8SmallBail 2598 ms 2722 ms 0.95X
I should have a cleaned up version of the patch up shortly.
Assignee | ||
Comment 12•8 years ago
|
||
Assignee | ||
Comment 13•8 years ago
|
||
This adds an SSE2 optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0X7F). It has a
speedup for the case where many of the leading characters are < 0X80 and
should have the same performance in the worst-case scenario where the first
character is > 0X7F.
This is a cleaner implementation than before. I moved the SSE function to it's
own file, cleaned up the types used, and added a more thorough test.
Attachment #8791447 -
Flags: review?(nfroyd)
Assignee | ||
Updated•8 years ago
|
Attachment #8789870 -
Attachment is obsolete: true
Comment 14•8 years ago
|
||
Comment on attachment 8791447 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed
Review of attachment 8791447 [details] [diff] [review]:
-----------------------------------------------------------------
This is much nicer than the previous attempt. r=me with a few comments below.
::: xpcom/string/nsReadableUtils.cpp
@@ +244,5 @@
> bool
> AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
> const mozilla::fallible_t& aFallible)
> {
> + const nsAString::size_type kFastPathMinLength = 16;
We should say something about how this constant was determined.
::: xpcom/string/nsReadableUtilsImpl.h
@@ +20,5 @@
> + return reinterpret_cast<const char16_t*>(
> + reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> +}
> +
> +template<size_t size> struct NonASCIIParameters;
Maybe:
/**
* Structures for word-sized vectorization of ASCII checking for Unicode strings.
*/
?
::: xpcom/tests/gtest/TestUTF.cpp
@@ +142,5 @@
> + auto str_buff = asciiString.BeginWriting();
> + auto cstr_buff = asciiCString.BeginWriting();
> + for (size_t i = 0; i < kTestSize; i++) {
> + str_buff[i] = i % kMaxASCII;
> + cstr_buff[i] = i % kMaxASCII;
Worth noting that this will only give you 0..126, inclusive, rather than the full ASCII range.
Attachment #8791447 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #14)
> Comment on attachment 8791447 [details] [diff] [review]
> WIP: Improve UTF-16 to UTF-8 conversion speed
>
> Review of attachment 8791447 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This is much nicer than the previous attempt. r=me with a few comments
> below.
>
> ::: xpcom/string/nsReadableUtils.cpp
> @@ +244,5 @@
> > bool
> > AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
> > const mozilla::fallible_t& aFallible)
> > {
> > + const nsAString::size_type kFastPathMinLength = 16;
>
> We should say something about how this constant was determined.
I'll add a note.
>
> ::: xpcom/string/nsReadableUtilsImpl.h
> @@ +20,5 @@
> > + return reinterpret_cast<const char16_t*>(
> > + reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> > +}
> > +
> > +template<size_t size> struct NonASCIIParameters;
>
> Maybe:
>
> /**
> * Structures for word-sized vectorization of ASCII checking for Unicode
> strings.
> */
>
> ?
Sounds good, will update.
>
> ::: xpcom/tests/gtest/TestUTF.cpp
> @@ +142,5 @@
> > + auto str_buff = asciiString.BeginWriting();
> > + auto cstr_buff = asciiCString.BeginWriting();
> > + for (size_t i = 0; i < kTestSize; i++) {
> > + str_buff[i] = i % kMaxASCII;
> > + cstr_buff[i] = i % kMaxASCII;
>
> Worth noting that this will only give you 0..126, inclusive, rather than the
> full ASCII range.
Oh right I'm using mod! I'll make kMaxASCII 0x80.
Assignee | ||
Comment 17•8 years ago
|
||
Not seeing anything pop up in talos [1], I think we're good to land.
[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=247f805a03e2&newProject=try&newRevision=70de461594486a094c1acb5b1e0403210aaadf8a&framework=1&showOnlyImportant=0
Assignee | ||
Comment 18•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ea86c2d5e56edf346cc0e4e4f184fb9f5a65164
Bug 1301742 - Improve UTF-16 to UTF-8 conversion speed. r=froydnj
Comment 19•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in
before you can comment on or make changes to this bug.
Description
•