Improve performance of UTF16 -> UTF8 conversion

RESOLVED FIXED in Firefox 51

People

(Reporter: erahm, Assigned: erahm)

Tracking

Trunk
mozilla51

Firefox Tracking Flags

(firefox51 fixed)

Attachments

(1 attachment, 1 obsolete attachment)

In bug 1287292 we saw significant overhead in the UTF16 -> UTF8 conversion for an XHR post using strings.

I propose we add an optimized path for strings that can be represented in UTF8 without conversion (all characters < 0x80).

One option is to update |AppendUTF16toUTF8| [1] by using a modified version of the existing SSE2 optimized path in nsTextFragment [2].

[1] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/xpcom/string/nsReadableUtils.cpp#180
[2] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/dom/base/nsTextFragmentSSE2.cpp#27
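
The proposed scan can be sketched with SSE2 intrinsics. This is a hypothetical standalone version, not the nsTextFragment code or the eventual patch; the function name and loop structure are assumptions:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical sketch: return the index of the first UTF-16 code unit that
// is >= 0x80, or -1 if the whole buffer is ASCII and can be narrowed as-is.
static int32_t FirstNonASCII(const char16_t* aStr, const char16_t* aEnd)
{
  const char16_t* p = aStr;
  // Any code unit >= 0x80 has at least one bit set under this mask.
  const __m128i mask = _mm_set1_epi16(static_cast<short>(0xFF80));

  // Check 8 code units (16 bytes) per iteration.
  while (aEnd - p >= 8) {
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    __m128i isAscii = _mm_cmpeq_epi16(_mm_and_si128(chunk, mask),
                                      _mm_setzero_si128());
    if (_mm_movemask_epi8(isAscii) != 0xFFFF) {
      break;  // a non-ASCII unit is in this chunk; pinpoint it below
    }
    p += 8;
  }

  // Scalar loop for the tail and for pinpointing the exact index.
  for (; p < aEnd; ++p) {
    if (*p >= 0x80) {
      return static_cast<int32_t>(p - aStr);
    }
  }
  return -1;
}
```

The caller would take a memcpy-style narrowing path when this returns -1 and fall back to the full converter otherwise.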
This adds an SSE2-optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0x7F). It shows a
speedup for the case where many of the leading characters are < 0x80 and
should match the current performance in the worst-case scenario, where the
first character is > 0x7F.

Nathan, let me know what you think of this approach. As best I can tell it's
no slower than the current implementation and much faster for the trivial
conversion case. I'll follow up with stats from the gtest.
Attachment #8789870 - Flags: feedback?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Test results on my Ubuntu 16.04 64-bit machine. The 'bail' tests have a character > 0x7F in the middle of the string. There's a slight slowdown for the 5-char test, so we might want to take the optimized path only for strings > X length.

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8Huge       26163 ms     5014 ms       5.22X
> UTFPerfTest.AppendUTF16to8Large       2443 ms      190 ms      12.86X
> UTFPerfTest.AppendUTF16to8Med         2639 ms      482 ms       5.47X
> UTFPerfTest.AppendUTF16to8Small       2568 ms     2709 ms       0.94X
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms    18465 ms       1.42X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms     1694 ms       1.44X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms     1944 ms       1.36X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms     2721 ms       0.95X
Doing a test run of startup, loading techcrunch.com, signing in to inbox.gmail.com, and clicking a few emails showed 50,580 uses of the fast path and 1 use of the slow path. Almost all strings were < 32K characters (with a few outliers in the 300K range); 98% were less than 302 chars.

I get the feeling the sweet spot for improved perf is somewhere around 16-32 characters. I'll drill down on smaller size classes.
Comment on attachment 8789870 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed

Review of attachment 8789870 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.  A few comments here and there.

::: xpcom/string/nsReadableUtils.cpp
@@ +19,5 @@
> +#include <algorithm>
> +
> +#include <stdint.h>
> +
> +template<size_t size> struct Non8BitParameters;

This should probably be named something like NonUTF8BitParameters?

@@ +43,5 @@
> +
> +#ifdef MOZILLA_MAY_SUPPORT_SSE2
> +namespace mozilla {
> +  namespace SSE2 {
> +    int32_t FirstNon8Bit(const char16_t *str, const char16_t *end);

This declaration and whole block seems dead.

@@ +48,5 @@
> +  } // namespace SSE2
> +} // namespace mozilla
> +#endif
> +
> +//#if 0

Dead #if.

@@ +333,5 @@
>  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
>                    const mozilla::fallible_t& aFallible)
>  {
> +  int32_t firstNon8bit = FirstNonUTF8Bit(aSource.BeginReading(), aSource.EndReading());
> +  //int32_t firstNon8bit = 0;

Commented out things should go away.

@@ +341,5 @@
> +    if (!aDest.SetCapacity(aDest.Length() + aSource.Length(), aFallible)) {
> +      return false;
> +    }
> +
> +    LossyAppendUTF16toASCII(aSource, aDest);

Cool, this is already SSE2-optimized.  I think we could almost stand to re-roll the inner loop of LossyConvertEncoding16to8::write_sse2 to walk 16 16-bit characters at a time rather than 32, or even just 8 characters, since small strings are overwhelmingly common.  Or maybe 8 characters at a time for strings < 64 characters and 32 characters otherwise.

::: xpcom/tests/gtest/TestUTF.cpp
@@ +25,5 @@
>  {
>    for (unsigned int i = 0; i < ArrayLength(ValidStrings); ++i) {
>      nsDependentCString str8(ValidStrings[i].m8);
>      nsDependentString str16(ValidStrings[i].m16);
> +    printf("Testing string %d: %s\n", (int)i, str8.get());

Probably want to get rid of this.

@@ +122,5 @@
>    }
>  #endif
>  }
>  
> +class UTFPerfTest : public ::testing::Test {

It's hard to tell where the performance testing in all of this is; I think we should rename this.

@@ +175,5 @@
> +TEST_F(UTFPerfTest, AppendUTF16to8Huge)
> +{
> +  for (size_t i = 0; i < 200; i++) {
> +    nsCString test;
> +    AppendUTF16toUTF8(huge_string, test);

We should not only be testing actual appending, we should also ensure that the newly-created UTF8 string has the expected contents.  A more structured test might also check what happens when the non-8bit-UTF8 character is at the end of the string, or at all possible positions of such a character in an SSE register.  So I'm not sure that size--even if that's convenient for performance tests--is really the right thing to be differentiating our tests on here.
Attachment #8789870 - Flags: feedback?(nfroyd) → feedback+
Testing on my i7-4790K, Debian 7.0, I see roughly the same speedups that Eric sees in comment 2.  The notable exceptions are:

* the *Small cases are much closer, on the order of 1% difference;
* the Huge test is more like ~4.2X faster, not 5X+.
(Oh, GCC 4.9.2, if that makes any difference.)
(In reply to Nathan Froyd [:froydnj] from comment #6)
> (Oh, GCC 4.9.2, if that makes any difference.)

I tested with a debug build produced by clang 3.8.1 on Ubuntu 16.04 w/ a Core i7-4770 CPU.
With further testing, it seems that 16 characters is where we consistently get a win in both the fast-path and early-bailout cases. With that in mind, my proposal is to attempt the fast path only if |aSource.Length() >= 16|. We could go as low as 10-12 characters, where the fast path's performance was still greater.

Testing against the top 15 sites on Alexa (google, youtube, amazon, baidu, etc.) I found that > 50% of calls (~100K) in my tests were with strings >= 16 chars. 204,974 of the conversions took the fast path, 168 took the slow path. Given that stat we might consider a lower character threshold.

We also see about 10% of conversion calls only have 1 character. It might be worth pursuing optimization for that path as well, but I'll defer that to a follow-up.
Here's something else for you to benchmark, too:

https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17
Flags: needinfo?(erahm)
(In reply to Nathan Froyd [:froydnj] from comment #9)
> Here's something else for you to benchmark, too:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17

Tested with just that optimization (aSource.Length() == count). It's a bit faster than without it, but nowhere near the speedup we see with the SSE2 path.

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8Huge       26163 ms     20078 ms       1.30X
> UTFPerfTest.AppendUTF16to8Large       2443 ms      1728 ms       1.41X
> UTFPerfTest.AppendUTF16to8Med         2639 ms      2004 ms       1.32X
> UTFPerfTest.AppendUTF16to8Small       2568 ms      2972 ms       0.86X
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms     26116 ms       0.99X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms      2465 ms       0.99X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms      2691 ms       0.98X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms      2598 ms       1.00X
Flags: needinfo?(erahm)
I added another optimization to the slow path, inspired by the |aSource.Length() == count| optimization: we can still do a lossy convert up to the first non-ASCII character even in the slow path. That shows a nice speedup as well:

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms     15577 ms     1.67X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms      1332 ms     1.86X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms      1648 ms     1.61X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms      2722 ms     0.95X

I should have a cleaned up version of the patch up shortly.
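Putting the pieces together, the overall control flow (length threshold, lossy-narrowed ASCII prefix, slow-path fallback) can be sketched as follows. This is a hypothetical model using std::string in place of nsACString, with a scalar AsciiPrefixLen standing in for the SSE2 scan; none of these names are from the actual patch:

```cpp
#include <cstdint>
#include <string>

// Scalar stand-in for the SSE2 scan: length of the leading ASCII run.
static size_t AsciiPrefixLen(const char16_t* aStr, const char16_t* aEnd)
{
  const char16_t* p = aStr;
  while (p < aEnd && *p < 0x80) {
    ++p;
  }
  return static_cast<size_t>(p - aStr);
}

// Slow path: full UTF-16 -> UTF-8 encoding, including surrogate pairs.
static void EncodeUTF8(const char16_t* p, const char16_t* aEnd, std::string& aDest)
{
  while (p < aEnd) {
    uint32_t c = *p++;
    if (c >= 0xD800 && c <= 0xDBFF && p < aEnd &&
        *p >= 0xDC00 && *p <= 0xDFFF) {
      c = 0x10000 + ((c - 0xD800) << 10) + (*p++ - 0xDC00);
    }
    if (c < 0x80) {
      aDest += static_cast<char>(c);
    } else if (c < 0x800) {
      aDest += static_cast<char>(0xC0 | (c >> 6));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      aDest += static_cast<char>(0xE0 | (c >> 12));
      aDest += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    } else {
      aDest += static_cast<char>(0xF0 | (c >> 18));
      aDest += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      aDest += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
}

static void AppendUTF16toUTF8Sketch(const std::u16string& aSrc, std::string& aDest)
{
  // Threshold below which the extra scan isn't worth it, per the benchmarks above.
  const size_t kFastPathMinLength = 16;
  const char16_t* p = aSrc.data();
  const char16_t* end = p + aSrc.size();

  if (aSrc.size() >= kFastPathMinLength) {
    // Lossy-narrow the ASCII prefix; for an all-ASCII string this is the
    // entire conversion, and for a 'bail' string it still skips ahead.
    size_t prefix = AsciiPrefixLen(p, end);
    for (size_t i = 0; i < prefix; ++i) {
      aDest += static_cast<char>(p[i]);
    }
    p += prefix;
  }

  // Whatever remains (possibly everything) goes through the slow path.
  EncodeUTF8(p, end, aDest);
}
```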
This adds an SSE2-optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0x7F). It shows a
speedup for the case where many of the leading characters are < 0x80 and
should match the current performance in the worst-case scenario, where the
first character is > 0x7F.

This is a cleaner implementation than before. I moved the SSE2 function to its
own file, cleaned up the types used, and added a more thorough test.
Attachment #8791447 - Flags: review?(nfroyd)
Attachment #8789870 - Attachment is obsolete: true
Comment on attachment 8791447 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed

Review of attachment 8791447 [details] [diff] [review]:
-----------------------------------------------------------------

This is much nicer than the previous attempt.  r=me with a few comments below.

::: xpcom/string/nsReadableUtils.cpp
@@ +244,5 @@
>  bool
>  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
>                    const mozilla::fallible_t& aFallible)
>  {
> +  const nsAString::size_type kFastPathMinLength = 16;

We should say something about how this constant was determined.

::: xpcom/string/nsReadableUtilsImpl.h
@@ +20,5 @@
> +  return reinterpret_cast<const char16_t*>(
> +      reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> +}
> +
> +template<size_t size> struct NonASCIIParameters;

Maybe:

/**
 * Structures for word-sized vectorization of ASCII checking for Unicode strings.
 */

?

::: xpcom/tests/gtest/TestUTF.cpp
@@ +142,5 @@
> +  auto str_buff = asciiString.BeginWriting();
> +  auto cstr_buff = asciiCString.BeginWriting();
> +  for (size_t i = 0; i < kTestSize; i++) {
> +    str_buff[i] = i % kMaxASCII;
> +    cstr_buff[i] = i % kMaxASCII;

Worth noting that this will only give you 0..126, inclusive, rather than the full ASCII range.
Attachment #8791447 - Flags: review?(nfroyd) → review+
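The word-sized (non-SSE) vectorization that a structure like NonASCIIParameters enables can be sketched roughly like this. This is a hypothetical illustration, not the patch's code; the function name and the memcpy used to sidestep alignment are assumptions:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Sketch: test sizeof(uintptr_t)/2 UTF-16 code units at once by AND-ing a
// machine word against 0xFF80 repeated in every 16-bit lane.
static bool IsASCIIWordwise(const char16_t* aStr, size_t aLen)
{
  const size_t kUnitsPerWord = sizeof(uintptr_t) / sizeof(char16_t);
  // Builds 0xFF80FF80FF80FF80 on 64-bit, 0xFF80FF80 on 32-bit.
  uintptr_t mask = 0;
  for (size_t i = 0; i < kUnitsPerWord; ++i) {
    mask = (mask << 16) | 0xFF80;
  }

  size_t i = 0;
  for (; i + kUnitsPerWord <= aLen; i += kUnitsPerWord) {
    uintptr_t word;
    // Real code would align the pointer first; memcpy sidesteps that here.
    std::memcpy(&word, aStr + i, sizeof(word));
    if (word & mask) {
      return false;  // some unit in this word is >= 0x80
    }
  }
  // Scalar tail for the last few code units.
  for (; i < aLen; ++i) {
    if (aStr[i] >= 0x80) {
      return false;
    }
  }
  return true;
}
```

A templated parameters struct lets the same loop pick the right mask and word size for 4- and 8-byte uintptr_t at compile time.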
(In reply to Nathan Froyd [:froydnj] from comment #14)
> Comment on attachment 8791447 [details] [diff] [review]
> WIP: Improve UTF-16 to UTF-8 conversion speed
> 
> Review of attachment 8791447 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is much nicer than the previous attempt.  r=me with a few comments
> below.
> 
> ::: xpcom/string/nsReadableUtils.cpp
> @@ +244,5 @@
> >  bool
> >  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
> >                    const mozilla::fallible_t& aFallible)
> >  {
> > +  const nsAString::size_type kFastPathMinLength = 16;
> 
> We should say something about how this constant was determined.

I'll add a note.

> 
> ::: xpcom/string/nsReadableUtilsImpl.h
> @@ +20,5 @@
> > +  return reinterpret_cast<const char16_t*>(
> > +      reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> > +}
> > +
> > +template<size_t size> struct NonASCIIParameters;
> 
> Maybe:
> 
> /**
>  * Structures for word-sized vectorization of ASCII checking for Unicode
> strings.
>  */
> 
> ?

Sounds good, will update.

> 
> ::: xpcom/tests/gtest/TestUTF.cpp
> @@ +142,5 @@
> > +  auto str_buff = asciiString.BeginWriting();
> > +  auto cstr_buff = asciiCString.BeginWriting();
> > +  for (size_t i = 0; i < kTestSize; i++) {
> > +    str_buff[i] = i % kMaxASCII;
> > +    cstr_buff[i] = i % kMaxASCII;
> 
> Worth noting that this will only give you 0..126, inclusive, rather than the
> full ASCII range.

Oh right, I'm using mod! I'll make kMaxASCII 0x80.
https://hg.mozilla.org/mozilla-central/rev/3ea86c2d5e56
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51