Improve performance of UTF16 -> UTF8 conversion

RESOLVED FIXED in Firefox 51

People

(Reporter: erahm, Assigned: erahm)

Tracking

Trunk
mozilla51

Firefox Tracking Flags

(firefox51 fixed)

Attachments

(1 attachment, 1 obsolete attachment)

In bug 1287292 we saw significant overhead in the UTF16 -> UTF8 conversion for an XHR post using strings.

I propose we add an optimized path for strings that can be represented in UTF8 without conversion (all characters < 0x80).

One option is to update |AppendUTF16toUTF8| [1] by using a modified version of the existing SSE2 optimized path in nsTextFragment [2].

[1] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/xpcom/string/nsReadableUtils.cpp#180
[2] http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/dom/base/nsTextFragmentSSE2.cpp#27
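
The proposed scan can be sketched with SSE2 intrinsics. This is a hypothetical standalone version, not the nsTextFragment code or the eventual patch; the function name and loop structure are assumptions:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical sketch: return the index of the first UTF-16 code unit that
// is >= 0x80, or -1 if the whole buffer is ASCII and can be narrowed as-is.
static int32_t FirstNonASCII(const char16_t* aStr, const char16_t* aEnd)
{
  const char16_t* p = aStr;
  // Any code unit >= 0x80 has at least one bit set under this mask.
  const __m128i mask = _mm_set1_epi16(static_cast<short>(0xFF80));

  // Check 8 code units (16 bytes) per iteration.
  while (aEnd - p >= 8) {
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    __m128i isAscii = _mm_cmpeq_epi16(_mm_and_si128(chunk, mask),
                                      _mm_setzero_si128());
    if (_mm_movemask_epi8(isAscii) != 0xFFFF) {
      break;  // a non-ASCII unit is in this chunk; pinpoint it below
    }
    p += 8;
  }

  // Scalar loop for the tail and for pinpointing the exact index.
  for (; p < aEnd; ++p) {
    if (*p >= 0x80) {
      return static_cast<int32_t>(p - aStr);
    }
  }
  return -1;
}
```

The caller would take a memcpy-style narrowing path when this returns -1 and fall back to the full converter otherwise.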
This adds an SSE2-optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0x7F). It shows a
speedup for the case where many of the leading characters are < 0x80 and
should match the current performance in the worst-case scenario, where the
first character is > 0x7F.

Nathan, let me know what you think of this approach. As best I can tell it's
no slower than the current implementation and much faster for the trivial
conversion case. I'll follow up with stats from the gtest.
Attachment #8789870 - Flags: feedback?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Test results on my Ubuntu 16.04 64-bit machine. The 'bail' tests have a character > 0x7F in the middle of the string. There's a slight slowdown for the 5-char test, so we might want to take the optimized path only for strings > X length.

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8Huge       26163 ms     5014 ms       5.22X
> UTFPerfTest.AppendUTF16to8Large       2443 ms      190 ms      12.86X
> UTFPerfTest.AppendUTF16to8Med         2639 ms      482 ms       5.47X
> UTFPerfTest.AppendUTF16to8Small       2568 ms     2709 ms       0.94X
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms    18465 ms       1.42X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms     1694 ms       1.44X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms     1944 ms       1.36X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms     2721 ms       0.95X
Doing a test run of startup, loading techcrunch.com, signing in to inbox.gmail.com, and clicking a few emails showed 50,580 uses of the fast path and 1 use of the slow path. Almost all strings were < 32K characters (with a few outliers in the 300K range); 98% were less than 302 chars.

I get the feeling the sweet spot for improved perf is somewhere around 16-32 characters. I'll drill down on smaller size classes.
Comment on attachment 8789870 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed

Review of attachment 8789870 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.  A few comments here and there.

::: xpcom/string/nsReadableUtils.cpp
@@ +19,5 @@
> +#include <algorithm>
> +
> +#include <stdint.h>
> +
> +template<size_t size> struct Non8BitParameters;

This should probably be named something like NonUTF8BitParameters?

@@ +43,5 @@
> +
> +#ifdef MOZILLA_MAY_SUPPORT_SSE2
> +namespace mozilla {
> +  namespace SSE2 {
> +    int32_t FirstNon8Bit(const char16_t *str, const char16_t *end);

This declaration and whole block seems dead.

@@ +48,5 @@
> +  } // namespace SSE2
> +} // namespace mozilla
> +#endif
> +
> +//#if 0

Dead #if.

@@ +333,5 @@
>  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
>                    const mozilla::fallible_t& aFallible)
>  {
> +  int32_t firstNon8bit = FirstNonUTF8Bit(aSource.BeginReading(), aSource.EndReading());
> +  //int32_t firstNon8bit = 0;

Commented out things should go away.

@@ +341,5 @@
> +    if (!aDest.SetCapacity(aDest.Length() + aSource.Length(), aFallible)) {
> +      return false;
> +    }
> +
> +    LossyAppendUTF16toASCII(aSource, aDest);

Cool, this is already SSE2-optimized.  I think we could almost stand to re-roll the inner loop of LossyConvertEncoding16to8::write_sse2 to walk 16 16-bit characters at a time rather than 32, or even just 8 characters, since small strings are overwhelmingly common.  Or maybe 8 characters at a time for strings < 64 characters and 32 characters otherwise.

::: xpcom/tests/gtest/TestUTF.cpp
@@ +25,5 @@
>  {
>    for (unsigned int i = 0; i < ArrayLength(ValidStrings); ++i) {
>      nsDependentCString str8(ValidStrings[i].m8);
>      nsDependentString str16(ValidStrings[i].m16);
> +    printf("Testing string %d: %s\n", (int)i, str8.get());

Probably want to get rid of this.

@@ +122,5 @@
>    }
>  #endif
>  }
>  
> +class UTFPerfTest : public ::testing::Test {

It's hard to tell where the performance testing in all of this is; I think we should rename this.

@@ +175,5 @@
> +TEST_F(UTFPerfTest, AppendUTF16to8Huge)
> +{
> +  for (size_t i = 0; i < 200; i++) {
> +    nsCString test;
> +    AppendUTF16toUTF8(huge_string, test);

We should not only be testing actual appending, we should also ensure that the newly-created UTF8 string has the expected contents.  A more structured test might also check what happens when the non-8bit-UTF8 character is at the end of the string, or at all possible positions of such a character in an SSE register.  So I'm not sure that size--even if that's convenient for performance tests--is really the right thing to be differentiating our tests on here.
Attachment #8789870 - Flags: feedback?(nfroyd) → feedback+
Testing on my i7-4790K, Debian 7.0, I see roughly the same speedups that Eric sees in comment 2.  The notable exceptions are:

* the *Small cases are much closer, on the order of 1% difference;
* the Huge test is more like ~4.2X faster, not 5X+.
(Oh, GCC 4.9.2, if that makes any difference.)
(In reply to Nathan Froyd [:froydnj] from comment #6)
> (Oh, GCC 4.9.2, if that makes any difference.)

I tested with a debug build produced by clang 3.8.1 on Ubuntu 16.04 w/ a Core i7-4770 CPU.
With further testing, it seems that 16 characters is where we consistently get a win in both the fast-path and early-bailout cases. With that in mind, my proposal is to attempt the fast path only if |aSource.Length() >= 16|. We could go as low as 10-12 characters, where the fast path's performance was still greater.

Testing against the top 15 sites on Alexa (google, youtube, amazon, baidu, etc.) I found that > 50% of calls (~100K) in my tests were with strings >= 16 chars. 204,974 of the conversions took the fast path, 168 took the slow path. Given that stat we might consider a lower character threshold.

We also see about 10% of conversion calls only have 1 character. It might be worth pursuing optimization for that path as well, but I'll defer that to a follow-up.
Here's something else for you to benchmark, too:

https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17
Flags: needinfo?(erahm)
(In reply to Nathan Froyd [:froydnj] from comment #9)
> Here's something else for you to benchmark, too:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1295762#c17

Tested with just that optimization (aSource.Length() == count). It's a bit faster than without it, but nowhere near the speedup we see with the SSE2 path.

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8Huge       26163 ms     20078 ms       1.30X
> UTFPerfTest.AppendUTF16to8Large       2443 ms      1728 ms       1.41X
> UTFPerfTest.AppendUTF16to8Med         2639 ms      2004 ms       1.32X
> UTFPerfTest.AppendUTF16to8Small       2568 ms      2972 ms       0.86X
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms     26116 ms       0.99X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms      2465 ms       0.99X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms      2691 ms       0.98X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms      2598 ms       1.00X
Flags: needinfo?(erahm)
I added another optimization to the slow path, inspired by the |aSource.Length() == count| optimization: we can still do a lossy convert up to the first non-ASCII character even in the slow path. That shows a nice speedup as well:

> Huge == 50M chars, Large == 5K chars, Med == 512 chars, Small == 5 chars
> 
>      TEST                            Non-Patch   With Patch   Speedup
>      ====                            =========   ==========   =======
> UTFPerfTest.AppendUTF16to8HugeBail   26071 ms     15577 ms     1.67X
> UTFPerfTest.AppendUTF16to8LargeBail   2446 ms      1332 ms     1.86X
> UTFPerfTest.AppendUTF16to8MedBail     2654 ms      1648 ms     1.61X
> UTFPerfTest.AppendUTF16to8SmallBail   2598 ms      2722 ms     0.95X

I should have a cleaned up version of the patch up shortly.
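Putting the pieces together, the overall control flow (length threshold, lossy-narrowed ASCII prefix, slow-path fallback) can be sketched as follows. This is a hypothetical model using std::string in place of nsACString, with a scalar AsciiPrefixLen standing in for the SSE2 scan; none of these names are from the actual patch:

```cpp
#include <cstdint>
#include <string>

// Scalar stand-in for the SSE2 scan: length of the leading ASCII run.
static size_t AsciiPrefixLen(const char16_t* aStr, const char16_t* aEnd)
{
  const char16_t* p = aStr;
  while (p < aEnd && *p < 0x80) {
    ++p;
  }
  return static_cast<size_t>(p - aStr);
}

// Slow path: full UTF-16 -> UTF-8 encoding, including surrogate pairs.
static void EncodeUTF8(const char16_t* p, const char16_t* aEnd, std::string& aDest)
{
  while (p < aEnd) {
    uint32_t c = *p++;
    if (c >= 0xD800 && c <= 0xDBFF && p < aEnd &&
        *p >= 0xDC00 && *p <= 0xDFFF) {
      c = 0x10000 + ((c - 0xD800) << 10) + (*p++ - 0xDC00);
    }
    if (c < 0x80) {
      aDest += static_cast<char>(c);
    } else if (c < 0x800) {
      aDest += static_cast<char>(0xC0 | (c >> 6));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      aDest += static_cast<char>(0xE0 | (c >> 12));
      aDest += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    } else {
      aDest += static_cast<char>(0xF0 | (c >> 18));
      aDest += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      aDest += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      aDest += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
}

static void AppendUTF16toUTF8Sketch(const std::u16string& aSrc, std::string& aDest)
{
  // Threshold below which the extra scan isn't worth it, per the benchmarks above.
  const size_t kFastPathMinLength = 16;
  const char16_t* p = aSrc.data();
  const char16_t* end = p + aSrc.size();

  if (aSrc.size() >= kFastPathMinLength) {
    // Lossy-narrow the ASCII prefix; for an all-ASCII string this is the
    // entire conversion, and for a 'bail' string it still skips ahead.
    size_t prefix = AsciiPrefixLen(p, end);
    for (size_t i = 0; i < prefix; ++i) {
      aDest += static_cast<char>(p[i]);
    }
    p += prefix;
  }

  // Whatever remains (possibly everything) goes through the slow path.
  EncodeUTF8(p, end, aDest);
}
```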
This adds an SSE2-optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0x7F). It shows a
speedup for the case where many of the leading characters are < 0x80 and
should match the current performance in the worst-case scenario, where the
first character is > 0x7F.

This is a cleaner implementation than before. I moved the SSE2 function to its
own file, cleaned up the types used, and added a more thorough test.
Attachment #8791447 - Flags: review?(nfroyd)
Attachment #8789870 - Attachment is obsolete: true
Comment on attachment 8791447 [details] [diff] [review]
WIP: Improve UTF-16 to UTF-8 conversion speed

Review of attachment 8791447 [details] [diff] [review]:
-----------------------------------------------------------------

This is much nicer than the previous attempt.  r=me with a few comments below.

::: xpcom/string/nsReadableUtils.cpp
@@ +244,5 @@
>  bool
>  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
>                    const mozilla::fallible_t& aFallible)
>  {
> +  const nsAString::size_type kFastPathMinLength = 16;

We should say something about how this constant was determined.

::: xpcom/string/nsReadableUtilsImpl.h
@@ +20,5 @@
> +  return reinterpret_cast<const char16_t*>(
> +      reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> +}
> +
> +template<size_t size> struct NonASCIIParameters;

Maybe:

/**
 * Structures for word-sized vectorization of ASCII checking for Unicode strings.
 */

?

::: xpcom/tests/gtest/TestUTF.cpp
@@ +142,5 @@
> +  auto str_buff = asciiString.BeginWriting();
> +  auto cstr_buff = asciiCString.BeginWriting();
> +  for (size_t i = 0; i < kTestSize; i++) {
> +    str_buff[i] = i % kMaxASCII;
> +    cstr_buff[i] = i % kMaxASCII;

Worth noting that this will only give you 0..126, inclusive, rather than the full ASCII range.
Attachment #8791447 - Flags: review?(nfroyd) → review+
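The word-sized (non-SSE) vectorization that a structure like NonASCIIParameters enables can be sketched roughly like this. This is a hypothetical illustration, not the patch's code; the function name and the memcpy used to sidestep alignment are assumptions:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Sketch: test sizeof(uintptr_t)/2 UTF-16 code units at once by AND-ing a
// machine word against 0xFF80 repeated in every 16-bit lane.
static bool IsASCIIWordwise(const char16_t* aStr, size_t aLen)
{
  const size_t kUnitsPerWord = sizeof(uintptr_t) / sizeof(char16_t);
  // Builds 0xFF80FF80FF80FF80 on 64-bit, 0xFF80FF80 on 32-bit.
  uintptr_t mask = 0;
  for (size_t i = 0; i < kUnitsPerWord; ++i) {
    mask = (mask << 16) | 0xFF80;
  }

  size_t i = 0;
  for (; i + kUnitsPerWord <= aLen; i += kUnitsPerWord) {
    uintptr_t word;
    // Real code would align the pointer first; memcpy sidesteps that here.
    std::memcpy(&word, aStr + i, sizeof(word));
    if (word & mask) {
      return false;  // some unit in this word is >= 0x80
    }
  }
  // Scalar tail for the last few code units.
  for (; i < aLen; ++i) {
    if (aStr[i] >= 0x80) {
      return false;
    }
  }
  return true;
}
```

A templated parameters struct lets the same loop pick the right mask and word size for 4- and 8-byte uintptr_t at compile time.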
(In reply to Nathan Froyd [:froydnj] from comment #14)
> Comment on attachment 8791447 [details] [diff] [review]
> WIP: Improve UTF-16 to UTF-8 conversion speed
> 
> Review of attachment 8791447 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is much nicer than the previous attempt.  r=me with a few comments
> below.
> 
> ::: xpcom/string/nsReadableUtils.cpp
> @@ +244,5 @@
> >  bool
> >  AppendUTF16toUTF8(const nsAString& aSource, nsACString& aDest,
> >                    const mozilla::fallible_t& aFallible)
> >  {
> > +  const nsAString::size_type kFastPathMinLength = 16;
> 
> We should say something about how this constant was determined.

I'll add a note.

> 
> ::: xpcom/string/nsReadableUtilsImpl.h
> @@ +20,5 @@
> > +  return reinterpret_cast<const char16_t*>(
> > +      reinterpret_cast<const uintptr_t>(aPtr) & ~aMask);
> > +}
> > +
> > +template<size_t size> struct NonASCIIParameters;
> 
> Maybe:
> 
> /**
>  * Structures for word-sized vectorization of ASCII checking for Unicode
> strings.
>  */
> 
> ?

Sounds good, will update.

> 
> ::: xpcom/tests/gtest/TestUTF.cpp
> @@ +142,5 @@
> > +  auto str_buff = asciiString.BeginWriting();
> > +  auto cstr_buff = asciiCString.BeginWriting();
> > +  for (size_t i = 0; i < kTestSize; i++) {
> > +    str_buff[i] = i % kMaxASCII;
> > +    cstr_buff[i] = i % kMaxASCII;
> 
> Worth noting that this will only give you 0..126, inclusive, rather than the
> full ASCII range.

Oh right, I'm using mod! I'll make kMaxASCII 0x80.
https://hg.mozilla.org/mozilla-central/rev/3ea86c2d5e56
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51