Closed Bug 585538 Opened 9 years ago Closed 9 years ago

Use SIMD UTF8 to UTF16 code on Linux 32-bit

Categories

(Core :: String, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Keywords: perf)

Attachments

(1 file, 5 obsolete files)

I haven't tested, but I think MOZILLA_COMPILE_WITH_SSE2 is defined on all 64-bit platforms and on all Intel Mac platforms.  I hope it's defined on Windows 32.  But I don't think it's defined on Linux 32.

We currently won't run the vectorized version of Convert_ascii_run() in nsUTF8TOUnicode.cpp wherever MOZILLA_COMPILE_WITH_SSE2 is not defined.

Until we can work around the GCC bugs described in SSE.h, my understanding is that we can only use SSE intrinsics inside files compiled with -msse2.  This flag gives GCC free reign to use SSE2 instructions wherever it wants, so it's not correct to compile nsUTF8ToUnicode.cpp with -msse2.  Instead, we'd need to move the intrinsics code into a separate file and use a CPUID call to decide whether to call into that file or run the unvectorized code.
Keywords: perf
> I hope it's defined on Windows 32

It's not at the moment, no, since last we checked a nontrivial number of users was on hardware with no SSE2 support.

We've been considering changing that for Gecko 2.0, but there's a separate tracking bug for that, right?  Given that, and assuming we decide to do so, is this bug still relevant?
Assuming we require SSE2 on all platforms, this bug is no longer relevant.
Shaver says he doesn't think the bug for requiring SSE2 on all platforms has been filed yet.
(In reply to comment #1)
> > I hope it's defined on Windows 32
> 
> It's not at the moment, no, since last we checked a nontrivial number of users
> was on hardware with no SSE2 support.

This suggests otherwise?

  http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/SSE.h#348

Perhaps we can turn on COMPILE_WITH_SSE2 because the equivalent of -msse2 on MSVC doesn't give MSVC free reign to put SSE2 instructions wherever it wants?
There's a big difference between dynamically checking for the extended instruction sets (which is what that code is for) and requiring them. We do dynamic detection in a number of places, and decided to centralize that code in SSE.h (bug 513422)
bsmedberg, the issue is that GCC doesn't let you use sse2 intrinsics inside a file which wasn't compiled with -msse2, even if the intrinsics are guarded by a CPUID check.  The code I linked above suggests to me that perhaps MSVC does allow this.

MOZILLA_COMPILE_WITH_SSE2 is (as I understand) a macro which indicates whether the compiler will accept intrinsics in your source file.  But you still have to check the result of CPUID.

Am I misunderstanding this?
Depends on: 585708
Depends on: 585818
Attached patch Patch v1 (obsolete) — Splinter Review
Here's a patch which builds on top of the patches in bug 585818 and bug 585708.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #464243 - Flags: review?(vladimir)
Attached patch Patch v1.1 (obsolete) — Splinter Review
Whitespace change.
Attachment #464243 - Attachment is obsolete: true
Attachment #464453 - Flags: review?(vladimir)
Attachment #464243 - Flags: review?(vladimir)
Attached patch Patch v1.2 (obsolete) — Splinter Review
Build changes: Now based on patch v2 in bug 585818 (and patch v1 in bug
585708).  Sorry for the churn.
Attachment #464453 - Attachment is obsolete: true
Attachment #464560 - Flags: review?(vladimir)
Attachment #464453 - Flags: review?(vladimir)
Attachment #464560 - Flags: review?(vladimir) → review?(me)
This is just a build change, so I don't need vlad to review it.  Khuey, you up for it?
Comment on attachment 464560 [details] [diff] [review]
Patch v1.2

>diff --git a/intl/uconv/src/Makefile.in b/intl/uconv/src/Makefile.in
>+endif
> 
>+# Are we targeting x86-32 or x86-64?
>+ifneq (,$(filter x86 x86_64,$(CPU_ARCH)))
>+  CPPSRCS			+= nsUTF8ToUnicodeSSE2.cpp
>+
>+  # This file uses SSE2 intrinsics, so we need to pass -msse2 if we're using gcc.
>+  ifdef __GNUC__
>+    nsUTF8ToUnicodeSSE2.$(OBJ_SUFFIX): MODULE_OPTIMIZE_FLAGS += -msse2
>+  endif
> endif

Patience diff ftw.  Nit: don't indent anything here; we generally don't indent in makefiles unless we have to.  Also, could we put a comment with a link to the gcc bug (or summary of the situation in our Bugzilla) here so it's clear why we're doing this?

>@@ -242,23 +156,38 @@ Convert_ascii_run (const char *&src,
>   dst = (PRUnichar *) dst32;
> 
> finish:
>   while (len-- > 0 && (*src & 0x80U) == 0) {
>     *dst++ = (PRUnichar) *src++;
>   }
> }
> 
>-#else /* generic code */
>+#else
>+
>+#ifdef MOZILLA_MAY_SUPPORT_SSE2
>+namespace mozilla {
>+  namespace SSE2 {
>+    void Convert_ascii_run(const char *&src, PRUnichar *&dst, PRInt32 len);
>+  }
>+}
>+#endif

Nit: I believe the style is to not indent for the namespace declaration.  E.g.

namespace foo {
namespace bar {

int baz;

}
}

>diff --git a/intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp b/intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp
>new file mode 100644
>--- /dev/null
>+++ b/intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp
>@@ -0,0 +1,92 @@
>+// This file should only be compiled if you're on x86 or x86_64.  Additionally,
>+// you'll need to compile this file with -msse2 if you're using gcc.
>+
>+#include <emmintrin.h>
>+#include "nscore.h"
>+
>+namespace mozilla {
>+namespace SSE2 {
>+
>+void
>+Convert_ascii_run(const char *&src,
>+                  PRUnichar  *&dst,
>+                  PRInt32      len)
>+{
>+  if (len > 15) {
>+    __m128i in, out1, out2;
>+    __m128d *outp1, *outp2;
>+    __m128i zeroes;
>+    PRUint32 offset;
>+
>+    // align input to 16 bytes
>+    while ((NS_PTR_TO_UINT32(src) & 15) && len > 0) {
>+      if (*src & 0x80U)
>+        return;
>+      *dst++ = (PRUnichar) *src++;
>+      len--;
>+    }
>+
>+    zeroes = _mm_setzero_si128();
>+
>+    offset = NS_PTR_TO_UINT32(dst) & 15;
>+
>+    // Note: all these inner loops have to break, not return; we need
>+    // to let the single-char loop below catch any leftover
>+    // byte-at-a-time ASCII chars, since this function must consume
>+    // all available ASCII chars before it returns
>+
>+    if (offset == 0) {
>+      while (len > 15) {
>+        in = _mm_load_si128((__m128i *) src);
>+        if (_mm_movemask_epi8(in))
>+          break;
>+        out1 = _mm_unpacklo_epi8(in, zeroes);
>+        out2 = _mm_unpackhi_epi8(in, zeroes);
>+        _mm_stream_si128((__m128i *) dst, out1);
>+        _mm_stream_si128((__m128i *) (dst + 8), out2);
>+        dst += 16;
>+        src += 16;
>+        len -= 16;
>+      }
>+    } else if (offset == 8) {
>+      outp1 = (__m128d *) &out1;
>+      outp2 = (__m128d *) &out2;
>+      while (len > 15) {
>+        in = _mm_load_si128((__m128i *) src);
>+        if (_mm_movemask_epi8(in))
>+          break;
>+        out1 = _mm_unpacklo_epi8(in, zeroes);
>+        out2 = _mm_unpackhi_epi8(in, zeroes);
>+        _mm_storel_epi64((__m128i *) dst, out1);
>+        _mm_storel_epi64((__m128i *) (dst + 8), out2);
>+        _mm_storeh_pd((double *) (dst + 4), *outp1);
>+        _mm_storeh_pd((double *) (dst + 12), *outp2);
>+        src += 16;
>+        dst += 16;
>+        len -= 16;
>+      }
>+    } else {
>+      while (len > 15) {
>+        in = _mm_load_si128((__m128i *) src);
>+        if (_mm_movemask_epi8(in))
>+          break;
>+        out1 = _mm_unpacklo_epi8(in, zeroes);
>+        out2 = _mm_unpackhi_epi8(in, zeroes);
>+        _mm_storeu_si128((__m128i *) dst, out1);
>+        _mm_storeu_si128((__m128i *) (dst + 8), out2);
>+        src += 16;
>+        dst += 16;
>+        len -= 16;
>+      }
>+    }
>+  }
>+
>+  // finish off a byte at a time
>+
>+  while (len-- > 0 && (*src & 0x80U) == 0) {
>+    *dst++ = (PRUnichar) *src++;
>+  }
>+}
>+
>+} // namespace SSE2
>+} // namespace mozilla

Needs a trilicense header with Mozilla Foundation as the Initial Developer.

r=me w/that.
Attachment #464560 - Flags: review?(me) → review+
Attachment #464560 - Flags: approval2.0?
> Also, could we put a comment with a link to the gcc bug (or summary of the 
> situation in our Bugzilla) here so it's clear why we're [passing -msse2 to 
> gcc?]

Sure, here's the summary:  gcc won't generate code which uses SSE2 instructions unless you pass -msse2.  That's the rule even if you explicitly use SSE2 intrinsics and I believe if you list SSE2 instructions in inline asm.

Conversely, gcc takes -msse2 as license to insert SSE2 instructions anywhere in the compilation unit, not just where you use SSE2 intrinsics or assembly instructions.

Thus we need to separate out into separate files functions which use SSE2.
Attachment #464560 - Flags: approval2.0? → approval2.0+
Argh, it looks like the change I made to the Makefile to add -msse2 has no effect.  The build worked on my Linux64 box since -m64 implies -msse2.  I could have sworn I pushed to try and saw it work there on Linux32, but I must have been mistaken (or perhaps I didn't push the right patch queue, so INTEL_ARCHITECTURE wasn't defined and the SSE2 file wasn't included in the build).

In any case, the question that now faces me is: Why does [1] work while

  nsUTF8ToUnicode.$(OBJ_SUFFIX): MODULE_OPTIMIZE_FLAGS=-msse2

doesn't?  I've messed around with it for a while to no avail.

If we can't figure it out, we could put the SSE2 file in a separate directory -- that should definitely work.  But that seems quite silly to me.

Anyone have a guess as to how I might get the per-file flags to work?

[1] http://mxr.mozilla.org/mozilla-central/source/js/src/Makefile.in#635
My guess would be that the command line includes a previous -m<foo> directive before or after (whatever takes precedence) that causes gcc to ignore -msse2.
(In reply to comment #14)
> My guess would be that the command line includes a previous -m<foo> directive
> before or after (whatever takes precedence) that causes gcc to ignore -msse2.

That's a good guess, but not the case.  -msse2 doesn't show up on the command-line at all.
Attached patch Patch v1.3 (obsolete) — Splinter Review
Okay, I tested this one in a 32-bit VM and verified that it generates the SSE2 object files without error.

Kyle, does this look OK to you?
Attachment #464560 - Attachment is obsolete: true
Attachment #468472 - Flags: review?(khuey)
Comment on attachment 468472 [details] [diff] [review]
Patch v1.3

Oops; that's the wrong file.  Just a minute.
Attachment #468472 - Flags: review?(khuey)
Attached patch Patch v1.3a (obsolete) — Splinter Review
Attachment #468472 - Attachment is obsolete: true
Attachment #468476 - Flags: review?(khuey)
Comment on attachment 468476 [details] [diff] [review]
Patch v1.3a

Requesting approval.

This is a hot function on startup; see bug 506430.  This patch is safe since it doesn't add any new code; it just moves things around so the existing code is accessible on GCC-based systems where we can't presume SSE2 support (i.e., on Linux32).
Attachment #468476 - Flags: approval2.0?
Attachment #468476 - Attachment is obsolete: true
Attachment #468476 - Flags: approval2.0?
Attached patch Patch v1.4Splinter Review
Fixing Solaris compiler, I hope.  khuey, does this look OK?  Interdiff v1.3a -> v1.4 below.

> --- a/intl/uconv/src/Makefile.in
> +++ b/intl/uconv/src/Makefile.in
> @@ -82,17 +82,21 @@ endif
>  ifneq (,$(INTEL_ARCHITECTURE))
>  CPPSRCS			+= nsUTF8ToUnicodeSSE2.cpp
>  
>  # nsUTF8ToUnicodeSSE2.cpp uses SSE2 intrinsics, so we need to pass -msse2 if
>  # we're using gcc. (See bug 585538 comment 12.)
>  ifdef GNU_CC
>  nsUTF8ToUnicodeSSE2.$(OBJ_SUFFIX): CXXFLAGS+=-msse2
>  endif
> +
> +ifdef SOLARIS_SUNPRO_CXX
> +nsUTF8ToUnicodeSSE2.$(OBJ_SUFFIX): OS_CXXFLAGS += -xarch=sse2 -xO4
>  endif
> +endif
>  
>  ifeq ($(MOZ_WIDGET_TOOLKIT),os2)
>  CPPSRCS			+= nsOS2Charset.cpp
>  else
>  ifeq ($(MOZ_WIDGET_TOOLKIT),windows)
>  CPPSRCS			+= nsWinCharset.cpp
>  else
>  ifeq (cocoa,$(MOZ_WIDGET_TOOLKIT))
> @@ -102,22 +106,16 @@ ifeq ($(OS_ARCH),BeOS)
>  CPPSRCS			+= nsBeOSCharset.cpp
>  else
>  CPPSRCS			+= nsUNIXCharset.cpp
>  endif
>  endif
>  endif
>  endif
>  
> -ifeq (86,$(findstring 86,$(OS_TEST)))
> -ifdef SOLARIS_SUNPRO_CXX
> -nsUTF8ToUnicode.$(OBJ_SUFFIX): OS_CXXFLAGS += -xarch=sse2 -xO4
> -endif
> -endif
> -
>  EXTRA_DSO_LDOPTS = \
>  		../util/$(LIB_PREFIX)ucvutil_s.$(LIB_SUFFIX) \
>  		$(MOZ_UNICHARUTIL_LIBS) \
>  		$(MOZ_NECKO_UTIL_LIBS) \
>  		$(MOZ_COMPONENT_LIBS) \
>  		$(NULL)
>  
>  LOCAL_INCLUDES	= -I$(srcdir)/../util
Attachment #495337 - Flags: review?(khuey)
Hm, bzexport apparently canceled my own a2.0!  It was there on patch v1.3a; since this is a small change, I'd like to carry over the approval.
Comment on attachment 495337 [details] [diff] [review]
Patch v1.4

r=me on the interdiff
Attachment #495337 - Flags: review?(khuey) → review+
http://hg.mozilla.org/mozilla-central/rev/03a8f34b46e9
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.