Rework SSE.h so that it supports and encourages putting intrinsics in a file separate from the main code

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

Right now SSE.h is defined around the assumption that code using CPU intrinsics will be defined in the same file as code not using intrinsics.

Unfortunately it's not currently possible to do this in gcc.  gcc requires that files which use SSE intrinsics be compiled with -msse.  This option gives gcc permission to use SSE instructions anywhere in that file.  So if you want to write code like this:

  if (cpu_supports_sse())
    do_vectorized();
  else
    do_unvectorized();

you need to put do_vectorized() in a separate file and compile it with -msse.  The same applies for any other set of intrinsics.

gcc nominally supports a pragma for getting around this limitation.  But it doesn't currently work [1] [2].

I think we should rework SSE.h so that it assumes that intrinsics will be placed in a file separate from the main code.  We can then fix up consumers of SSE.h (I think there's only one).  If gcc ever fixes this bug, we can revert back to the (much more sane) idiom currently supported by SSE.h.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39787
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41201
Depends on: 513422
Blocks: 585538
Blocks: 582858
It might be better to have an additional set of macros for this use case without disturbing the current set, which would allow any caller to migrate between the two as desired without have to do a global switch at any one time.
The only consumer which is affected by this proposed change is nsUTF8ToUnicode, which I want to change to this new form anyway.

I think there's some value in removing the macros from SSE.h so as to encourage people to separate out their intrinsics into a separate file.
Summary: Rework SSE.h so that it supports putting intrinsics in a file separate from the main code → Rework SSE.h so that it supports and encourages putting intrinsics in a file separate from the main code
Posted patch Patch v1Splinter Review
I removed the code that's not useful until gcc fixes their #pragma target bugs.  I'm fine adding it back now, but I figure it'll be easy to get it back from hg when we want it.

I have a patch for fixing the consumer, which I'll post in bug 585538.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #464231 - Flags: review?(dbaron)
Blocks: 585978
No longer blocks: 582858
dbaron, do you have an idea when you'll be able to look at this?  I'd like to land bug 585538, but I need these changes.

Alternatively, perhaps someone else could review?
I'm going to try to get to it tomorrow, but if I don't it will likely be a while.  vlad could also review if he has time.
Vlad, do you think you'll have some time to look at this?  It's mostly just a comment change and the removal of a bunch of code that we can't use until GCC fixes #pragma target.
This is blocking two a+'ed perf improvements: bug 585978 and bug 586698.  Additionally, it blocks bug 585538, a startup improvement on Linux32.

Is there something I can do to speed up this review?
How does a code removal patch block those bugs?
The code in those bugs could be reworked to avoid landing this change.  But I don't think that's the right thing to do.

This patch isn't entirely code removal; it also adds the MOZILLA_MAY_SUPPORT_ABC macros, which serve as guards around the mozilla::supports_abc() functions.

We could just guard on INTEL_ARCHITECTURE (from bug 585818) instead of MOZILLA_MAY_SUPPORT_ABC, but I think that's not as good.  We'll run into trouble if we try to write an SSE3 function in the future, since we then wouldn't be able to disable compilation of that function on compilers which don't support SSE3 (e.g. VS2003).  It's better, I think, to explicitly guard on the ISA extension being used.

More importantly, the bugs from comment 7 need to be written to an idiom different from the one currently described in SSE.h in order to be compiled on Linux32.  This is why bug 585538 exists in the first place.  In addition to removing code, this patch changes the suggested usage of SSE.h so that SSE2 intrinsics can be compiled on Linux32.  The interface change is also necessary for us to use newer intrinsics on any gcc platform.

If we went ahead with the bugs from comment 7 without this patch, that code would be using SSE.h in a way very different from what the header file describes.  I don't think that's useful.

Since I originally posted this patch, SSE.h appears in three more source files:

 * gfx/thebes/gfxAlphaRecovery.cpp
 * gfx/ycbcr/yuv_row_win.cpp
 * gfx/ycbcr/yuv_convert.cpp

gfxAlphaRecovery and yuv_convert both have vectorized routines guarded by MOZILLA_COMPILE_WITH_SSE2, so these routines are not being compiled on Linux32.  I guess I don't understand why we shouldn't change this header file now, before it gets even more use in its current form.
Comment on attachment 464231 [details] [diff] [review]
Patch v1

Fine, r=dbaron even though I'd have preferred adding support for a new idiom alongside the existing ones rather than the removal.
Attachment #464231 - Flags: review?(dbaron) → review+
Comment on attachment 464231 [details] [diff] [review]
Patch v1

Actually, never mind, I should actually review this.
Attachment #464231 - Flags: review+ → review?(dbaron)
Comment on attachment 464231 [details] [diff] [review]
Patch v1

>+#if defined(__i386__) || defined(__x86_64__)
> 
>+// FIXME: This isn't strictly right.  For instance, VS2003 doesn't support
>+// SSE3.  But in order to get this right, we need to sync up the build system
>+// (which is including the files with specialized instructions) and these
>+// macros (which are suggesting that the compiler supports those instructions).
>+// Since we're not to my knowledge using anything other than SSE2 anywhere in
>+// our code, and since all our supported compilers support through SSE2, I
>+// think this is good enough for now.
> 
>+#define MOZILLA_MAY_SUPPORT_MMX 1
>+#define MOZILLA_MAY_SUPPORT_SSE 1
>+#define MOZILLA_MAY_SUPPORT_SSE2 1
>+#define MOZILLA_MAY_SUPPORT_SSE3 1
>+#define MOZILLA_MAY_SUPPORT_SSSE3 1
>+#define MOZILLA_MAY_SUPPORT_SSE4A 1
>+#define MOZILLA_MAY_SUPPORT_SSE4_1 1
>+#define MOZILLA_MAY_SUPPORT_SSE4_2 1
>+
> #endif

I think this bit is pretty broken, because I don't think that __i386__ and __x86_64__ are defined by MSVC, and perhaps not by some other compilers.

I think you should only define these macros in cases where you know you're defining mozilla::supports_sse*, etc., and I think you should do that by defining them separately, as appropriate, in each of the code blocks that defines those functions.

That would also allow you to fix your FIXME quite easily.

r=dbaron with that fixed
Attachment #464231 - Flags: review?(dbaron) → review+
Also note that we've officially dropped support for VS2003 on trunk, so don't worry about it.
https://developer.mozilla.org/en/Windows_Build_Prerequisites
(In reply to comment #12)
> I think this bit is pretty broken, because I don't think that __i386__ and
> __x86_64__ are defined by MSVC, and perhaps not by some other compilers.

Hm; you're totally right.

> I think you should only define these macros in cases where you know you're
> defining mozilla::supports_sse*, etc., and I think you should do that by
> defining them separately, as appropriate, in each of the code blocks that
> defines those functions.

Here's the relevant code, which is not guarded by any other ifdefs:

> #if defined(MOZILLA_PRESUME_SSE2)
>   inline bool supports_sse2() { return true; }
> #elif defined(MOZILLA_SSE_HAVE_CPUID_DETECTION)
>   inline bool supports_sse2() { return sse_private::sse2_enabled; }
> #else
>   inline bool supports_sse2() { return false; }
> #endif

Do you mean this should be written as

> #if defined(MOZILLA_PRESUME_SSE2)
>   #define MOZILLA_MAY_SUPPORT_SSE2
>   inline bool supports_sse2() { return true; }
> #elif defined(MOZILLA_SSE_HAVE_CPUID_DETECTION)
>   #define MOZILLA_MAY_SUPPORT_SSE2
>   inline bool supports_sse2() { return sse_private::sse2_enabled; }
> #else
>   inline bool supports_sse2() { return false; }
> #endif

That's fine with me, although I don't see how it solves the problem of the FIXME.
I was thinking something slightly different, which was defining them in the per-compiler sections next to where we currently define MOZILLA_PRESUME_*.  That said, you're right that it should also be conditioned on MOZILLA_SSE_HAVE_CPUID_DETECTION for the ones where we're not presuming it.

But what you suggest is probably better, even though it makes MSVC-specific ifdefs a drop more awkward.
Depends on: 616778
Depends on: 616782
Blocks: 616782
No longer depends on: 616782
Blocks: 616778
Depends on: 585818
No longer depends on: 616778
Depends on: 619178
MSVC 2005 fails to compile files which include both <intrin.h> and <windows.h> due to a bug in their header files.  Since SSE.h included <intrin.h>, this meant that we couldn't include SSE.h anywhere that included <windows.h>.

This patch moves the cpuid functions out of SSE.h and into SSE.cpp.  This way, we don't need to include <intrin.h> in SSE.h.
Attachment #498142 - Flags: review?(dbaron)
Comment on attachment 498142 [details] [diff] [review]
Part 2 v1: Move cpuid check out of SSE.h

r=dbaron
Attachment #498142 - Flags: review?(dbaron) → review+
Comment on attachment 464231 [details] [diff] [review]
Patch v1

Requesting approval; this blocks blockers.
Attachment #464231 - Flags: approval2.0?
Attachment #498142 - Flags: approval2.0?
Attachment #464231 - Flags: approval2.0? → approval2.0+
Attachment #498142 - Flags: approval2.0? → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/803b3b17821f
http://hg.mozilla.org/mozilla-central/rev/a4b026cae738

\o/
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.