Last Comment Bug 513422 - add common code for decisions about compiling and using SSE* code
: add common code for decisions about compiling and using SSE* code
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- normal with 3 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 585711
Blocks: 585738 585708
  Show dependency treegraph
 
Reported: 2009-08-28 22:14 PDT by David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
Modified: 2010-08-09 14:50 PDT (History)
25 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch for review (15.34 KB, patch)
2009-09-01 09:24 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details | Diff | Review
patch for review (15.50 KB, patch)
2009-09-01 09:31 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
vladimir: review-
Details | Diff | Review
Example workaround for gcc bug (504 bytes, text/plain)
2009-09-01 20:03 PDT, Kyle Huey [:khuey] (khuey@mozilla.com)
no flags Details
My equivalent to dbaron's patch (the third in the set of three) (1.87 KB, text/plain)
2009-09-02 16:24 PDT, Kyle Huey [:khuey] (khuey@mozilla.com)
no flags Details
patch (21.82 KB, patch)
2009-11-13 20:35 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details | Diff | Review
patch (27.70 KB, patch)
2009-11-27 10:15 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
vladimir: review+
Details | Diff | Review
use SSE.h in the code that led me to start this (2.47 KB, patch)
2009-12-11 11:47 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
vladimir: review+
Details | Diff | Review
un-breaks the build on openSUSE 11.0 by avoiding cpuid.h (2.31 KB, patch)
2010-03-04 03:40 PST, Julian Seward [:jseward]
no flags Details | Diff | Review

Description David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-08-28 22:14:58 PDT
I started to look into making the optimization on bug 506430 work on Linux, and quickly came to the conclusion that we ought to both consolidate and improve our code for:
 * whether to compile things like SSE2 code
 * whether to run such code (based on CPUID checks, etc.)

This will allow us to:
 * add such optimizations more easily in the future
 * get them running across platforms more quickly
 * respect the -march gcc option that says to assume that the processor has certain capabilities
 * be able to use emmintrin.h, etc., on Linux, even when compiling without an -march option that says those instructions can be assumed to be available

I have some work in progress on this in http://hg.mozilla.org/users/dbaron_mozilla.com/patches/file/tip/sse-tests but it's still a rough sketch, and I haven't started testing it yet, and have a bunch more changes I intend to make (such as copying the cpuid code from gfx/qcms/transform.c and then using it).
Comment 1 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-08-30 10:55:18 PDT
(In reply to comment #0)
>  * be able to use emmintrin.h, etc., on Linux, even when compiling without an
> -march option that says those instructions can be assumed to be available

This seems like it actually won't be possible.  It's possible to work around the fact that emmintrin.h, etc., have a #error if __SSE2__ etc. aren't defined; however, after I do that I end up with errors like:

/tools/gcc-4.1.1/lib/gcc/i686-pc-linux-gnu/4.1.2/include/mmintrin.h: In function ‘void _mm_empty()’:
/tools/gcc-4.1.1/lib/gcc/i686-pc-linux-gnu/4.1.2/include/mmintrin.h:48: error: ‘__builtin_ia32_emms’ was not declared in this scope

and the others in this log:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1251576661.1251577221.11207.gz
Comment 2 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-08-31 14:31:02 PDT
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
(see also http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007btarget_007d-function-attribute-2355 for more details) seems like it ought to help with the problem in comment 1 in gcc 4.4 and newer.  However, I ran into two gcc bugs that prevent this from working:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39787
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41201

So I've given up on the idea of CPUID-based stuff on gcc-based platforms for now.

For Mac this isn't a big deal since the Intel Mac's default -march includes support for MMX, SSE, and SSE2 (since I guess that's the minimum Intel CPU that they've shipped Macs with).

Hopefully those gcc bugs will get fixed not too far in the future and we'll be able to give this another shot.  (The other alternative is to hand-write the assembly rather than using the intrinsics, but that seems quite painful, and it looks like things are pretty close to being able to avoid that.)
Comment 3 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-08-31 14:51:46 PDT
(In reply to comment #2)
> However, I ran into two gcc bugs that prevent this from
> working:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39787
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41201

The second is relatively easy to work around, though, if the first happens to get fixed first.
Comment 4 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-01 09:23:30 PDT
(In reply to comment #2)
> For Mac this isn't a big deal since the Intel Mac's default -march includes
> support for MMX, SSE, and SSE2 (since I guess that's the minimum Intel CPU that
> they've shipped Macs with).

Yes, no Mac ever shipped with an Intel processor that didn't support at least SSE2 IIRC.  I think the first Intel Macs shipped with the Core microarchitecture, but its possible that some end model Pentiums slipped in there too.  SSE2 was first introduced in the Williamette P4s in 2001 IIRC, so any Intel Mac should be good.

Obviously we can't use SSE2 on PPC, I don't know anything about the PPC architecture but it may not be worth writing separate optimizations for an architecture which has basically been end-of-lifed by Apple.

With respect to GCC, there are a lot of long-standing problems using sse.  It looks like the first gcc bug you mentioned (39787) isn't getting too much traction so the assembly workaround may be what we need to do.  Writing an ASM function to determine support for MMX, SSE, SSE2 shouldn't be too difficult.
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-01 09:24:47 PDT
Created attachment 397885 [details] [diff] [review]
patch for review

Here's a patch for review with the pieces that are working for me now.

Relative to what the patch in bug 506430 does (once that code is changed to use this code, which I have in a separate patch), all this changes is that:

  1. we use a function using the cpuid intrinsic instead of __sse2_available on Windows

  2. we honor the -march or -msse* options on gcc-based platforms.  On Intel Mac, -march defaults to enabling MMX, SSE, SSE2, and SSE3, which means we're not changing anything from that patch on the default configuration on Mac.  It does, however, change what that patch does for x86_64 Linux, since the base x86_64 chips have MMX, SSE, and SSE2 (at least according to gcc's defaults for the architecture).

This also adds mechanisms for detection of a whole bunch of features other than SSE2.


I don't yet have a solution that allows CPUID-based use of *intrin.h on gcc; however, gcc 4.4 is relatively close to such a solution; see comment 2 for the bugs known to block this working (one of the two is easy to work around if it comes to that; see comment 3)

We could add additional macros to this header for whether it makes sense to use inline assembly-based solutions for gcc, but I haven't done that (yet, anyway).  It would be practical to do inline assembly with CPUID-based detection for gcc today, but the only platform it would help (given that what we're using is SSE2) is 32-bit Linux.


I'm requesting review from Benjamin, but I'd welcome comments from the rest of the cc: list as well.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-01 09:31:56 PDT
Created attachment 397887 [details] [diff] [review]
patch for review

Also, the previous patch, plus the patch to use this code in nsUTF8ToUnicode, passed the try server (modulo usual random orange).

However, it also probably triggers warnings for unknown pragmas on older gccs, which this one should fix; it also improves a comment that I noticed I hadn't updated.
Comment 7 Vladimir Vukicevic [:vlad] [:vladv] 2009-09-01 15:53:58 PDT
This looks wrong -- it calls has_cpuid_bit for every has_* function.  It should really be initialized once in global xpcom init, and these should just be straight static variable returns.
Comment 8 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-01 18:32:06 PDT
(In reply to comment #7)
> This looks wrong -- it calls has_cpuid_bit for every has_* function.  It should
> really be initialized once in global xpcom init, and these should just be
> straight static variable returns.

I did consider that, but the work needed to load a global variable (dealing with PIC, etc.) didn't seem obviously less than the work needed to execute the CPUID instructions twice and check its results.  And doing the global variable thing is a good bit more work dealing with XPCOM glue, etc., so I did what was easier.
Comment 9 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-01 19:13:44 PDT
(In reply to comment #4)
> With respect to GCC, there are a lot of long-standing problems using sse.  It
> looks like the first gcc bug you mentioned (39787) isn't getting too much
> traction so the assembly workaround may be what we need to do.  Writing an ASM
> function to determine support for MMX, SSE, SSE2 shouldn't be too difficult.

So it's pretty obvious that I didn't read the upstream bug carefully enough before writing this.  Using inline assembly instead of the intrinsics for the actual algorithms probably isn't a good idea.

In your testcase in the GCC bugzilla what seems to blow it up is the push/pop_options pragmas.  Specifically, if you move pop_options to after the intrinsic call it compiles fine.  

Obviously GCC shouldn't ICE, but on the other hand it's not entirely clear to me why your testcase *should* work.  If you've moved back out of targeting sse2 architectures by pop_optioning why shouldn't SSE2 intrinsics fail to compile?

We would have to do something like 
#include "sse.h"

code code code
#ifdef MOZ_COMPILE_WITH_ABC
MOZ_ENTER_ABC_MODE();
if (mozilla::use_abc())
{
do_abc();
} else {
MOZ_EXIT_ABC_MODE();
#endif
do_fallback();
#ifdef MOZ_COMPILE_WITH_ABC
}
#endif
code code code


Where 
#ifdef GCC_ON_LINUX
#define MOZ_ENTER_ABC_MODE _Pragma ("GCC push_options") \
                           _Pragma ("GCC target(\"ABC\") \
#define MOZ_EXIT_ABC_MODE  _Pramga ("GCC pop_options)
#else // GCC_ON_MAC_THROUGH_AT_LEAST_SSE2, MSVC.
#define MOZ_ENTER_ABC_MODE      ; // Making it clear this is a NOOP
#define MOZ_EXIT_ABC_MODE       ; // Making it clear this is a NOOP
#endif

(you have to use _Pragma because macros can't expand to preprocessor directives)

I'm sure I'm missing something, but it appears to me that something along these lines would work.  This has some other less than desirable side effects, for instance GCC target pragmas work at a function level so the do_abc / do_fallback stuff needs to be wrapped in an inline function, but it will compile and generate code at least ;-)
Comment 10 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-01 19:16:03 PDT
Actually you only wrap do_abc in an inline function.  Doing something like this to your testcase makes it compile fine with the correctly emitted instructions.
Comment 11 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-01 20:03:44 PDT
Created attachment 398071 [details]
Example workaround for gcc bug

For clarity.
Comment 12 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-02 11:12:56 PDT
I tried that, and still got an ICE on the real thing.  The patches I used for that attempt are:

http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/optimize-utf8
http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/sse-tests
http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/utf8-use-sse-h
Comment 13 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-02 14:33:55 PDT
(In reply to comment #12)
> I tried that, and still got an ICE on the real thing.  The patches I used for
> that attempt are:
> 
> http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/optimize-utf8
> http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/sse-tests
> http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/ab43016742d5/utf8-use-sse-h

Compiler log?  Your macros are too complicated for me to reason through them :-D
Comment 14 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-02 15:54:14 PDT
What sort of log?  I just got an internal compiler error again.
Comment 15 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-02 16:16:57 PDT
Hmm it compiles correctly for me.  Which function does it ICE in?  Convert_ascii_run_sse2_part?
Comment 16 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-09-02 16:24:49 PDT
Created attachment 398254 [details]
My equivalent to dbaron's patch (the third in the set of three)
Comment 17 Vladimir Vukicevic [:vlad] [:vladv] 2009-09-02 17:17:02 PDT
(In reply to comment #8)
> I did consider that, but the work needed to load a global variable (dealing
> with PIC, etc.) didn't seem obviously less than the work needed to execute the
> CPUID instructions twice and check its results.  And doing the global variable
> thing is a good bit more work dealing with XPCOM glue, etc., so I did what was
> easier.

Sure, but this means executing a cpuid at the start of pretty much every hot loop where we want maximum performance... it seems worth it to avoid that, especially since I very much doubt the cpuid instruction is optimized (calling it involves spilling some registers at the very least, might also include stalls/flushes/whatever).
Comment 18 Vladimir Vukicevic [:vlad] [:vladv] 2009-09-02 17:28:50 PDT
(In reply to comment #0)
>  * be able to use emmintrin.h, etc., on Linux, even when compiling without an
> -march option that says those instructions can be assumed to be available

Is this really a requirement?  I thought that you had to use -march etc. to have the compiler generate those instructions, but that you could always include emmintrin.h and use the intrinsics.
Comment 19 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-02 17:43:04 PDT
(In reply to comment #18)
> (In reply to comment #0)
> >  * be able to use emmintrin.h, etc., on Linux, even when compiling without an
> > -march option that says those instructions can be assumed to be available
> 
> Is this really a requirement?

The alternative would be inline assembly, which gets painful when done for 32-bit and 64-bit in addition to the intrinsics (or another two pairs) for Windows.

> I thought that you had to use -march etc. to
> have the compiler generate those instructions, but that you could always
> include emmintrin.h and use the intrinsics.

You can't use emmintrin.h and the intrinsics without -march or the equivalent in pragmas.  I tried.
Comment 20 Benjamin Smedberg [:bsmedberg] 2009-09-04 11:46:23 PDT
Comment on attachment 397887 [details] [diff] [review]
patch for review

I'm going to redirect this review because I've only barely understood what it's used for.

Does this typically mean that we'll be compiling two or more versions of various functions using makefiles, e.g.

nsStringOptimizationsSSE2.cpp

or something similar, and this support code lets us choose those versions dynamically?
Comment 21 Vladimir Vukicevic [:vlad] [:vladv] 2009-09-04 12:49:41 PDT
Comment on attachment 397887 [details] [diff] [review]
patch for review

We really, really don't want to call cpuid here.  This is already in xpcom glue -- why can't we just place the static symbols inside xpcom?  The getters would just return the static values, and we can call some init function either via a static constructor or as part of xpcom init.

JS may want to share this code, in which case it can pull in its own copy; other than that, everything else should already be touching xpcom at some level (or be entirely independent, like cairo).
Comment 22 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-09-04 17:12:06 PDT
(In reply to comment #20)
> Does this typically mean that we'll be compiling two or more versions of
> various functions using makefiles, e.g.
> 
> nsStringOptimizationsSSE2.cpp
> 
> or something similar, and this support code lets us choose those versions
> dynamically?

No, it's just about using ifdefs within a single file for our own hand-written SSE2 code.  Example use is demonstrated in http://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/49b0628bde4f/utf8-use-sse-h

That said, we probably also could get benefits from cpuid-detected compiler-optimized code, but that's not what I was working on.

(In reply to comment #21)
> JS may want to share this code, in which case it can pull in its own copy;
> other than that, everything else should already be touching xpcom at some level
> (or be entirely independent, like cairo).

I think if we end up making it require symbols inside XPCOM, other callers like JS are far less likely to share the code.

I'm also not convinced it would be faster, but I'll hopefully have time to experiment with it in a few weeks.
Comment 23 Vladimir Vukicevic [:vlad] [:vladv] 2009-09-04 17:25:00 PDT
I'm not too worried about JS sharing the code tbh; if it wants the same functionality, they can pull in a local copy.  There are very few places that don't touch xpcom, and they're generally "special" in one way or another.  We could put this in the new-nspr-like-thing, too.
Comment 24 Jeff Walden [:Waldo] (remove +bmo to email) 2009-09-04 18:06:50 PDT
Given that these decisions would be made within nanojit, which I believe is *extremely* unlikely to ever depend on anything in Mozilla's codebase (CC addends please chime in if you think otherwise), I wouldn't give much, if any, thought to what JS might want.
Comment 25 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-09-05 01:23:42 PDT
Indeed, don't worry about JS here.  We already have a solution to this problem, and our use of it is different anyway (check once and then generate appropriate code that's run many times, versus checking per hot-path run).  pixman likely also has its own mechanism, as will our image decoders, etc.  Make it easy for us to add new ones, but don't get hung up on JS sharing this stuff at the expense of making it cheap.

If/when we get some NSPR evolution, we could put the code there, since JS already depends on NSPR in some configuration.  For now, I recommend getting numbers on how much it costs to CPUID vs loading a symbol from the static xpcom glue -- in libxul builds I would hope that we could avoid inter-DSO operations here, really.
Comment 26 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-13 20:35:04 PST
Created attachment 412370 [details] [diff] [review]
patch

Revised to initialize *_available booleans inside the xpcom glue library as |extern| (following the example of gTLSIsMainThread).
Comment 27 Vladimir Vukicevic [:vlad] [:vladv] 2009-11-14 00:07:30 PST
This looks quite good, but one question... the suggested use:

+ *   #ifdef MOZILLA_COMPILE_WITH_ABC
+ *     if (mozilla::use_abc()) {
+ *       // abc-specific code here
+ *     } else {
+ *   #endif
+ *      // generic code here
+ *   #ifdef MOZILLA_COMPILE_WITH_ABC
+ *     }
+ *   #endif

feels pretty clunky.  What do you think about having use_abc() be guaranteed to exist, and just statically evaluate to false if not available (like your code does), and then have consumers do:

if (mozilla::use_abc()) {
#ifdef MOZILLA_COMPILE_WITH_ABC
  // abc
#endif
} else {
  // generic
}

That makes it much easier to do

if (mozilla::use_abc()) {
#ifdef ABC
  // abc
#endif
} else if (mozilla::use_def()) {
#ifdef DEF
  // def
#endif
} else {
  // generic
}

for when we have SSE2/MMX/generic code paths.  (It's really annoying that we need the ifdefs in there at all, but I don't know how to avoid them...)  If that sounds good to you, I think the only change needed is in the comment, as all the functions are already do the right thing.
Comment 28 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-14 01:22:21 PST
It actually requires more than fixing the comment; it requires making this header define functions for the non-x86 non-x86_64 case.  That said, that's not hard to do, and probably worth doing.
Comment 29 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-14 09:01:15 PST
And I'd need to make this:
+  // Note that this means there are cases where mozilla::use_abc() will
+  // return true even though we can't define MOZILLA_COMPILE_WITH_ABC.
not true anymore.
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-21 21:15:00 PST
I think a good way of doing both would be to actually define two sets of functions:

  mozilla::supports_abc would be what is currently mozilla::use_abc

and then I'd define mozilla::use_abc as:

#ifdef MOZILLA_COMPILE_WITH_ABC
  bool use_abc { return supports_abc(); }
#else
  bool use_abc { return false; }
#endif

This would allow the supports_abc() functions to be used by people who are hand-coding assembly, and leave the use_abc() functions for people who are using the builtins and MOZILLA_COMPILE_WITH_ABC.

That said, I'm not sure I like the names...
Comment 31 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-22 13:22:43 PST
I have a modified version of the patch, but I've modified enough that it needs a bit more testing again...
Comment 32 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-11-27 10:15:23 PST
Created attachment 414859 [details] [diff] [review]
patch
Comment 33 Vladimir Vukicevic [:vlad] [:vladv] 2009-11-30 10:59:43 PST
Comment on attachment 414859 [details] [diff] [review]
patch

Looks good; I like this approach.  Though, why do you need the static false/true variants in the .cpp file if the .h file is going to hardcode the values? (Which I agree that it should.)


>+#if defined(MOZILLA_PRESUME_MMX)
>+  bool mmx_enabled = true;
>+#elif defined(MOZILLA_SSE_HAVE_CPUID_DETECTION)
>+  bool mmx_enabled
>+    = sse_private::has_cpuid_bit(1u, sse_private::edx, (1u<<23));
>+#else
>+  bool mmx_enabled = false;
>+#endif


That is, I think the only #if block you need here is the HAVE_CPUID_DETECTION one (and you could probably combine them all into one chunk for x86 -- if we can't build with, say, SSE2 even if the CPU supports it, the inline functions will return false from the .h file).  For other cases having the extern declarations without the actual variable is fine I think, because nothing should be referencing them.  But the way you have it is also fine, if you'd rather keep it like that.
Comment 34 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-12-11 11:47:22 PST
Created attachment 417143 [details] [diff] [review]
use SSE.h in the code that led me to start this
Comment 35 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-12-11 11:47:47 PST
Main patch landed:
http://hg.mozilla.org/mozilla-central/rev/7505facff438
Comment 36 Giacomo Magnini 2009-12-12 05:04:32 PST
After the checkin, I'm not able to build again SeaMonkey (x64), not even after clobbering, on OpenSuSE 11.1 with "gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]".
c++ -o SSE.o -c -I../../dist/system_wrappers -include /home/prometeo/src/mozilla/config/gcc_hidden.h -DMOZ_SUITE=1 -DOSTYPE=\"Linux2.6.27.39-0\" -DOSARCH=Linux -DTARGET_XPCOM_ABI=\"x86_64-gcc3\" -I/home/prometeo/src/mozilla/xpcom/glue/../build  -I/home/prometeo/src/mozilla/xpcom/glue -I. -I../../dist/include -I../../dist/include/nsprpub  -I/home/prometeo/suite-obj/mozilla/dist/include/nspr -I/home/prometeo/suite-obj/mozilla/dist/include/nss       -fPIC  -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Wno-long-long -pedantic -fno-strict-aliasing -fshort-wchar -pthread -pipe  -DNDEBUG -DTRIMMED -Os -freorder-blocks -fno-reorder-functions    -DMOZILLA_CLIENT -include ../../mozilla-config.h -Wp,-MD,.deps/SSE.pp /home/prometeo/src/mozilla/xpcom/glue/SSE.cpp
In file included from /home/prometeo/src/mozilla/xpcom/glue/SSE.cpp:40:
/home/prometeo/src/mozilla/xpcom/glue/SSE.h:243:19: error: cpuid.h: No such file or directory
In file included from /home/prometeo/src/mozilla/xpcom/glue/SSE.cpp:40:
/home/prometeo/src/mozilla/xpcom/glue/SSE.h: In function ‘bool mozilla::sse_private::has_cpuid_bit(unsigned int, mozilla::sse_private::CPUIDRegister, unsigned int)’:
/home/prometeo/src/mozilla/xpcom/glue/SSE.h:256: error: ‘__get_cpuid’ was not declared in this scope
gmake[6]: *** [SSE.o] Error 1
gmake[6]: Leaving directory `/home/prometeo/suite-obj/mozilla/xpcom/glue'
gmake[5]: *** [libs] Error 2
gmake[5]: Leaving directory `/home/prometeo/suite-obj/mozilla/xpcom'
gmake[4]: *** [libs_tier_xpcom] Error 2
gmake[4]: Leaving directory `/home/prometeo/suite-obj/mozilla'
gmake[3]: *** [tier_xpcom] Error 2
gmake[3]: Leaving directory `/home/prometeo/suite-obj/mozilla'
gmake[2]: *** [default] Error 2
gmake[2]: Leaving directory `/home/prometeo/suite-obj/mozilla'
make[1]: *** [default] Error 2
make[1]: Leaving directory `/home/prometeo/suite-obj'
make: *** [build] Error 2
Comment 37 Robert Kaiser (not working on stability any more) 2009-12-12 06:03:09 PST
The problem Giacomo is having is because openSUSE guys forgot to package cpuid.h in their ggc 4.3 packages they shipped on openSUSE 11.1, and they only fixed it for the next version of their distro (11.2, which shipped only 3-4 weeks ago), see https://bugzilla.novell.com/show_bug.cgi?id=527433

That means everyone on openSUSE 11.1 breaks at this point :(
Comment 38 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-12-12 18:26:44 PST
I've been meaning to make that code use inline assembly instead; I'm currently testing a patch, written with the help of http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html .  I tested that gcc does use the input/output register constraints to know that ebx (PIC register) is modified by the assembly, and therefore gcc does generate code to save/restore ebx around the assembly block, as it should.  So my little chunk of inline assembly should work for both i386 and x86_64 without any ifdefs between them.
Comment 39 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-12-12 18:56:28 PST
(In reply to comment #38)
> that gcc does use the input/output register constraints to know that ebx (PIC
> register) is modified by the assembly, and therefore gcc does generate code to
> save/restore ebx around the assembly block, as it should.  So my little chunk
> of inline assembly should work for both i386 and x86_64 without any ifdefs
> between them.

Hmm.  Never mind that; it's fine on x86_64, but on i386 with -fPIC, gcc won't allow register constraints for ebx.
Comment 40 Andreas Gal :gal 2009-12-12 19:08:58 PST
The jit was some code to work around the ebx issue.
Comment 41 Julian Seward [:jseward] 2010-03-04 03:40:34 PST
Created attachment 430276 [details] [diff] [review]
un-breaks the build on openSUSE 11.0 by avoiding cpuid.h

This does cpuid "by hand".  WARNING: untested.  It un-breaks
the build on openSUSE 11.0 and the resulting Fx will at least
start.  It is slightly conservative in that it passes the %eax
argument to cpuid through memory, which is an unnecessary load/store,
but it means we don't have to get tied up writing and trying to
assess the correctness of complicated gcc register constraints to
do with earlyclobber and wotnot.  It also preserves ebx so as to
keep macos happy (again, untested).
Comment 42 Vladimir Vukicevic [:vlad] [:vladv] 2010-05-18 15:57:47 PDT
Comment on attachment 417143 [details] [diff] [review]
use SSE.h in the code that led me to start this

(sorry for long review time!)
Comment 43 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-05-21 16:53:59 PDT
landed attachment 417143 [details] [diff] [review]: http://hg.mozilla.org/mozilla-central/rev/3af8b1f39ee8
Comment 44 Justin Lebar (not reading bugmail) 2010-07-13 12:02:26 PDT
Is there a reason that this isn't marked RESOLVED FIXED?  Are we holding out for a solution to these issues with GCC and the intrinsics?
Comment 45 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-08-09 14:41:33 PDT
I filed bug 585738 as a followup to fix the gcc issues.

Note You need to log in before you can comment on or make changes to this bug.