331043 - Improve hash performance using _rotr intrinsic

Assignee

Description

•

20 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060318 Firefox/1.5.0.1 (mmoy CE 1.5.0.1 K8B-X69) Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060318 Firefox/1.5.0.1 (mmoy CE 1.5.0.1 K8B-X69) There are about a dozen modules in the Mozilla code base that use the xor of a shift right and shift left to simulate a rotate as there is no rotate function in the C language. This frequently looks like: h = (h >> (JS_HASH_BITS - 4)) ^ (h << 4) ^ *s Microsoft provides the intrinsic _rotr(tartet, number_of_bits) which results in generated code using the ror (rotate right) instruction. In VS2002 and VS2003, two shifts, a mov and an xor are generated instead. In VS2005, the rotate instruction is generated so Microsoft fixed the bug. But I think that the 1.5 line is still using VS2002 so that line would benefit from using the intrinsic. On VS2005, using the intrinsic won't hurt. I've been using patches to use _rotr since last summer with no problems. I also use inline assembler for some routines. Here's the assembler output from VS2003 compiled with -O2 and you can see the difference between the intrinsic and the current Mozilla code. Hashing is used a lot in Mozilla and this improvement should make for at least a small improvement in overall performance. I've been using hash performance patches since the Summer of 2004 though the older ones were inline assembler. There is some additional performance gains possible using dword xors instead of byte xors, especially when the length of the object to be hashed is known in advance but perhaps that's something for another bug. ; 6 : main() { push ecx ; 7 : unsigned int a, b, c, d; ; 8 : ; 9 : scanf("%d", &a); lea eax, DWORD PTR _a$[esp+4] push eax push OFFSET FLAT:??_C@_02DPKJAMEF@?$CFd?$AA@ call _scanf ; 10 : b = a >> 28 | a << 4; ; 11 : c = _rotr(a, 28); mov eax, DWORD PTR _a$[esp+12] mov ecx, eax mov edx, eax ror ecx, 28 ; 0000001cH shr edx, 28 ; 0000001cH shl eax, 4 ; 12 : printf("%d %d\n", b, c); push ecx or edx, eax push edx push OFFSET FLAT:??_C@_06IPGPIAII@?$CFd?5?$CFd?6?$AA@ call _printf Reproducible: Always

Michael Moy

Assignee

Comment 1

•

20 years ago

Attached patch Sample hash patch using intrinsics (obsolete) — Details — Splinter Review

Michael Moy

Assignee

Comment 2

•

20 years ago

If you're interested in this change, I can code up the dozen or so patches. I'd just have to add the conditionals for Windows.

Phil Ringnalda (:philor)

Updated

•

20 years ago

Assignee: nobody → general

Component: General → JavaScript Engine

Product: Firefox → Core

QA Contact: general → general

Version: unspecified → Trunk

Christian :Biesinger (don't email me, ping me on IRC)

Comment 3

•

20 years ago

1.5 uses vc6 fwiw

Bob Clary [:bc] (inactive)

Updated

•

20 years ago

Flags: in-testsuite-

Michael Moy

Assignee

Comment 4

•

20 years ago

List of files that simulate rotate with left and right shifts related to hash: * nsprpub\lib\ds\plhash.c * js\src\jshash.c * js\src\jsdhash.c * xpcom\ds\nscrt.cpp * xpcom\glue\pldhash.c * js\src\liveconnect\jsj_hash.c * content\xul\document\src\nsElementMap.cpp * xpcom\ds\nsStaticNameTable.cpp * js\src\xpconnect\src\xpcmaps.cpp * netwerk\protocol\http\src\nsHttp.cpp * netwerk\cache\src\nsDiskCacheDevice.cpp * xpcom\glue\nsTHashtable.cpp * security\nss\lib\pki\pkistore.c * security\nss\lib\base\hash.c * js\src\jsscope.c * js\src\jsstr.c Can someone assign this to me?

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

20 years ago

Assignee: general → mmoy

Brendan Eich [:brendan]

Comment 5

•

20 years ago

Note that xpcom/ds/pldhash.[ch] are sed-generated from js/src/jsdhash.[ch], so you should modify only the latter, then generate the former using sed -f plify_jsdhash.sed jsdhash.c > ../../xpcom/ds/pldhash.c sed -f plify_jsdhash.sed jsdhash.h > ../../xpcom/ds/pldhash.h in js/src. Also the only place that contains #ifdefs should be two header files, one in NSPR and the other in SpiderMonkey (js/src). These two should mirror each other and define a macro, PR_ROTATE_RIGHT for NSPR and JS_ROTATE_RIGHT for SpiderMonkey. The best header file for SpiderMonkey is arguably jsbit.h. There may be an NSPR file with corresponding purpose, if not name. /be

Status: UNCONFIRMED → NEW

Ever confirmed: true

Michael Moy

Assignee

Comment 6

•

20 years ago

VS2003 requires Oi optimization or O2 optimization to inline _rotr. Without Oi in VS2003, there's a callout to the __rotr routine and this would make this not worth doing. I assume that VC6 has the same optimization switch requirement. The last I checked, Mozilla uses -O1 -G6 as optimization switches. If this is incorrect, please let me know. I will check to see if there are #pragmas available to turn on inline optimization. And I will need to do some testing with VC6.

Brendan Eich [:brendan]

Comment 7

•

20 years ago

Good to know what VS7.1/8 do, as well. I wonder why we aren't using O2 on Windows -- cc'ing bsmedberg, bryner, and vlad in case they know. /be

Brian Ryner (not reading)

Comment 8

•

20 years ago

O2 on windows flips the space/speed tradeoff towards "speed". I believe the last time we evaluated this, we determined that the rather small speedup (3%?) wasn't worth the code bloat.

Brian Ryner (not reading)

Comment 9

•

20 years ago

That was a vc6 test though, it would be useful to retest on our newer Windows compilers.

Michael Moy

Assignee

Comment 10

•

20 years ago

VS2005 requires -O2 or -Oi to get the inlined rotate command as well.

Michael Moy

Assignee

Comment 11

•

20 years ago

On VS2003 and VS2005, the ror inlines with: #include <stdlib.h> #pragma intrinsic(_rotr) This approach doesn't have any requirements on the command line optimization flags.

Michael Moy

Assignee

Comment 12

•

20 years ago

(In reply to comment #8) > O2 on windows flips the space/speed tradeoff towards "speed". I believe the > last time we evaluated this, we determined that the rather small speedup (3%?) > wasn't worth the code bloat. The official build image takes 6,999 KB (not sure whether that's 1.5 or 1.5.0.1 but it's probably 1.5.0.1 on my machine). Unofficial builders that use -O2 typically have image sizes in the 8,000 to 8,700 KB range with some of that due to -O2 and the rest due to private code or additional functionality or code. Most of the unofficial builders also use -GL (Whole Program Optmization) which adds to the image size by inlining across modules and routines. I also use Profile Guided Optimization which improves performance by about 7%. I haven't noticed larger code sizes with PGO but my comparisons are with GL and O2 assumed.

Nelson Bolyard (seldom reads bugmail)

Comment 13

•

20 years ago

(In reply to comment #11) > On VS2003 and VS2005, the ror inlines with: > > #include <stdlib.h> > #pragma intrinsic(_rotr) NSS has been using _lrotr and _lrotl in rotate macros for years, FWIW. http://lxr.mozilla.org/security/source/security/nss/lib/freebl/sha512.c#142

URL: http://lxr.mozilla.org/security/sourc...

Michael Moy

Assignee

Comment 14

•

20 years ago

(In reply to comment #5) > Also the only place that contains #ifdefs should be two header files, one in > NSPR and the other in SpiderMonkey (js/src). These two should mirror each > other and define a macro, PR_ROTATE_RIGHT for NSPR and JS_ROTATE_RIGHT for > SpiderMonkey. The best header file for SpiderMonkey is arguably jsbit.h. > There may be an NSPR file with corresponding purpose, if not name. how about prbit.h?

Michael Moy

Assignee

Comment 15

•

20 years ago

Attached patch Hash patch to use _rotr intrinsic in MSVC (obsolete) — Details — Splinter Review

This applies against MOZILLA_1_8_BRANCH

Attachment #215598 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 16

•

20 years ago

I'm going to try the patch on a trunk build next.

Michael Moy

Assignee

Comment 17

•

20 years ago

BTW, I verified the assembler code on all but two or three of the modules. The -FAs switch didn't pass on to some of the compile commands.

Michael Moy

Assignee

Comment 18

•

20 years ago

Attached patch Trunk version of _rotr hash patch (obsolete) — Details — Splinter Review

This is the trunk version of the patch. The previous version is for the MOZILLA_1_8_BRANCH.

Michael Moy

Assignee

Comment 19

•

20 years ago

I'm building the trunk right now. I'm hoping that _X86_ covers Win64 as well. I'll try a Win64 build tonight if I have time.

Michael Moy

Assignee

Comment 20

•

20 years ago

Attached patch Trunk _rotr patch (obsolete) — Details — Splinter Review

Trunkn patch modified to support Win64

Michael Moy

Assignee

Comment 21

•

20 years ago

Attached patch Trunk _rotr patch (obsolete) — Details — Splinter Review

Trunkn patch modified to support Win64

Attachment #215741 - Attachment is obsolete: true

Michael Moy

Assignee

Updated

•

20 years ago

Attachment #215763 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 22

•

20 years ago

Attached patch Branch _rotr patch — Details — Splinter Review

Modified to support Win64

Michael Moy

Assignee

Comment 23

•

20 years ago

The second set of patches has been tested on Win64 (Release 1.5.0.1 - Branch Patch), Win32 Branch (MOZILLA_1_8_BRANCH) and Win32 Trunk in that I checked one or two of the output assembler code and the ror (Rotate Right) instructions was there. How do I go about requesting reviews for the two patches?

Igor Bukanov

Comment 24

•

20 years ago

What is the definition of _rotr on Win64? Does it assumes 64 bit argument or 32 bit one?

Michael Moy

Assignee

Comment 25

•

20 years ago

(In reply to comment #24) > What is the definition of _rotr on Win64? Does it assumes 64 bit argument or > 32 bit one? That's a very good question. http://msdn2.microsoft.com/en-us/library/5cc576c4(VS.80).aspx indicates that _rotl and _rotr are 32-bit instructions while _rotl64 and _rotr64 are the 64-bit versions. There are also 8-bit and 16-bit versions of the intrinsic. I had a look at the generated code and the register for the 64-bit version was r8 while the register for the 32-bit version was r8d. I need to brush up on AMD 64 assembler notation. I wrote a small program that verified that the instruction produces results for a 32-bit rotate. And I've been using this intrinsic in my Community Edition Win64 1.5 and 1.5.0.1 Release builds since Summer 2005.

Igor Bukanov

Comment 26

•

20 years ago

Comment on attachment 215764 [details] [diff] [review] Trunk _rotr patch >+#if defined(_MSC_VER) && (defined(_X86_) || defined(_AMD64) || defined(_M_AMD64)) >+#include <stdlib.h> >+#pragma intrinsic(_rotr) >+#define JS_ROTATE_RIGHT(a, bits, length) _rotr(a, bits) >+#else >+#define JS_ROTATE_RIGHT(a, bits, length) (((a) >> (bits)) ^ ((a) << ((length) - (bits)))) >+#endif Given that _rotr always assumes 32-bits argument and result, IMO the macros should be called JS_ROTATE_RIGHT32 etc. and do not take the length argument at all.

Michael Moy

Assignee

Comment 27

•

20 years ago

(In reply to comment #26) > Given that _rotr always assumes 32-bits argument and result, IMO the macros > should be called JS_ROTATE_RIGHT32 etc. and do not take the length argument at > all. The other half of the macro deals with non-Windows x86 platforms and the assumption on those other platforms is that hashes are 32-bits as well (I think). The number of objects would have to be monsterous to suggest using 64-bits. The use of the length was to implicitly support some of the tags that indicate integer length but changing the value to something other than 32 would work in the Macro as a different intrinsic would be required. I could change it to ROTATE_RIGHT32 or take out the length with the assumption of 32 bits. I think that I'll leave this for discussion for a little while as it it takes about 90 minutes to do the testing on the Branch, Trunk and Win64 and I'd rather keep to a minimum the number of times that I have to retest everything.

Michael Moy

Assignee

Comment 28

•

20 years ago

One more comment is that the other side of the macro does not assume 32-bits in the macro. I'd guess that the number of bits depends on the datatype that's rotated.

Igor Bukanov

Comment 29

•

20 years ago

(In reply to comment #28) > One more comment is that the other side of the macro does not assume 32-bits in > the macro. And this is bad: just imagine that new code would use JS_ROTATE_RIGHT(a, 4, sizeof(word-sized-variable)*8). It would work everywhere except Windows 64. Thus the reason for JS_ROTATE_RIGHT32 since it tells very explicitly what kind of rotation is going on. If some places would require a rotation over a 64 bit value, then JS_ROTATE_RIGHT64 can be added.

Michael Moy

Assignee

Comment 30

•

20 years ago

(In reply to comment #29) > And this is bad: just imagine that new code would use JS_ROTATE_RIGHT(a, 4, > sizeof(word-sized-variable)*8). It would work everywhere except Windows 64. > Thus the reason for JS_ROTATE_RIGHT32 since it tells very explicitly what kind > of rotation is going on. If some places would require a rotation over a 64 bit > value, then JS_ROTATE_RIGHT64 can be added. The development environment on Windows64 is the same as it is on Windows32 as far as datatypes go. So ints are 32 bits and everything else works as it would in a 32-bit environment. If you want to use 64-bit datatypes, then you'd use long long or __int64. Which is what you'd probably do if you wanted to do 64-bit operations in the 32-bit environment. But the compiler would simulate the 64-bit operations for you. When I say the other side of the macro, I'm referring to the >> << case. Interestingly enough, the case that you refer to (8 * sizeof x) does exist in the Mozilla code base and works fine in the Windows 64 build. Brendan suggested the names for the macros. How about just swiping the macro used in sha512.c, rotr32, and attaching wither PR or JS for PR_rotr32 and JS_rotr32? Again, I'm looking for a concensus as I don't want to change the names or the functionality more than once.

Brendan Eich [:brendan]

Comment 31

•

20 years ago

Michael: my name suggestion was just a suggested starting point. Many similar macros in SpiderMonkey (js/src/*.[ch]) have implicit int32-based representation limits. If on the other hand, the rotate macros want to work on a type whose bit size depends on compilation target, and if that type is not jsword/jsuword, then I agree with Igor that it's better to have the bit-size in the macro names. The NSS names are kind of crufty-terse based on the MS names, and not ALL_CAPS (macros generally should be to follow prevailing style, unless free of side effects and multiple argument expansions). So depending on which type is implicit in the rotate macros, we could have either JS_ROTATE_RIGHT as proposed, with one universal type which would have to be either uint32 or jsuword -- or JS_ROTATE_RIGHT_32 and _64. If I'm reading things right here (recovering from a lost filesystem, also on Jury Duty now!), the type that's implicit in the macros is not well-defined in terms of jsuword (which is defined as the integral type big enough to hold any pointer type), so we should go with JS_ROTATE_RIGHT_32 and _64. If others are happy with this reasoning and proposal after today, I say act on it. Thanks for taking on this work. /be

Igor Bukanov

Comment 32

•

20 years ago

It was written in comment #25: > http://msdn2.microsoft.com/en-us/library/5cc576c4(VS.80).aspx indicates that > _rotl and _rotr are 32-bit instructions and it was written in comment #30: > (In reply to comment #29) > Interestingly enough, the case that you refer to (8 * sizeof x) does exist in > the Mozilla code base and works fine in the Windows 64 build. Given these 2 comments with sizeof(x) == sizeof(word) the proposed definition of the macro would break Win64.

Michael Moy

Assignee

Comment 33

•

20 years ago

(In reply to comment #32) > Given these 2 comments with sizeof(x) == sizeof(word) the proposed definition > of the macro would break Win64. The size of x apparently is 32 bits because the code has been working fine on Windows 64 since last summer. The size of a word, at least the way it is implemented in Mozilla, is apparently 32 bits as well. My Windows 64 builds are at vector64.com if you want to see for yourself.

Michael Moy

Assignee

Comment 34

•

20 years ago

(In reply to comment #31) > So depending on which type is implicit in the rotate macros, we could have > either JS_ROTATE_RIGHT as proposed, with one universal type which would have to > be either uint32 or jsuword -- or JS_ROTATE_RIGHT_32 and _64. > > If I'm reading things right here (recovering from a lost filesystem, also on > Jury Duty now!), the type that's implicit in the macros is not well-defined in > terms of jsuword (which is defined as the integral type big enough to hold any > pointer type), so we should go with JS_ROTATE_RIGHT_32 and _64. On Windows 64, pointers are 64 bits but I'd guess that the jsuword is 32 bits. My Windows 64 Trial License expired this morning and I just ordered the kit so I can't spend much time on Windows 64 (it lets me login for an hour at a time) but I should be able to verify that the correct code is generated. The Windows 64 license should arrive in a few days which will make it easier to do test builds. > If others are happy with this reasoning and proposal after today, I say act on > it. Thanks for your comments.

Igor Bukanov

Comment 35

•

20 years ago

I guess I was not presize enough to write why I do not like the definition of the macros from the last patch. If one uses ROTATE(x, 4, sizeof(x) * 8) where x is int64, then it would work everywhere but Win64. If one want universal macro, then it it should be something like: #include <stdlib.h> ... #if defined(_MSC_VER) && (defined(_X86_) || defined(_AMD64) || defined(_M_AMD64)) #pragma intrinsic(_rotr) #define JS_ROTATE_RIGHT(a, bits) \ (sizeof(a) == 4 ? _rotr(a, bits) : sizeof(a) == 8 ? _rotr64(a, bits) \ : (((a) >> (bits)) ^ ((a) << (sizeof(a) * JS_BITS_PER_BYTE - (bits))))) #else #define JS_ROTATE_RIGHT(a, bits) \ (((a) >> (bits)) ^ ((a) << (sizeof(a) * JS_BITS_PER_BYTE - (bits)))) #endif Note also that <stdlib.h> is moved outside of #if so it would be impossible to forget to include stdlib.h in non-msvc builds. But this also assumes that MSVC would eliminate constant expression from the code with options used with Mozilla builds. If not, then separated ROTATE_32/ROTATE_64 should be defined IMO.

Igor Bukanov

Comment 36

•

20 years ago

(In reply to comment #34) > On Windows 64, pointers are 64 bits but I'd guess that the jsuword is 32 bits. SpiderMonkey assumes that sizeof(jsword) == sizeof(void*). In other word, jsword is an alias to intptr_t from C99.

Brendan Eich [:brendan]

Comment 37

•

20 years ago

(In reply to comment #36) > (In reply to comment #34) > > On Windows 64, pointers are 64 bits but I'd guess that the jsuword is 32 bits. > > SpiderMonkey assumes that sizeof(jsword) == sizeof(void*). In other word, > jsword is an alias to intptr_t from C99. Or should be, but C99 support is still sadly lacking on a number of platforms we target, or targeted last year. For a long time, only LP64 was supported on 64-bit targets. That's the only sane model in my opinion, but of course MS came out with an LLP64 variant where long is 32 bits. So we have had to add #ifdefs to fundamental typedefs that used to use long to mean "pointer-sized int". Using C99 where we can autoconf-test for it and it works is great, so we should prefer that. I'm still recovering from data loss, not sure where things stand in the trunk tip. /be

Michael Moy

Assignee

Comment 38

•

20 years ago

(In reply to comment #37) > That's the only sane model in my opinion, but of course MS came > out with an LLP64 variant where long is 32 bits. I think that they looked at Alpha and Itanium and figured that they needed something that minimized porting effort for their installed base of developers and came up with what they did. There are some performance benefits as some 32-bit instructions (like multiply and divide) have lower latencies when working on 32-bit integers.

Igor Bukanov

Comment 39

•

20 years ago

(In reply to comment #38) > There are some > performance benefits as some 32-bit instructions (like multiply and > divide) have lower latencies when working on 32-bit integers. Interesting. I read on Linux kernel mailing list that for Amd64 using 64-bit values for things like loop counters, indexes etc. is faster than 32 bit ones. Perhaps on Intel this is different.

Brendan Eich [:brendan]

Comment 40

•

20 years ago

Itanium, cough. Alpha, so last millennium. They did what they did, but I think it's bogus in light of the prehistory going back 30+ years where int could hold a pointer, and then 20+ years ago we went through the effort of purging that assumption in favor of long being big enough. /be

Michael Moy

Assignee

Comment 41

•

20 years ago

(In reply to comment #39) > Interesting. I read on Linux kernel mailing list that for Amd64 using 64-bit > values for things like loop counters, indexes etc. is faster than 32 bit ones. > Perhaps on Intel this is different. Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors 3.4 32-Bit Legacy GPRs and Small Unsigned Integers Use the 32-bit legacy general-purpose registers EAX through ESI instead of their 64-bit extensions to store unsigned integer values whose range never requires more than 32 bits, even if subsequent statements use the 32-bit value in a 64-bit operation. ... In 64-bit mode, the machine-language representation of many instructions that operate on 64-bit register operands require a REX prefix byte, which increases the size of the code. However, instructions that operate on a 32-bit legacy register operand do not require the prefix and have the desirable side-effect of clearing the upper 32 bits of the extended register to zero. For example, using the AND instruction on ACX clears the upper half of RCX. I didn't see anything in Chapter 3 General 64-Bit Optimizations that recommended using 64-bit operations over 32-bit operations unless you actually are dealing with 64-bit data. In Appendix C, instructions that take longer to execute in 64-bit mode compared to 32-bit mode are BSF, DIV, IDIV, IMUL, SGDT, SIDT The LOOP* instructions have latencies of 9 in 32-bit mode and 8 in 64-bit mode so this may be source of the information on your mailing list. Though I think that most loops don't use the LOOP* instructions these days. The WBINVD is more expensive in 32-bit than 64-bit mode but I've never heard of that instruction before and it uses near 10,000 cycles. I don't think that I have looked at the Pentium 4/Xeon docs for EM64T. The Intel SOG that I have only refers to the 32-bit chips.

Michael Moy

Assignee

Comment 42

•

20 years ago

(In reply to comment #40) > Itanium, cough. Alpha, so last millennium. A bunch of companies are spending $10 billion on Itanium and that's a lot of good-paying software jobs. I don't know how successful it will be but there's a lot of gravy out there right now. Alpha had its problems but many of the hardware guys that worked on it migrated over to Intel and AMD and some of the technology that we enjoy today comes out of the work done on Alpha. I think that the market has indicated to Microsoft that customers want the easiest migration possible to 64-bits and that's what they have at the moment. To get further improvements, you have to then get your hands dirty. But I think that customers with legacy systems generally want to get the thing ported and running and then work on incremental improvements later on. > They did what they did, but I think it's bogus in light of the prehistory going > back 30+ years where int could hold a pointer, and then 20+ years ago we went > through the effort of purging that assumption in favor of long being big > enough. Microsoft put in a fair amount of effort into XP Pro x64 and there are still a lot of driver issues out there that make it difficult to get up and running with what you want to run on x64. I think that most people dual-boot and Microsoft has implicitly acknowledged the issue of drivers with a relatively inexpensive selling cost on x64. I paid $89 for xn x64 license today and this was a full license; not an upgrade. The same thing on 32-bits would run somewhere between $200 and $300. So from an IT perspective, Microsoft's approach carries a lot of value to their customers and developers. And of course AMD64 allows them to do this pretty easily. From an engineering and portability perspective, it may not be pure. But hey, they're Microsoft and they can quite often throw their weight around.

Nelson Bolyard (seldom reads bugmail)

Comment 43

•

20 years ago

For operations happening in registers, 32-bit instructions can be minutely faster than 64. This is true for both Intel and AMD 64 bit CPUs. But for memory stores (reg->mem) on AMD Opterons (only), any stores other than 64-bit aligned ones seem to require a read-modify-write memory cycle. We sped up a lot of crypto code for AMD Opteron by changing it to use 64-bit arrays in memory rather than 8-bit or 32-bit arrays. Every such store becomes 2-3x faster. Overall RC4 performance went up over 2x by changing one array of 1024 uchars to 1024 ulongs, alone. Technique was published by Mr Marc Bevand, http://etudiant.epita.fr/~bevand_m/ and converting to 64-bit values in memory for performance has become known as "Bevandizing".

Brendan Eich [:brendan]

Comment 44

•

20 years ago

Packing in structs for space efficiency has often been opposed to minimizing latency when retiring stores. Loads dominate stores. Crypto, DSP, and otherwise numerically or combinatorially intensive code may need hand tuning. All things considered, I still believe LP64 to be an easier migration target than LLP64, but I'm biased. And, I'm not MS ;-). /be

Michael Moy

Assignee

Comment 45

•

20 years ago

Attached patch Trunk patch to use _rotr for hash (obsolete) — Details — Splinter Review

This works on the Trunk Win32 build. I'm going to try it on Win64 Release 1.5.0.1 next. And then the Win64 Trunk. I've never built the Win64 Trunk so I don't know if I will be able to get that to work. I'm going to have to remove the 64-bit version for the Branch as VC6 doesn't support _rotr64. VS2003 does though.

Attachment #215715 - Attachment is obsolete: true

Attachment #215764 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 46

•

20 years ago

The Trunk patch generates correct code on a Win64 Release 1.5.0.1 build so please have a look at that. On the Branch side, I'm either going to need to dump the 64-bit version or conditional code the 64-bit version based on the MSVC version number. ----- On the alignment thing: it sounds like something to do with the cache coherency channels as that's one of the few differences between the Opteron and the consumer CPUs. The SOG is pretty clear that you should naturally align everything in sight for best performance. Especially when you're working with SIMD instructions. The SOG indicates that you can get read/write performance on movlps/movhps of two instructions per cycle if the moves are 8-byte aligned with latency of 1. If unaligned, the latency doubles.

Michael Moy

Assignee

Comment 47

•

20 years ago

How do I proceed on getting this into the code base?

Steve Snyder

Comment 48

•

19 years ago

Might I suggest the addition of a XX_ROTATE_RIGHT16 macro? Apart from completeness, there are cases in the code of 16-bit SHIFT-and-OR where a simple rotate instruction would do. For example, check out the SwapBytes() routines in xpfe/components/history/src/nsGlobalHistory.cpp and intl/uconv/ucvlatin/nsUnicodeToUCS2BE.cpp It would be nice to see *p++ = (0x00FF & (aChar >> 8)) | (0xFF00 & (aChar << 8)); turned into a single rotate/move via *p++ = XX_ROTATE_RIGHT16(aChar, 8);

Steve Snyder

Comment 49

•

19 years ago

Micheal, I saw on your blog that you're looking at the 16-bit version of the rotate code. Here are 4 examples, all SwapBytes(), doing byte-swapping on 16-bit Unicode chars: mozilla/intl/uconv/ucvlatin/nsUCS2BEToUnicode.cpp mozilla/intl/uconv/ucvlatin/nsUnicodeToUCS2BE.cpp mozilla/toolkit/components/history/src/nsGlobalHistory.cpp mozilla/xpfe/components/history/src/nsGlobalHistory.cpp As noted above, they're all doing that godawfull shift/mask/or. Those unfortuneate Windows users who regularly translate Unicode from non-native byte-ordering would surely appreciate a more efficient translation. (Too bad MSVC doesn't like using the XCHG instruction.)

Michael Moy

Assignee

Comment 50

•

19 years ago

MSVC doesn't like to play with registers like al and ah in general so getting it to do an xchg like that would be difficult. I haven't checked to see if there's an intrinsic for that but I would doubt it. One other thing about the xchg instruction is that it is latency 2 and VectorPath on the K8. On the Pentium 4, the latency is 1.5. On the other hand, the rotate instructions have a 1 cycle latency on K8, P4 (except for Northwood where it is 4). I'm pretty sure that rotates are 1 cycle on Pentium M and Core though it isn't in my optimization guide for Intel. There's no doubt that 16-bit rotate is the most efficient. At any rate, I'm pretty busy at work right now but am doing test stuff at night on Mozilla. (In reply to comment #49) > Micheal, > > I saw on your blog that you're looking at the 16-bit version of the rotate > code. Here are 4 examples, all SwapBytes(), doing byte-swapping on 16-bit > Unicode chars: > > mozilla/intl/uconv/ucvlatin/nsUCS2BEToUnicode.cpp > mozilla/intl/uconv/ucvlatin/nsUnicodeToUCS2BE.cpp > mozilla/toolkit/components/history/src/nsGlobalHistory.cpp > mozilla/xpfe/components/history/src/nsGlobalHistory.cpp > > As noted above, they're all doing that godawfull shift/mask/or. Those > unfortuneate Windows users who regularly translate Unicode from non-native > byte-ordering would surely appreciate a more efficient translation. (Too bad > MSVC doesn't like using the XCHG instruction.) >

Steve Snyder

Comment 51

•

19 years ago

Attached patch Additional use of rotate macro, from branch. — Details — Splinter Review

Two more places where the standard "(h >> 28) ^ (h << 4)" code can be replaced with the PR_ROTATE_RIGHT32 macro. Patch modifies 2 files from the 1.8.0 branch.

Worcester12345

Comment 52

•

18 years ago

This one also needs "perf" key word.

Worcester12345

Comment 53

•

18 years ago

(In reply to comment #52) > This one also needs "perf" key word. >

Aiko

Updated

•

18 years ago

Keywords: perf

Mike Schroepfer

Comment 54

•

18 years ago

We are using VS2005 on trunk right now and cc in particular uses hashing quite a bit - so methinks we should definitely take a stab at this...

Flags: blocking1.9?

Steve Snyder

Comment 55

•

18 years ago

This bug is open against component JS, but the original patch touches several subsystems. What is desired here, a JS-only patch or one that applies the use of the rotate instructions (via compiler intrinsics) across the mozilla source tree?

(not currently active) Ted Mielczarek

Comment 56

•

18 years ago

Comment 0 implies that VC2005 produces good instructions here, and we're using that on trunk, so is there any need for a fix?

Steve Snyder

Comment 57

•

18 years ago

VS2005 will generate the rotate instructions if an OR is used in the source, but not if an XOR is used. Does generate rotate instruction: a >> 28 | a << 4 Does not generate rotate instruction: a >> 28 ^ a << 4 Most (all?) Mozilla code is using XOR, so the proposed changes really do provide some benefit.

Mike Schroepfer

Comment 58

•

18 years ago

(In reply to comment #56) > Comment 0 implies that VC2005 produces good instructions here, and we're using > that on trunk, so is there any need for a fix? > I believe that comment is talking about the instructions emitted if the instrinsic's are used - which we currently don't use. Is that right Michael?

Mike Schroepfer

Comment 59

•

18 years ago

(In reply to comment #55) > This bug is open against component JS, but the original patch touches several > subsystems. > > What is desired here, a JS-only patch or one that applies the use of the rotate > instructions (via compiler intrinsics) across the mozilla source tree? > If it helps why not use it everywhere?

Michael Moy

Assignee

Comment 60

•

18 years ago

VS2005 uses the rotate instruction when modules are compiled at -Oi or -O2. According to the post from March 2006, Firefox builds at -O1 as -O2 increases code size. If the optimization level has moved to add -Oi or moved from -O1 to O2, then this code isn't needed outside of clarifying the code and putting the code for rotate into one central place. I would personally prefer that Firefox moved to -O2 but this isn't the best place to discuss that. The idea, a long time ago, was to put the rotate intrinsics in one place so that they could be called from anywhere else and that it a change was required, that the change would only need be made in one place. There are other areas of code that could benefit from the intrinsic (they may be referenced above) used for a general rotate function.

Robert Sayre

Comment 61

•

18 years ago

(In reply to comment #56) > Comment 0 implies that VC2005 produces good instructions here, and we're using > that on trunk, so is there any need for a fix? > VC2005 will produce it at -O2 and -Oi. We build at -O1, so we should take this patch for our hashtables.

Brian Crowder

Comment 62

•

18 years ago

The code is clearer with the patch anyway. Let's take it. It seems likely we need two separate patches, one for JS (spidermonkey), and one for the rest of the browser, to get this committed in a timely fashion? Or at least, in my experience, it's been easier to drive patches home that way. If there's disagreement, let me know. I'll be glad to review the JS patch.

Mike Schroepfer

Comment 63

•

18 years ago

(In reply to comment #62) > The code is clearer with the patch anyway. Let's take it. It seems likely we > need two separate patches, one for JS (spidermonkey), and one for the rest of > the browser, to get this committed in a timely fashion? Or at least, in my > experience, it's been easier to drive patches home that way. If there's > disagreement, let me know. I'll be glad to review the JS patch. > yea - separate JS and other patches might be good. What do we need to do for GCC for linux/win? Either way this looks like a good, easy, win. So let's move on it.

Brendan Eich [:brendan]

Comment 64

•

18 years ago

Comment on attachment 215887 [details] [diff] [review] Trunk patch to use _rotr for hash >RCS file: /cvsroot/mozilla/js/src/jsbit.h,v >retrieving revision 3.13 >diff -u -r3.13 jsbit.h >--- js/src/jsbit.h 21 Feb 2006 11:28:28 -0000 3.13 >+++ js/src/jsbit.h 22 Mar 2006 04:13:56 -0000 >@@ -147,5 +147,28 @@ > JS_END_MACRO > #endif > >+/* >+** Macros for rotate right. There is no rotate operation in the C Language >+** so the construct a >> 28 ^ a << 4 is frequently used instead. Most >+** compilers convert this to a rotate instruction but MSVC doesn't without >+** a little help. To get MSVC to generate a rotate instruction, we have >+** to use the _rotr intrinsic and use a pragma to make _rotr inline. >+** >+** MSVC in VS2005 will do an inline rotate instruction on the above >+** construct if the shifts are put together using or (|) but Mozilla >+** uses xor (^) so we wouldn't pick up the improvement just by using >+** VS2005. >+*/ Please reformat like so: /* * Macros for rotate right. There is no rotate operation in the C Language so * the construct a >> 28 ^ a << 4 is frequently used instead. Most compilers * convert this to a rotate instruction but MSVC doesn't without a little help. * To get MSVC to generate a rotate instruction, we have to use the _rotr * intrinsic and use a pragma to make _rotr inline. * * MSVC in VS2005 will do an inline rotate instruction on the above construct * if the shifts are put together using or (|) but Mozilla uses xor (^) so we * wouldn't pick up the improvement just by using VS2005. */ The ** comment style is old and copied from NSPR, so not to be imitated. I used vim Qj a bunch of times to get the wrapping to the right margin uniform. I hope we can get this in with just my blanket r/sr+, but module owners should be cc'ed and have a chance to comment if they care. /be

Attachment #215887 - Flags: superreview+

Attachment #215887 - Flags: review+

Attachment #215887 - Flags: approval1.9+

Brendan Eich [:brendan]

Updated

•

18 years ago

Flags: blocking1.9? → blocking1.9+

Michael Moy

Assignee

Comment 65

•

18 years ago

(In reply to comment #63) > (In reply to comment #62) > > The code is clearer with the patch anyway. Let's take it. It seems likely we > > need two separate patches, one for JS (spidermonkey), and one for the rest of > > the browser, to get this committed in a timely fashion? Or at least, in my > > experience, it's been easier to drive patches home that way. If there's > > disagreement, let me know. I'll be glad to review the JS patch. > > > > yea - separate JS and other patches might be good. What do we need to do for > GCC for linux/win? Either way this looks like a good, easy, win. So let's > move on it. > I think that GNU does the right thing without an intrinsic.

Michael Moy

Assignee

Comment 66

•

18 years ago

(In reply to comment #64) ... > The ** comment style is old and copied from NSPR, so not to be imitated. I used > vim Qj a bunch of times to get the wrapping to the right margin uniform. > > I hope we can get this in with just my blanket r/sr+, but module owners should > be cc'ed and have a chance to comment if they care. > > /be I need to set up a new FF3 environment to redo the patch and retest.

Mike Schroepfer

Updated

•

18 years ago

Priority: -- → P1

Nelson Bolyard (seldom reads bugmail)

Comment 67

•

18 years ago

NSS does efficient rotates in code that does extensive rotation. NSS has optimizations for several platforms, including MSVC and others. See http://mxr.mozilla.org/security/search?string=_rot&find=fast&tree=security

Brian Crowder

Comment 68

•

18 years ago

Attached patch updated to current trunk (obsolete) — Details — Splinter Review

Attachment #215887 - Attachment is obsolete: true

Brian Crowder

Comment 69

•

18 years ago

The following is the only difference shown by a diff-of-diffs between attachment 296638 [details] [diff] [review] and attachment 215887 [details] [diff] [review]: 374,377c374,377 < retrieving revision 1.20 < diff -u -r1.20 nsStaticNameTable.cpp < --- xpcom/ds/nsStaticNameTable.cpp 18 Apr 2004 14:18:13 -0000 1.20 < +++ xpcom/ds/nsStaticNameTable.cpp 22 Mar 2006 04:14:27 -0000 --- > retrieving revision 1.24 > diff -u -p -3 -r1.24 nsStaticNameTable.cpp > --- xpcom/ds/nsStaticNameTable.cpp 8 Jul 2007 07:08:50 -0000 1.24 > +++ xpcom/ds/nsStaticNameTable.cpp 10 Jan 2008 18:58:18 -0000 386,387c386,387 < @@ -82,7 +83,7 @@ < NS_STATIC_CAST(const unsigned char*, key); --- > @@ -112,14 +113,14 @@ caseInsensitiveStringHashKey(PLDHashTabl > for (const PRUnichar* s = tableKey->mKeyStr.m2b->get(); 391a392,400 > } else { > for (const unsigned char* s = > reinterpret_cast<const unsigned char*> > (tableKey->mKeyStr.m1b->get()); > *s != '\0'; > s++) > - h = (h >> (PL_DHASH_BITS - 4)) ^ (h << 4) ^ (*s & ~0x20); > + h = PR_ROTATE_RIGHT32(h, PL_DHASH_BITS - 4) ^ (*s & ~0x20); > }

Brian Crowder

Comment 70

•

18 years ago

Attached patch with comment nit picked (obsolete) — Details — Splinter Review

Asking for review again, since this could introduce subtle performance bugs, if got wrong.

Attachment #296368 - Attachment is obsolete: true

Attachment #296370 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 71

•

18 years ago

Comment on attachment 296370 [details] [diff] [review] with comment nit picked >+ * Macros for rotate right. There is no rotate operation in the C Language >+ * so the construct a >> 28 ^ a << 4 is frequently used instead. Most Uber-nit, helps avoid ragged right margin too: parenthesize << and >> against ^ (not necessary but C's operator precedence hierarchy misplaces things, and gcc warns [or used to] if you don't over-parenthesize, so it's a good idea even in comments). Ditto for clone in NSPR: >+** Macros for rotate right. There is no rotate operation in the C Language >+** so the construct a >> 28 ^ a << 4 is frequently used instead. Most /be

Attachment #296370 - Flags: review?(brendan)

Attachment #296370 - Flags: review+

Attachment #296370 - Flags: approval1.9+

Mike Schroepfer

Comment 72

•

18 years ago

Crowder is this ready to go?

Brian Crowder

Comment 73

•

18 years ago

Attached patch updated to current trunk, and brendan's last nits picked (obsolete) — Details — Splinter Review

Attachment #296370 - Attachment is obsolete: true

Attachment #299100 - Flags: review+

Brian Crowder

Comment 74

•

18 years ago

Yeah, ready to go as soon as the tree is happy again.

Keywords: checkin-needed

Wan-Teh Chang

Comment 75

•

18 years ago

Comment on attachment 299100 [details] [diff] [review] updated to current trunk, and brendan's last nits picked Brian, why don't you use | instead of ^ in the default definition of the rotate right macros? Using ^ contradicts your comments.

Brian Crowder

Comment 76

•

18 years ago

Wan-Teh Chang: Not sure I see your issue, can you reference the code and comments you mean, specifically? Also, the original patch belongs to Michael Moy (who I hope is still listening here), who I hope can defend his logic better than I can.

Nelson Bolyard (seldom reads bugmail)

Comment 77

•

18 years ago

Brian, In comment #57, Steve Snyder wrote: > Does generate rotate instruction: > a >> 28 | a << 4 > > Does not generate rotate instruction: > a >> 28 ^ a << 4 Suggesting that these rotate expressions should use '|' rather than '^' but the patch in attachment 299100 [details] [diff] [review] persists in using '^'.

Steve Snyder

Comment 78

•

18 years ago

(In reply to comment #77) > Suggesting that these rotate expressions should use '|' rather than '^' > but the patch in attachment 299100 [details] [diff] [review] persists in using '^'. Well, OR vs XOR does make a difference in code generation to get the same result. But we're talking about an eccentricity of a specific compiler, MSVC. Given that the purpose of the patch is to specify the use of the _rotr intrinsics when building with MSVC we shouldn't have to be concerned with whether better code is generated with OR or with XOR. Put simply, _rotr() takes the OR / XOR issue off the table.

Brian Crowder

Comment 79

•

18 years ago

Nelson: Right. Therefore we must use the intrinsic, since we need "^". The intrinsic tells the compiler to do the right thing, since it does not divine the correct instruction, as it does in the "|" case.

Michael Moy

Assignee

Comment 80

•

18 years ago

Using OR accumulates bits so long strings will tend towards all of the bits being turned on. Using XOR can flip bits on and off better distributing incoming bytes, words and longs through a longword. I'm a bit short on Windows hardware for Mozilla at the moment and am in the middle of setting up a new system to finish this bug up.

Brian Crowder

Comment 81

•

18 years ago

I think we're pretty close to wrapping up this bug. mmoy: can you check out my patch? Next problem is getting the pieces landed; there is a limited subset of people who have access to all of these parts of the tree (brendan, only?). I will split the patches, if Brendan can't land.

Reed Loden [:reed]

Comment 82

•

18 years ago

(In reply to comment #81) > Next problem is getting the pieces landed; there is a limited subset of > people who have access to all of these parts of the tree (brendan, only?). I > will split the patches, if Brendan can't land. Brendan doesn't have access to all three restricted partitions; only gerv does, strangely enough (for licensing things). So, probably should just split the patches by putting the NSS and NSPR changes in a separate patch. Then wtc or kaie or somebody can land that part, and you can land everything else.

Brian Crowder

Comment 83

•

18 years ago

reed: Feel like splitting it for me? :)

Nelson Bolyard (seldom reads bugmail)

Comment 84

•

18 years ago

There's no real need to split patches. Wan-Teh or I can land the NSS & NSPR parts of that patch, as it is. One question though. Do the parts outside of NSS & NSPR depend on the NSS parts being applied first? (I'm too lazy to look. :) IF so, then the other parts of the patch should not land until the NSS & NSPR parts have landed, and the CVS tag for NSS used in mozilla builds is updated.

Brian Crowder

Comment 85

•

18 years ago

js/src relies on its own macros, so that chunk of the patch can land whenever. For the rest of the tree, the NSPR hunks should probably land first. After that, I think order does not matter.

Wan-Teh Chang

Comment 86

•

18 years ago

(a >> 28) ^ (a << 4) is equal to (a >> 28) | (a << 4) if a is unsigned, and a needs to be unsigned for either expression to be equal to _rotr(a, 28). So why don't we use | in the default definition of the ROTATE_RIGHT macros? The comments in the patch suggest that | has a better chance of causing the compiler to generate good code. Test program: #include <stdio.h> int main() { /*unsigned*/ int a = 0x87654321; printf("0x%08x\n", (a >> 28) ^ (a << 4)); printf("0x%08x\n", (a >> 28) | (a << 4)); printf("0x%08x\n", _rotr(a, 28)); return 0; }

Brendan Eich [:brendan]

Comment 87

•

18 years ago

Reed, I am a despot (https://despot.mozilla.org/despot.cgi) -- I have all the keys to the kingdoms. /be

Michael Moy

Assignee

Comment 88

•

18 years ago

(In reply to comment #86) > (a >> 28) ^ (a << 4) is equal to > (a >> 28) | (a << 4) if a is unsigned, > and a needs to be unsigned for either expression to be equal to _rotr(a, 28). > > So why don't we use | in the default definition of the ROTATE_RIGHT macros? > The comments in the patch suggest that | has a better chance of causing the > compiler to generate good code. In the past, I've run into problems with two or three cases of using ROR for the above hash code and I'm wondering if they were using signed integers which would imply a bug. I'm planning on going through all of the cases where this code is used to check for signed integers and should have results Saturday night EST. Using | does look simpler than using the intrinsic. Perhaps with casts to ensure unsigned ints.

Reed Loden [:reed]

Comment 89

•

18 years ago

(In reply to comment #87) > Reed, I am a despot (https://despot.mozilla.org/despot.cgi) -- I have all the > keys to the kingdoms. So am I, but that doesn't mean I go around and add myself to partitions that I don't own just because I want to check-in there. :)

Brendan Eich [:brendan]

Comment 90

•

18 years ago

Reed, lighten up -- reasoning based on who happens to be listed in despot is silly (Gerv wouldn't do random commits just because he got access to relicense, once upon a time). My point is that the commits could be done if needed. /be

Wan-Teh Chang

Comment 91

•

18 years ago

Comment on attachment 299100 [details] [diff] [review] updated to current trunk, and brendan's last nits picked I'd like to suggest a change to this patch. This patch shows that ROTATE_LEFT32 is more appropriate for our code base. If we only provide macros for rotating in one direction, we should provide ROTATE_LEFT32, then code like: PR_ROTATE_RIGHT32(result, 28) JS_ROTATE_RIGHT32(h, JS_DHASH_BITS - 4) would become PR_ROTATE_LEFT32(result, 4) JS_ROTATE_LEFT32(h, 4) Another change I need to make to the NSPR portions of this patch is to check the value of _MSC_VER as well because NSPR still supports MSVC 6.0.

Brendan Eich [:brendan]

Comment 92

•

18 years ago

(In reply to comment #90) > Reed, lighten up -- reasoning based on who happens to be listed in despot is > silly (Gerv wouldn't do random commits just because he got access to relicense, > once upon a time). My point is that the commits could be done if needed. I left out "only" after "reasoning", and even then this may sound like a fast and loose approach to who commits changes -- not my intention. /be

Wan-Teh Chang

Comment 93

•

18 years ago

Attached patch NSPR patch (checked in) — Details — Splinter Review

I added PR_ROTATE_LEFT32 and removed PR_ROTATE_RIGHT64. Please let me know if you need 64-bit rotate macros. I edited the comments to reflect this change and to call out that the first argument |a| must be an unsigned 32-bit integer type. I believe the macro _AMD64 is missing a trailing underscore, so I changed it to _AMD64_. MSDN says the _rotl and _rotr intrinsics are available on the "x86, Intel 64-bit, and AMD 64-bit" architectures, so perhaps we don't need to test any architecture macro. I compiled the code with MSVC 6.0 and 2005 successfully, so I didn't add a check for MSVC version. I checked in this patch on the NSPR trunk for NSPR 4.7. I will create a new NSPR CVS tag for the Mozilla trunk tomorrow. Checking in pr/include/prbit.h; /cvsroot/mozilla/nsprpub/pr/include/prbit.h,v <-- prbit.h new revision: 3.9; previous revision: 3.8 done Checking in lib/ds/plhash.c; /cvsroot/mozilla/nsprpub/lib/ds/plhash.c,v <-- plhash.c new revision: 3.11; previous revision: 3.10 done

Wan-Teh Chang

Comment 94

•

18 years ago

Comment on attachment 299463 [details] [diff] [review] NSPR patch (checked in) I also use OR (|) instead of XOR (^).

Wan-Teh Chang

Comment 95

•

18 years ago

Comment on attachment 299100 [details] [diff] [review] updated to current trunk, and brendan's last nits picked >+#if defined(_MSC_VER) && (defined(_X86_) || defined(_AMD64) || defined(_M_AMD64)) We need to test _M_IX86, which is defined by the compiler. _X86_ is defined by <windows.h>, so it isn't defined if the file doesn't include <windows.h>.

Michael Moy

Assignee

Comment 96

•

18 years ago

Thanks for doing the update. I went through all calls and saw unsigned in every case. I also checked the assembler code to ensure that there were no SARs. My AMD64 tags are from 2005 and the 2.0 line and they may have changed since then.

Nelson Bolyard (seldom reads bugmail)

Comment 97

•

18 years ago

NSS changes committed on trunk Checking in base/hash.c; new revision: 1.10; previous revision: 1.9 Checking in pki/pkistore.c; new revision: 1.31; previous revision: 1.30

Wan-Teh Chang

Comment 98

•

18 years ago

This is the macro test I ended up using: #if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_AMD64) || \ defined(_M_X64)) You might want to do the same in JavaScript's jsbit.h. _M_AMD64 and _M_X64 mean the same thing, but _M_X64 is the official one now. I'd prefer shorter macro names such as PR_ROTL32 and JS_ROTR32. Not a big deal though if you disagree.

Brendan Eich [:brendan]

Comment 99

•

18 years ago

I'd prefer to match wtc's changes in the JS macrology. Crowder, can do? /be

Brian Crowder

Comment 100

•

18 years ago

Why keep ROTATE_RIGHT32?

Brian Crowder

Comment 101

•

18 years ago

Attached patch left is more right than right (obsolete) — Details — Splinter Review

Changed js/src and xpconnect to use ROTATE_LEFT instead of ROTATE_RIGHT, abandoned 64-bit rotate macros and changed the preprocessor check to use the same guards as used in NSPR. Still think there's no point in keeping JS_ROTATE_RIGHT32

Attachment #299100 - Attachment is obsolete: true

Attachment #299856 - Flags: review?(brendan)

Reed Loden [:reed]

Updated

•

18 years ago

Keywords: checkin-needed

Reed Loden [:reed]

Updated

•

18 years ago

Whiteboard: [needs landing]

Wan-Teh Chang

Comment 102

•

18 years ago

Comment on attachment 299856 [details] [diff] [review] left is more right than right >+#pragma intrinsic(_rotr, _rotr64) You need to list _rotl instead of _rotr64 here. I agree that JavaScript doesn't need JS_ROTATE_RIGHT32. I kept PR_ROTATE_RIGHT32 for completeness because NSPR is a general-purpose library. But jsbit.h is for JavaScript internal use only, so you only need to define what you need. You still have two calls to JS_ROTATE_RIGHT32 left.

Wan-Teh Chang

Comment 103

•

18 years ago

Comment on attachment 299856 [details] [diff] [review] left is more right than right >+/* >+ * Macros for rotate right. There is no rotate operation in the C Language >+ * so the construct (a >> 28) ^ (a << 4) is frequently used instead. Most Remember to update this block comment. You can use what I did in NSPR as a reference: http://lxr.mozilla.org/nspr/source/nsprpub/pr/include/prbit.h#110

Brian Crowder

Comment 104

•

18 years ago

Comment on attachment 299856 [details] [diff] [review] left is more right than right Patch coming that fixes the things Wan-Teh mentions

Attachment #299856 - Attachment is obsolete: true

Attachment #299856 - Flags: review?(brendan)

Brian Crowder

Comment 105

•

18 years ago

Attached patch one more try (obsolete) — Details — Splinter Review

Attachment #300095 - Flags: review?(brendan)

Wan-Teh Chang

Comment 106

•

18 years ago

Comment on attachment 300095 [details] [diff] [review] one more try r=wtc. There is a comment bug and a code bug. >+ * the construct (a << 28) | (a >> 4) is used instead. Most compilers convert Nit: say (a << 4) | (a >> 28) >+ * the _rotr intrinsic and use a pragma to make _rotr inline. _rotr ==> _rotl (two occurrences) >- hash = (hash >> (JS_DHASH_BITS - 4)) ^ (hash << 4) >+ hash = JS_ROTATE_LEFT32(hash, JS_DHASH_BITS - 4) The second argument should be 4 (this is in jsscope.c).

Attachment #300095 - Flags: review+

Brian Crowder

Comment 107

•

18 years ago

Attached patch more silly errors on my part fixed, thanks wtc — Details — Splinter Review

Attachment #300095 - Attachment is obsolete: true

Attachment #300095 - Flags: review?(brendan)

Brian Crowder

Comment 108

•

18 years ago

Comment on attachment 300165 [details] [diff] [review] more silly errors on my part fixed, thanks wtc Still like to give Brendan a chance to eyeball this before I land it.

Attachment #300165 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 109

•

18 years ago

Comment on attachment 300165 [details] [diff] [review] more silly errors on my part fixed, thanks wtc Looks good, thanks. /be

Attachment #300165 - Flags: review?(brendan)

Attachment #300165 - Flags: review+

Attachment #300165 - Flags: approval1.9+

Brian Crowder

Comment 110

•

18 years ago

Checking in jsbit.h; /cvsroot/mozilla/js/src/jsbit.h,v <-- jsbit.h new revision: 3.18; previous revision: 3.17 done Checking in jsdhash.c; /cvsroot/mozilla/js/src/jsdhash.c,v <-- jsdhash.c new revision: 1.48; previous revision: 1.47 done Checking in jshash.c; /cvsroot/mozilla/js/src/jshash.c,v <-- jshash.c new revision: 3.20; previous revision: 3.19 done Checking in jsscope.c; /cvsroot/mozilla/js/src/jsscope.c,v <-- jsscope.c new revision: 3.77; previous revision: 3.76 done Checking in jsstr.c; /cvsroot/mozilla/js/src/jsstr.c,v <-- jsstr.c new revision: 3.188; previous revision: 3.187 done Checking in liveconnect/jsj_hash.c; /cvsroot/mozilla/js/src/liveconnect/jsj_hash.c,v <-- jsj_hash.c new revision: 1.13; previous revision: 1.12 done Checking in xpconnect/src/xpcmaps.cpp; /cvsroot/mozilla/js/src/xpconnect/src/xpcmaps.cpp,v <-- xpcmaps.cpp new revision: 1.26; previous revision: 1.25 done

Brian Crowder

Comment 111

•

18 years ago

I believe that my landing means this bug is fixed. Someone else should disabuse me of that notion, if I am incorrect.

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Steve Snyder

Updated

•

18 years ago

Blocks: 415264

Steve Snyder

Updated

•

18 years ago

Blocks: 415262

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Whiteboard: [needs landing]

Sample hash patch using intrinsics 20 years ago Michael Moy 1.86 KB, patch		Details \| Diff \| Splinter Review
Hash patch to use _rotr intrinsic in MSVC 20 years ago Michael Moy 16.12 KB, patch		Details \| Diff \| Splinter Review
Trunk version of _rotr hash patch 20 years ago Michael Moy 15.73 KB, patch		Details \| Diff \| Splinter Review
Trunk _rotr patch 20 years ago Michael Moy 15.81 KB, patch		Details \| Diff \| Splinter Review
Trunk _rotr patch 20 years ago Michael Moy 15.81 KB, patch		Details \| Diff \| Splinter Review
Branch _rotr patch 20 years ago Michael Moy 16.20 KB, patch		Details \| Diff \| Splinter Review
Trunk patch to use _rotr for hash 20 years ago Michael Moy 15.85 KB, patch	brendan : review+ brendan : superreview+ brendan : approval1.9+	Details \| Diff \| Splinter Review
Additional use of rotate macro, from branch. 19 years ago Steve Snyder 1.94 KB, patch		Details \| Diff \| Splinter Review
updated to current trunk 18 years ago Brian Crowder 23.71 KB, patch		Details \| Diff \| Splinter Review
with comment nit picked 18 years ago Brian Crowder 23.71 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review
updated to current trunk, and brendan's last nits picked 18 years ago Brian Crowder 23.72 KB, patch	crowderbt : review+	Details \| Diff \| Splinter Review
NSPR patch (checked in) 18 years ago Wan-Teh Chang 2.39 KB, patch		Details \| Diff \| Splinter Review
left is more right than right 18 years ago Brian Crowder 8.22 KB, patch		Details \| Diff \| Splinter Review
one more try 18 years ago Brian Crowder 7.80 KB, patch	wtc : review+	Details \| Diff \| Splinter Review
more silly errors on my part fixed, thanks wtc 18 years ago Brian Crowder 7.81 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review