performance optimizations for camellia

RESOLVED FIXED in 3.13

Status

NSS
Libraries
--
enhancement
RESOLVED FIXED
10 years ago
8 years ago

People

(Reporter: Nelson Bolyard (seldom reads bugmail), Assigned: Wan-Teh Chang)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

camellia.a makes use of macros for 32-bit rotation and for byte swapping.
http://mxr.mozilla.org/security/source/security/nss/lib/freebl/camellia.c#73
But it doesn't appear to take advantage of all the optimizations for those
operations that are used elsewhere in NSS, e.g.
http://mxr.mozilla.org/security/source/security/nss/lib/freebl/sha_fast.h#60

I would suggest #including sha_fast.h and reusing those macros.
(Assignee)

Comment 1

8 years ago
Created attachment 396064 [details] [diff] [review]
Use SHA_HTONL and SHA_ALLOW_UNALIGNED_ACCESS

I did that for the MSVC x86/x64 case in bug 508113.

Unfortunately the GETU32/PUTU32 macros take 'unsigned char *'
arguments, so we can only use the SHA_HTONL macro for CPUs
that allow unaligned access to PRUint32.

Without changing the signature of the GETU32/PUTU32 macros,
the only thing left to do for this bug is to generalize the
MSVC x86/x64 case to all compilers and all CPUs (little or
big endian) that allow unaligned access.

Is it OK to use the SHA_ALLOW_UNALIGNED_ACCESS macro from
sha_fast.h here?  It seems like an internal macro of
sha_fast.{h,c} to me.
Attachment #396064 - Flags: review?(nelson)
(Assignee)

Updated

8 years ago
Assignee: nobody → wtc
Status: NEW → ASSIGNED
Comment on attachment 396064 [details] [diff] [review]
Use SHA_HTONL and SHA_ALLOW_UNALIGNED_ACCESS

r=nelson
Attachment #396064 - Flags: review?(nelson) → review+
(Assignee)

Comment 3

8 years ago
I checked in the patch on the SOFTOKEN_3_13_BRANCH (Softoken 3.13).

Checking in camellia.c;
/cvsroot/mozilla/security/nss/lib/freebl/camellia.c,v  <--  camellia.c
new revision: 1.2.8.3; previous revision: 1.2.8.2
done
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.13

Comment 4

8 years ago
This patch broke Solaris . It appears we have no ongoing builds of any kinds for SOFTOKEN_3_13_BRANCH , so I found this out manually .

cc -o SunOS5.10_i86pc_DBG.OBJ/SunOS_SINGLE_SHLIB/camellia.o -c -g -KPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -Di386 -DSOLARIS2_10 -D_SVID_GETTOD  -xs -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -DDEBUG -UNDEBUG -DDEBUG_jp96085 -DNSS_ENABLE_ECC -DNSS_ECC_MORE_THAN_SUITE_B -DUSE_UTIL_DIRECTLY -DNSS_X86_OR_X64 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE  -DMP_ASSEMBLY_DIV_2DX1D -DMP_API_COMPATIBLE -I/usr/dt/include -I/usr/openwin/include -I../../../../dist/SunOS5.10_i86pc_DBG.OBJ/include -I../../../../dist/public/nss -I../../../../dist/private/nss -Impi -Iecl  camellia.c
"camellia.c", line 480: undefined symbol: tmp
"camellia.c", line 694: undefined symbol: tmp
"camellia.c", line 995: undefined symbol: tmp
"camellia.c", line 1099: undefined symbol: tmp
"camellia.c", line 1206: undefined symbol: tmp
"camellia.c", line 1334: undefined symbol: tmp
cc: acomp failed for camellia.c
gmake[1]: *** [SunOS5.10_i86pc_DBG.OBJ/SunOS_SINGLE_SHLIB/camellia.o] Error 1
gmake[1]: Leaving directory `/net/monstre/export/home/julien/nss/313/mozilla/security/nss/lib/freebl'
gmake: *** [libs] Error 2

I verified that reverting to the old version fixes the build issue.

[jp96085@monstre]/net/monstre/export/home/julien/nss/313/mozilla/security/nss/lib/freebl 63 % cvs up -r1.2.8.2 camellia.c
P camellia.c
[jp96085@monstre]/net/monstre/export/home/julien/nss/313/mozilla/security/nss/lib/freebl 64 % make
../../../coreconf/nsinstall/SunOS5.10_i86pc_DBG.OBJ/nsinstall -R -m 444 blapit.h shsign.h ecl/ecl-exp.h hasht.h sechash.h ../../../../dist/public/nss
../../../coreconf/nsinstall/SunOS5.10_i86pc_DBG.OBJ/nsinstall -R -m 444 alghmac.h blapi.h secmpi.h secrng.h ec.h ecl/ecl.h ecl/ecl-curve.h ../../../../dist/private/nss
../../../coreconf/nsinstall/SunOS5.10_i86pc_DBG.OBJ/nsinstall -R -m 664 SunOS5.10_i86pc_DBG.OBJ/libfreebl.a ../../../../dist/SunOS5.10_i86pc_DBG.OBJ/lib
make FREEBL_CHILD_BUILD=1 \
 OBJDIR=SunOS5.10_i86pc_DBG.OBJ/SunOS_SINGLE_SHLIB libs
make[1]: Entering directory `/net/monstre/export/home/julien/nss/313/mozilla/security/nss/lib/freebl'
cc -o SunOS5.10_i86pc_DBG.OBJ/SunOS_SINGLE_SHLIB/camellia.o -c -g -KPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -Di386 -DSOLARIS2_10 -D_SVID_GETTOD  -xs -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -DDEBUG -UNDEBUG -DDEBUG_jp96085 -DNSS_ENABLE_ECC -DNSS_ECC_MORE_THAN_SUITE_B -DUSE_UTIL_DIRECTLY -DNSS_X86_OR_X64 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE  -DMP_ASSEMBLY_DIV_2DX1D -DMP_API_COMPATIBLE -I/usr/dt/include -I/usr/openwin/include -I../../../../dist/SunOS5.10_i86pc_DBG.OBJ/include -I../../../../dist/public/nss -I../../../../dist/private/nss -Impi -Iecl  camellia.c
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 5

8 years ago
The problem seems due to the use of an implementation of SHA_HTONL that requires the declaration of a "tmp" variable.

This variable is declared in sha_fast.c, but not camellia.c .
It is declared as a global as :

#if defined(SHA_NEED_TMP_VARIABLE)
  register PRUint32 tmp;
#endif

I don't know if the compiler is required to always honor the register keyword, but if this global variable is ever moved out of the register and into a fixed memory location for whatever reason, there could be a thread-safety implication for the SHA and Camellia algorithms. This implementation would only be safe if tmp was declared as a local. But that can't be done within the SHA_HTONL macro. It would require defining another macro such as SHA_HTNOL_LOCAL to be inserted in the scope where SHA_HTNOL needs to be used.

Ideally however, we shouldn't ever need to use this implementation at all. It may be time to talk to the Sun studio guys again about how to properly do bswap with their compiler.

Comment 6

8 years ago
Created attachment 396584 [details] [diff] [review]
Fix Solaris build by adding tmp locals where SHA_HTONL is used
Attachment #396584 - Flags: review?(wtc)

Comment 7

8 years ago
re: comment 4, forget what I said about thread safety. The declaration of tmp in question was a local, not a global.

Comment 8

8 years ago
Created attachment 396592 [details] [diff] [review]
Better fix with #ifdefs

This will preventing compilers on other platforms from complaining about unused variables.
Attachment #396584 - Attachment is obsolete: true
Attachment #396592 - Flags: review?(wtc)
Attachment #396584 - Flags: review?(wtc)
(Assignee)

Comment 9

8 years ago
Comment on attachment 396592 [details] [diff] [review]
Better fix with #ifdefs

[I wrote this comment while you discovered the unused variable
compiler warning issue.]

Julien,

Thank you for the bug report and the patch.  I am actually
looking at this.  I'm going to back out my checkin.

Your patch will generate compiler warnings on unused variables.
The 'tmp' variable declarations need to be ifdef'ed like this:

+#if defined(SHA_ALLOW_UNALIGNED_ACCESS) && defined(SHA_NEED_TMP_VARIABLE)
     PRUint32 tmp;
+#endif

If you don't find that too ugly, then your fix is fine by me.
You can improve it slightly with a new macro:

 #if defined(SHA_ALLOW_UNALIGNED_ACCESS)

 /* require a CPU that allows unaligned access */

+#if defined(SHA_NEED_TMP_VARIABLE)
+#define CAMELLIA_NEED_TMP_VARIABLE 1
+#endif

and then you ifdef the 'tmp' declarations with that macro:

+#if defined(CAMELLIA_NEED_TMP_VARIABLE)
     PRUint32 tmp;
+#endif

Another issue is that the default implementation of SHA_HTONL that uses
'tmp':

#define SHA_HTONL(x)  (tmp = (x), tmp = (tmp << 16) | (tmp >> 16), \
                       ((tmp & SHA_MASK) << 8) | ((tmp >> 8) & SHA_MASK))

may not be faster than the default implementation of GETU32/PUTU32 in
this file.

Another solution is to go back to rev. 1.2.8.2 and then add
defined(__GNUC__):

#if (defined(_MSC_VER) || defined(__GNUC__)) && defined(NSS_X86_OR_X64)

/* require a little-endian CPU that allows unaligned access */

...

#else /* not MSVC/GCC or not x86/x64 */

But this fix is tightly coupled to the internals of sha_fast.h,
which could be a maintenance nightmare.

Comment 10

8 years ago
The multiple ifdef's are rather ugly if we have to repeat them everywhere. I don't know of a good solution. I wish we weren't using this C-based implementation on any platform. We shouldn't be using it on Solaris which is one of our tier 1 platforms. I filed RFE bug 512579 about that. I suppose there will always be platforms that don't have it natively implemented. And maybe some CPUs do the bswap on registers like x86, while others do it on a memory address. I don't have a specific example of the later. It would be nice to have a portable and efficient bswap function that doesn't require local variable declarations to use. We can't have all three right now.

Comment 11

8 years ago
Perhaps we can get rid of the local variable requirement by having the default non-native implementation of SHA_HTONL be an inline function instead of a macro. I don't know if this is any less efficient. But it would be a lot cleaner code.
> It is declared as a global as :
It's not a global.  Look again.

Comment 13

8 years ago
Nelson,
Already noted in comment 7.
(Assignee)

Comment 14

8 years ago
Created attachment 396753 [details] [diff] [review]
Julien's better fix with #ifdefs, v2

I think this approach is the easiest to maintain because it uses the
macros defined in sha_fast.h in exactly the same way sha_fast.c does.
Attachment #396592 - Attachment is obsolete: true
Attachment #396753 - Flags: review?(julien.pierre.boogz)
Attachment #396592 - Flags: review?(wtc)

Comment 15

8 years ago
Comment on attachment 396753 [details] [diff] [review]
Julien's better fix with #ifdefs, v2

This is fine except tmp should be declared register .
Attachment #396753 - Flags: review?(julien.pierre.boogz) → review+
(Assignee)

Comment 16

8 years ago
Comment on attachment 396753 [details] [diff] [review]
Julien's better fix with #ifdefs, v2

Current compilers do the right thing automatically without the 'register'
keyword.  The Camellia functions have many local variables.  It would
look strange if only 'tmp' is declared with 'register', and it's not
clear if 'tmp' deserves to be placed in a register more than the other
variables, which are used much more often.

I checked in the patch on the SOFTOKEN_3_13_BRANCH (Softoken 3.13).

Checking in camellia.c;
/cvsroot/mozilla/security/nss/lib/freebl/camellia.c,v  <--  camellia.c
new revision: 1.2.8.4; previous revision: 1.2.8.3
done
(Assignee)

Updated

8 years ago
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.