freebl: POWER ChaCha20 stream cipher vector acceleration
Categories
(NSS :: Libraries, enhancement, P5)
Tracking
(Not tracked)
People
(Reporter: johnjmar, Assigned: aoeuh)
References
Details
Attachments
(5 files, 6 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
Expected results:
We would like to reach out to the community to help improve the performance of ChaCha20
using POWER8 and POWER9 ISA enhancements, and to provide evidence that the
achieved ChaCha20 performance is close to optimal for the platform
(i.e. matching or beating OpenSSL performance *).
Optimized assembly implementations in the Cryptogams repository [1] may serve
as a useful reference. C-based intrinsic implementations are also acceptable
given the prior conditions (*).
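For orientation, a minimal sketch of how one ChaCha20 quarter-round can be expressed with AltiVec/VSX C intrinsics; this is illustrative only, not part of the submitted patches, and assumes a compiler with -maltivec/-mvsx support:

#include <altivec.h>

typedef __vector unsigned int vu32;

/* One ChaCha20 quarter-round applied lane-wise to four 4x32-bit vectors,
 * so four independent columns (or blocks) are processed in parallel. */
static inline void chacha_quarter_round_vsx(vu32 *a, vu32 *b, vu32 *c, vu32 *d)
{
    *a = vec_add(*a, *b); *d = vec_rl(vec_xor(*d, *a), vec_splats(16u));
    *c = vec_add(*c, *d); *b = vec_rl(vec_xor(*b, *c), vec_splats(12u));
    *a = vec_add(*a, *b); *d = vec_rl(vec_xor(*d, *a), vec_splats(8u));
    *c = vec_add(*c, *d); *b = vec_rl(vec_xor(*b, *c), vec_splats(7u));
}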
A financial bounty is offered upon completion, community acceptance of the patches, and
reviewer input.
Moreover, free access to a ppc64le VM or a direct Jenkins CI account can be
obtained at the OSU Open Source Lab: https://osuosl.org/services/powerdev/
You can list me (johnjmar@linux.vnet.ibm.com) as the IBM Advocate.
Updated•6 years ago
This patch adds a ChaCha20 VSX implementation, patches ChaCha20Xor() to call the VSX version of ChaCha20, and adds a gyp patch for building (I will add a make patch later).
There is no way to run a performance test for the VSX version, so I am adding this patch as well.
before:
bltest -E -m chacha20_ctr -i data -o /dev/null -k key -v iv -p 3
mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 210Mb 0Kb 3 0 0.000 22141.000 29.395 7Mb 147Kb
after:
bltest -E -m chacha20_ctr -i data -o /dev/null -k key -v iv -p 300
mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 20Gb 520Mb 300 0 0.000 13735.000 13.790 1Gb 498Mb
Make patch added and a small bug fixed.
Small bug fix.
Reporter
Comment 5•6 years ago
Hello,
I'm working on a Power9 system and I was able to apply both patches and build the project using gyp.
However, when trying to run bltest I get a segmentation fault:
ubuntu@oprom8:~/libs/nss$ patch -p1 < ../nss.9138837.patch
patching file cmd/bltest/blapitest.c
patching file lib/freebl/blapi.h
patching file lib/freebl/blapit.h
patching file lib/freebl/chacha20poly1305.c
Hunk #1 succeeded at 70 with fuzz 1 (offset -4 lines).
patching file lib/freebl/chacha20poly1305.h
patching file lib/freebl/ldvector.c
patching file lib/freebl/loader.c
patching file lib/freebl/loader.h
ubuntu@oprom8:~/libs/nss$ cd ../; rm -rf dist; nss/build.sh -o -g
NSPR [1/5] configure ...
NSPR [2/5] make ...
NSPR [3/5] NOT building tests
NSPR [4/5] NOT running tests
NSPR [5/5] install ...
ninja: Entering directory `/home/ubuntu/libs/nss/out/Release'
[1120/1120] STAMP obj/nss_tests.actions_depends.stamp
ubuntu@oprom8:~/libs$ iv=nss/cmd/bltest/tests/chacha20_poly1305/iv1
ubuntu@oprom8:~/libs$ k=nss/cmd/bltest/tests/chacha20_poly1305/key1
ubuntu@oprom8:~/libs$ p=nss/cmd/bltest/tests/chacha20_poly1305/plaintext1
ubuntu@oprom8:~/libs$ bltest=dist/Release/bin/bltest
ubuntu@oprom8:~/libs$ export LD_LIBRARY_PATH=dist/Release/lib
ubuntu@oprom8:~/libs$
ubuntu@oprom8:~/libs$ $bltest -E -m chacha20_ctr -i $p -o /dev/null -k $k -v $iv -5 10
Segmentation fault (core dumped)
I'm currently investigating.
Reporter
Comment 6•6 years ago
Looks like chacha20_Encrypt is getting a null ctx ...
Reading symbols from dist/Debug/bin/bltest...done.
(gdb) run -E -m chacha20_ctr -i nss/cmd/bltest/tests/chacha20_poly1305/plaintext1 -o /dev/null -k nss/cmd/bltest/tests/chacha20_poly1305/key1 -v nss/cmd/bltest/tests/chacha20_poly1305/iv1 -5 10
Starting program: /home/ubuntu/libs/dist/Debug/bin/bltest -E -m chacha20_ctr -i nss/cmd/bltest/tests/chacha20_poly1305/plaintext1 -o /dev/null -k nss/cmd/bltest/tests/chacha20_poly1305/key1 -v nss/cmd/bltest/tests/chacha20_poly1305/iv1 -5 10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a5f180 (LWP 71024)]
Thread 2 "bltest" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7a5f180 (LWP 71024)]
chacha20_Encrypt (cx=0x0, output=0x100053370 "", outputLen=0x7ffff7a5e5f0, maxOutputLen=265,
input=0x100053250 "Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference"...,
inputLen=265) at ../../cmd/bltest/blapitest.c:1147
1147 return ChaCha20_Xor(output, input, inputLen, ctx->key, ctx->iv+4,
(gdb) where
#0 chacha20_Encrypt (cx=0x0, output=0x100053370 "", outputLen=0x7ffff7a5e5f0, maxOutputLen=265,
input=0x100053250 "Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference"...,
inputLen=265) at ../../cmd/bltest/blapitest.c:1147
#1 0x000000010000c8f8 in cipherDoOp (cipherInfo=0x100052eb0) at ../../cmd/bltest/blapitest.c:2433
#2 0x00000001000105c8 in ThreadExecTest (data=0x100052eb0) at ../../cmd/bltest/blapitest.c:3327
#3 0x00007ffff7bec260 in _pt_root (arg=0x100054390) at ../../../../pr/src/pthreads/ptthread.c:201
#4 0x00007ffff7e9885c in start_thread (arg=0x0) at pthread_create.c:463
#5 0x00007ffff7db9028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
(gdb) list
1142 unsigned int maxOutputLen, const unsigned char *input,
1143 unsigned int inputLen)
1144 {
1145 ChaCha20Context *ctx = cx;
1146 *outputLen = inputLen;
1147 return ChaCha20_Xor(output, input, inputLen, ctx->key, ctx->iv+4,
1148 *((PRUint32 *)ctx->iv));
1149
1150 }
1151
The iv must be a 16-byte file and the key must be a 32-byte file. The plaintext file size is also limited, but I do not know by how much. I think the problem is in the internals of bltest.
ubuntu@oprom8:~/libs$ $bltest -E -m chacha20_ctr -i $p -o /dev/null -k $k -v $iv -5 10
-5 10 looks like a typo
Oh, I realized the problem is exactly in one of the input files: ChaCha20_CreateContext returns 0 if the iv or key size does not match.
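For clarity, a minimal sketch (not the actual fix) of how the wrapper shown in the backtrace could guard against that NULL context; the surrounding names are taken from the gdb listing above and SECFailure is the standard NSS error status:

/* Hypothetical guard at the top of bltest's chacha20_Encrypt wrapper. */
ChaCha20Context *ctx = cx;
if (ctx == NULL) {
    /* ChaCha20_CreateContext returned NULL: the key file must be 32 bytes
     * and the bltest iv file 16 bytes (4-byte counter + 12-byte nonce). */
    return SECFailure;
}
*outputLen = inputLen;
return ChaCha20_Xor(output, input, inputLen, ctx->key, ctx->iv + 4,
                    *((PRUint32 *)ctx->iv));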
Reporter
Comment 10•6 years ago
(In reply to aoeuh from comment #7)
The iv must be a 16-byte file and the key must be a 32-byte file. The plaintext file size is also limited, but I do not know by how much. I think the problem is in the internals of bltest.
Looks like the iv I'm using from the nss test files is 12 bytes:
ubuntu@oprom8:~/libs$ wc $iv $k
+ wc nss/cmd/bltest/tests/chacha20_poly1305/iv1 nss/cmd/bltest/tests/chacha20_poly1305/key1
0 0 12 nss/cmd/bltest/tests/chacha20_poly1305/iv1
0 3 32 nss/cmd/bltest/tests/chacha20_poly1305/key1
However, looking at https://tools.ietf.org/html/rfc7539, section 2.3 states that the nonce or iv in this case MUST be 96 bits (12 bytes).
(In reply to aoeuh from comment #8)
ubuntu@oprom8:~/libs$ $bltest -E -m chacha20_ctr -i $p -o /dev/null -k $k -v $iv -5 10
-5 10 looks like a typo
I'm running the test for 10 seconds, per the help output:
+ dist/Release/bin/bltest
bltest: you must enter a command!
bltest [-DEHSVR] List available cipher modes
bltest -E -m mode Encrypt a buffer
[-i plaintext] [-o ciphertext] [-k key] [-v iv]
[-b bufsize] [-g keysize] [-e exp] [-r rounds]
[-w wordsize] [-p repetitions | -5 time_interval]
[-4 th_num]
-m cipher mode to use
-i file which contains input buffer
-o file for output buffer
-k file which contains key
-v file which contains initialization vector
-b size of input buffer
-g key size (in bytes)
-p do performance test
-4 run test in multithread mode. th_num number of parallel threads
-5 run test for specified time interval(in seconds)
Anyway, I tested using a 16-byte iv (not sure if legal), and I see a ~3x improvement (previous vs. optimized):
iv=nss/cmd/bltest/tests/aes_cbc/iv16
#patch -p1 < ../nss.9138837.patch
ubuntu@oprom8:~/libs$ $bltest -E -m chacha20_ctr -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 2Gb 227Mb 9M 0 0.000 10000.000 10.001 227Mb 582Kb
#patch -p1 < ../nss.9138837.patch
#patch -p1 < ../nss.9138830.patch
ubuntu@oprom8:~/libs$ $bltest -E -m chacha20_ctr -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 6Gb 180Mb 25M 0 0.000 10000.000 10.001 632Mb 424Kb
Assignee
Comment 11•6 years ago
My first test was on a debug build, my fault.
Results on an Integricloud POWER9 VPS (1 core, 4 threads I suppose):
before
bltest -E -m chacha20_ctr -i testfile -o resnss -k key -v iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 2Gb 852Mb 2T 0 0.000 10025.000 10.030 289Mb 135Kb
after
bltest -E -m chacha20_ctr -i testfile -o resnss -k key -v iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 14Gb 894Mb 15T 0 0.000 10005.000 10.007 1Gb 497Mb
openssl
openssl speed -evp chacha20
Doing chacha20 for 3s on 16 size blocks: 38326198 chacha20's in 2.89s
Doing chacha20 for 3s on 64 size blocks: 15884799 chacha20's in 2.88s
Doing chacha20 for 3s on 256 size blocks: 15408018 chacha20's in 3.00s
Doing chacha20 for 3s on 1024 size blocks: 4190731 chacha20's in 3.00s
Doing chacha20 for 3s on 8192 size blocks: 539112 chacha20's in 2.99s
Doing chacha20 for 3s on 16384 size blocks: 274003 chacha20's in 3.00s
OpenSSL 1.1.1d 10 Sep 2019
built on: Sat Oct 12 19:56:43 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-SVnnZZ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
chacha20 212186.56k 352995.53k 1314817.54k 1430436.18k 1477058.70k 1496421.72k
However, looking at https://tools.ietf.org/html/rfc7539, section 2.3 states that the nonce or iv in this case MUST be 96 bits (12 bytes).
The first 4 bytes are the counter (in little endian); OpenSSL also uses a 16-byte iv for ChaCha20.
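As a sketch of that layout (an assumption drawn from this discussion, not a structure taken from the patch):

/* Hypothetical view of the 16-byte "iv" file bltest feeds to ChaCha20_Xor. */
typedef struct {
    unsigned char counter_le[4]; /* initial block counter, little endian */
    unsigned char nonce[12];     /* 96-bit nonce per RFC 7539, section 2.3 */
} bltest_chacha20_iv_example;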
Is there something I have to do?
Reporter
Comment 12•6 years ago
One more observation. Looking at perf traces for the bltests chacha20_ctr and chacha20_poly1305 using unoptimized code, I see both tests sharing the same double_round function while performing the cipher stream.
For the same bltests using optimized code, I see the following:
# chacha20_poly1305_e
+ 64.48% 0.00% bltest libc-2.27.so [.] __clone
+ 64.48% 0.00% bltest libpthread-2.27.so [.] start_thread
+ 64.48% 0.00% bltest libnspr4.so [.] _pt_root
+ 64.48% 0.00% bltest bltest [.] ThreadExecTest
+ 64.48% 0.05% bltest bltest [.] cipherDoOp
+ 64.28% 0.11% bltest bltest [.] chacha20_poly1305_Encrypt
+ 64.12% 0.20% bltest bltest [.] ChaCha20Poly1305_Seal
+ 63.93% 0.11% bltest libfreeblpriv3.so [.] ChaCha20Poly1305_Seal
+ 62.14% 7.68% bltest libfreeblpriv3.so [.] Hacl_Chacha20Poly1305_32_aead_encrypt
+ 53.73% 45.60% bltest libfreeblpriv3.so [.] Hacl_Chacha20_chacha20_encrypt.localalias.0
+ 7.81% 7.77% bltest libfreeblpriv3.so [.] double_round
+ 1.06% 1.06% bltest libfreeblpriv3.so [.] Hacl_Poly1305_32_poly1305_finish
+ 1.00% 0.53% bltest libc-2.27.so [.] __memcpy_chk
# chacha20_ctr_e
+ 60.90% 0.00% bltest libc-2.27.so [.] __clone
+ 60.90% 0.00% bltest libpthread-2.27.so [.] start_thread
+ 60.90% 0.00% bltest libnspr4.so [.] _pt_root
+ 60.90% 0.00% bltest bltest [.] ThreadExecTest
+ 60.80% 0.42% bltest bltest [.] cipherDoOp
+ 59.94% 0.79% bltest bltest [.] chacha20_Encrypt
+ 59.13% 0.58% bltest bltest [.] ChaCha20_Xor
+ 58.33% 2.24% bltest libfreeblpriv3.so [.] ChaCha20_Xor
+ 56.39% 0.31% bltest libfreeblpriv3.so [.] ChaCha20Xor
+ 23.81% 23.79% bltest libfreeblpriv3.so [.] mainloop <= this used to be 'double_round' before
+ 23.28% 23.24% bltest libfreeblpriv3.so [.] lastblock
+ 5.78% 5.78% bltest libfreeblpriv3.so [.] chacha20vsx
+ 2.03% 2.03% bltest libfreeblpriv3.so [.] xorlast
+ 1.18% 1.18% bltest libfreeblpriv3.so [.] exitsub
The expectation here was that even though the Poly1305 authenticator part is not optimized, one would still see some benefit when running the chacha20_poly1305 bltest, which is not the case; presumably ChaCha20Poly1305_Seal still goes through the combined Hacl_Chacha20Poly1305_32 path rather than the optimized ChaCha20Xor, as the trace above shows.
Assignee
Comment 13•6 years ago
Added functions Chacha20Poly1305_vsx_aead_encrypt and Chacha20Poly1305_vsx_aead_decrypt. Actually I just copied Hacl_Chacha20Poly1305_32.c and adjusted it a bit. I did not want to edit that file directly because it was formally verified.
Reporter
Comment 14•5 years ago
Hello,
After grabbing the latest NSS code available, I've verified the latest patch (9139949) using a Power9 system running bare-metal Linux (I had to manually patch both lib/freebl/ldvector.c and lib/freebl/Makefile due to the addition of SHA512 constructs in the latest version of the code).
Anyway, I see a 3x improvement in the chacha20_poly1305_e bltest and a 5x improvement in the newly created chacha20_ctr_e test. Notice I used a larger plaintext input for the tests.
#### BEFORE OPTIMIZATION PATCH
ubuntu@oprom8:~/libs/nss$ patch -p1 < ../nss.9138837.patch # bltest
1 out of 1 hunk FAILED -- saving rejects to file lib/freebl/ldvector.c.rej
...
[1088/1088] STAMP obj/nss_tests.actions_depends.stamp
ubuntu@oprom8:~/libs$ iv=nss/cmd/bltest/tests/chacha20_poly1305/iv1
ubuntu@oprom8:~/libs$ k=nss/cmd/bltest/tests/chacha20_poly1305/key1
ubuntu@oprom8:~/libs$ p=nss/COPYING
ubuntu@oprom8:~/libs$ bltest=dist/Release/bin/bltest
ubuntu@oprom8:~/libs$ export LD_LIBRARY_PATH=dist/Release/lib
ubuntu@oprom8:~/libs$
ubuntu@oprom8:~/libs$ m=chacha20_poly1305 #chacha20_ctr
ubuntu@oprom8:~/libs$ $bltest -E -m $m -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_poly1305_e 2Gb 451Mb 144T 0 0.000 10000.000 10.003 249Mb 860Kb
ubuntu@oprom8:~/libs$
ubuntu@oprom8:~/libs$ iv=nss/cmd/bltest/tests/aes_cbc/iv16
ubuntu@oprom8:~/libs$ m=chacha20_ctr
ubuntu@oprom8:~/libs$ $bltest -E -m $m -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 2Gb 925Mb 172T 0 0.000 10000.000 10.002 297Mb 312Kb
#### AFTER OPTIMIZATION PATCH
ubuntu@oprom8:~/libs/nss$ patch -p1 < ../nss.9139949.patch # chacha20?
2 out of 2 hunks FAILED -- saving rejects to file lib/freebl/Makefile.rej
...
[1034/1034] STAMP obj/nss_tests.actions_depends.stamp
ubuntu@oprom8:~/libs$ iv=nss/cmd/bltest/tests/chacha20_poly1305/iv1
ubuntu@oprom8:~/libs$ k=nss/cmd/bltest/tests/chacha20_poly1305/key1
ubuntu@oprom8:~/libs$ p=nss/COPYING
ubuntu@oprom8:~/libs$ bltest=dist/Release/bin/bltest
ubuntu@oprom8:~/libs$ export LD_LIBRARY_PATH=dist/Release/lib
ubuntu@oprom8:~/libs$
ubuntu@oprom8:~/libs$ m=chacha20_poly1305 #chacha20_ctr
ubuntu@oprom8:~/libs$ $bltest -E -m $m -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_poly1305_e 7Gb 364Mb 436T 0 0.000 10000.000 10.002 753Mb 102Kb
ubuntu@oprom8:~/libs$
ubuntu@oprom8:~/libs$ iv=nss/cmd/bltest/tests/aes_cbc/iv16
ubuntu@oprom8:~/libs$ m=chacha20_ctr
ubuntu@oprom8:~/libs$ $bltest -E -m $m -i $p -o /dev/null -k $k -v $iv -5 10
# mode in opreps cxreps context op time(sec) thrgput
chacha20_ctr_e 14Gb 862Mb 880T 0 0.000 10000.000 10.001 1Gb 495Mb
Assignee
Comment 15•5 years ago
Rebased to current master.
Assignee
Comment 16•5 years ago
Rebased to current master.
Reporter
Comment 17•5 years ago
Hello, is there anyone who can review?
Thanks,
- JM
Comment 18•5 years ago
I appreciate the work to bring this optimized code into NSS. To really consider this patch we’re going to need the source that generated the .s file, which you’ll find we generally land in-tree for other assembly for future verification purposes. Given the complexity of the patch, I’d also prefer to review these in Phabricator rather than inside Bugzilla’s Splinter.
That aside, we are writing a policy governing community contributions of platform-specific/optimized code and will post that on the Mozilla wiki, and there we can make clear the requirements for these sorts of contributions. As-is, if the assembly is doing something nefarious, I would be hard-pressed to figure that out. I’ll leave my needinfo here to return with a drafted policy statement - and it might be a good idea to wait for further code changes until we have that.
Assignee
Comment 19•5 years ago
To really consider this patch we’re going to need the source that generated the .s file
It is written by hand.
Assignee
Comment 20•5 years ago
Assignee
Comment 21•5 years ago
Hi, any updates?
Comment hidden (off-topic)
Comment 23•5 years ago
Hi. The original submitter of this bug, John Martinez, left IBM. For now, I'll be the designated contact from IBM.
Comment 24•5 years ago
Hi jc. Has the platform-specific code policy been posted? I understand the problems of assembly language but it's currently the way we're able to maximize performance. Is there anything I can do from my side to help with this patch?
Comment 25•5 years ago
(In reply to George Wilson from comment #24)
Hi jc. Has the platform-specific code policy been posted? I understand the problems of assembly language but it's currently the way we're able to maximize performance. Is there anything I can do from my side to help with this patch?
Possibly your colleague David Edelsohn (https://github.com/edelsohn) has an idea about how to handle this months-long "needinfo" status.
Comment 26•5 years ago
Hi all, JC accepted an offer from another company recently, so Kevin and I are gonna take over this discussion. This will happen after the holidays though. Because of many recent changes at Mozilla, the policy discussions have been delayed and I am gonna take that over in January as well. We’ll be in touch in January. Thanks for your patience and Happy holidays in the meantime !
Updated•5 years ago
Comment 27•5 years ago
(In reply to Benjamin Beurdouche [:beurdouche] from comment #26)
Hi all, JC accepted an offer from another company recently, so Kevin and I are gonna take over this discussion. This will happen after the holidays though. Because of many recent changes at Mozilla, the policy discussions have been delayed and I am gonna take that over in January as well. We’ll be in touch in January. Thanks for your patience and Happy holidays in the meantime !
As a long-term fan of both Mozilla and Power, I am highly(!) disappointed by the way Mozilla is handling this.
I would expect a far more fluid contribution process, and of course that a project respects the bounty workers (who do not get paid for their hard work if it is not merged).
There are situations where the contributor has far deeper knowledge, so no reviewer can do a deep review (here, one needs to dig into hand-crafted assembler code). In such a situation, one needs to trust the institution behind the work (here: IBM).
I believe that one can trust IBM.
Some more blocked issues:
https://bugzilla.mozilla.org/show_bug.cgi?id=1613236
https://bugzilla.mozilla.org/show_bug.cgi?id=1566124
Comment 28•5 years ago
Whether it came from IBM or not, this is code for an unsupported platform and it doesn't remotely have the level of quality that we expect from new code in NSS. Most new code for primitives in NSS is either machine-generated or formally verified for functional correctness, secret independence, and memory safety.
Mozilla does not have the resources to review and test new code for these unsupported platforms and I don't think dumping code inside NSS without a serious plan is a good idea. That would increase our attack surface against the best interests of our users. If anything, a better plan would be to remove code for unsupported platforms, and everything that has been handwritten, even if we have to lose some performance there.
I am quite happy to be helpful and to welcome lower-priority contributions when we can, but there are times when it is tough, and this is one of those.
I'll try to think about it again, but your comment did not serve these patches well and I would encourage gentle pings instead.
Comment 29•5 years ago
Hi Benjamin,
I get your point here and I see that your concerns are reasonable, but isn't it better to find solutions to these concerns than to deny users the best experience when using products that depend on NSS? I'm not just talking about Mozilla products, because I'm aware of many applications that use NSS as their underlying security library.
PowerPC isn't a supported platform here, but there are a lot of potential users who can benefit from these optimizations. I wrote an AES-GCM optimization for the PowerPC platform and have looked at the code here, and I can tell you the quality of these patches meets the level of code for supported platforms. I understand the lack of resources to review and test both patches, but to be honest Mozilla has trusted Intel before, at least from what I have seen in its AES-GCM contribution: Mozilla couldn't dig into the algorithms and details Intel committed in that patch, and it was accepted. Yes, x86 is a supported platform, but the situation isn't that different here. IBM has qualified crypto engineers who have contributed similar patches to libraries such as OpenSSL, and they will happily review the current patches here.
Also, @abebeos is a passionate guy with many contributions for the PowerPC platform in other libraries, and he definitely didn't mean to be ungentle; he just worries about the contributor's work and how it is going to add considerable value to the platform.
Regards,
Mamone
Comment 30•5 years ago
(In reply to Benjamin Beurdouche [:beurdouche] from comment #28)
Whether it came from IBM or not, this is code for an unsupported platform and it doesn't remotely have the level of quality that we expect from new code in NSS. Most new code for primitives in NSS is either machine-generated or formally verified for functional correctness, secret independence, and memory safety.
I understand this argument, though I'm mostly arguing that IBM as a code-verifier can be trusted (see below).
Mozilla does not have the resources to review and test new code for these unsupported platforms and I don't think dumping code inside NSS without a serious plan is a good idea.
Here's the plan:
- let bounty workers do the "heavy lifting"
- let IBM verify the results (as was already suggested here: https://bugzilla.mozilla.org/show_bug.cgi?id=1566124#c20)
That would increase our attack surface against the best interests of our users. If anything, a better plan would be to remove code for unsupported platforms, and everything that has been handwritten, even if we have to lose some performance there.
Then go on. Strip everything non-Intel away, state this clearly and prominently within the NSS repo ("Intel Only"), so everyone can save time.
(disclaimer: me exaggerating, to make my point...)
A possible alternative could be to say: "well, IBM, you need to provide the relevant (at least part-time) maintainer to make PowerPC a supported platform".
I am quite happy to be helpful and to welcome lower-priority contributions when we can, but there are times when it is tough, and this is one of those.
Tough times? In the end, you're managers, so see to it that you delegate stuff out.
I'll try to think about it again, but your comment did not serve these patches well and I would encourage gentle pings instead.
Hey, this actually was my "gentle ping" tenor. Remember "diversity": people have different expression styles (mine is usually free of any diplomacy). Yes, I'm kind of brain-damaged, because diplomacy is much needed within today's OSS project landscape - even more than logic and abstract/analytical abilities.
(disclaimer: I'm criticizing/suggesting "things" (here: project processes etc.), something that is clearly encouraged within the Mozilla Etiquette)
(and I'm a PowerPC & Mozilla fan, not related to IBM)
Comment 31•5 years ago
And I just saw this:
the enhanced algorithm is already implemented and approved in the GNU Nettle library, and it is now released in Nettle version 3.7.
source: https://bugzilla.mozilla.org/show_bug.cgi?id=1566124#c19 (all this sounds like a decent plan, as requested)
Comment 32•5 years ago
Hi all,
OK, as I mentioned before, I am happy to help here, but as I have no cycles to dedicate, let's try to efficiently handle this patch first and then look at the GCM patch from Mamone. I don't think we would take the assembly for Poly1305, but fortunately no patch was provided and there is vectorization support in progress on the HACL* side, so I am sure that this code will be pretty fast...
I had a brief look at the patches on Sunday and at first sight these changes look OK, but I'd like to do a proper review in Phabricator.
@aoeuh, could you please split this work into two Phabricator patches, one for blapi and one for chacha20? (You can push two commits with moz-phab submit and it will separate them into two stacked Phabricator patches.)
@George, once aoeuh is done, can you please make sure to have at least one or two people from IBM (or someone trusted):
- have another look at the code to make sure it is correct
- make sure this patch builds both with gyp (build.sh) and Makefiles on IBM platforms
Also, I'd like to make sure that some minimal help will be available from IBM if we hit issues; we won't have support for these machines in our CI, so we won't be able to test this. If we have reports that the code breaks something, for some reason, and there is no support, the only available option will be to revert that patch... :/ I am happy to discuss how we can get IBM to engage more with us on supporting their platform in the long run, after we handle these changes.
When we are done with the changes, I'll push them to our CI to see if everything passes; it might require a bit more work, but if not I'll land the changes in NSS.
Does that work for you all?
Heads up to you all and to @glaubitz who is handling separate changes there: https://bugzilla.mozilla.org/show_bug.cgi?id=1687164
(these seem pretty much disconnected for now, but we might collide at some point)
Comment 33•5 years ago
I will try to get my changes in for #1687164 this week. I'm on vacation for two weeks now so plenty of time :).
Comment 34•5 years ago
(In reply to Benjamin Beurdouche [:beurdouche] from comment #32)
Also, I'd like to make sure that some minimal help will be available from IBM if we hit issues; we won't have support for these machines in our CI, so we won't be able to test this. If we have reports that the code breaks something, for some reason, and there is no support, the only available option will be to revert that patch... :/ I am happy to discuss how we can get IBM to engage more with us on supporting their platform in the long run, after we handle these changes.
How does the CI system handle platform-specific code? Does it run natively or by emulating through QEMU? What I'm thinking of is tweaking the CI system to support PowerPC-specific code; I already have experience with GitLab CI and I may be able to help here if it's open to contributions.
Comment 35•5 years ago
The CI runs natively and it's under the control of another team, so I don't have access to these machines.
I will ask, but I am pretty sure that the rule is to only keep supported platforms in the CI, which makes sense since we won't maintain that code...
Comment 36•5 years ago
(In reply to Mamone from comment #34)
[...]
How does the CI system handle platform-specific code? Does it run natively or by emulating through QEMU? What I'm thinking of is tweaking the CI system to support PowerPC-specific code; I already have experience with GitLab CI and I may be able to help here if it's open to contributions.
My experience with this tells me that this is something that needs to be done by IBM. It looks like Mozilla is open to contributions, so, in an ideal world, IBM would take care of extending Mozilla's CI system. This seems to be a separate issue/task.
Comment 37•5 years ago
Hi all, just catching up here. We can definitely help test and review submissions for Power. What are the requirements for a platform to be considered supported? NSS has run on Power and z for a long time - we've even sponsored successful Linux distro FIPS 140-2 validations of it on our platforms. What is needed for CI? I will discuss your requirements with my management.
Comment 38•5 years ago
As I mentioned before, I think we should first handle this patch, see how it goes, then we can have the other discussions. : )
Assignee
Comment 39•5 years ago
Assignee
Comment 40•5 years ago
Depends on D107220
Assignee
Comment 41•5 years ago
Rebased, added a check for NSS_DISABLE_CRYPTO_VSX.
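For illustration, a minimal, hypothetical sketch of how such a compile-time guard typically gates a VSX path; the exact macro usage and condition in the patch may differ:

#include <stdio.h>

/* Assumed guard: build the VSX ChaCha20 path only on 64-bit POWER with VSX
 * enabled and only when NSS_DISABLE_CRYPTO_VSX is not defined. */
#if defined(__powerpc64__) && defined(__VSX__) && !defined(NSS_DISABLE_CRYPTO_VSX)
#define USE_PPC_CRYPTO_VSX 1
#else
#define USE_PPC_CRYPTO_VSX 0
#endif

int main(void)
{
    puts(USE_PPC_CRYPTO_VSX ? "VSX ChaCha20 path compiled in"
                            : "portable ChaCha20 path compiled in");
    return 0;
}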
Comment 42•5 years ago
Depends on D107221
Comment 43•5 years ago
https://hg.mozilla.org/projects/nss/rev/764124fddaa2a908ffa8222732d31829b71c357f
https://hg.mozilla.org/projects/nss/rev/4f7ba08bd991781a349083a5addda7216af4b160
https://hg.mozilla.org/projects/nss/rev/61e70233f80eeebb3eaf03a5a363f8204282ab4f
Comment 44•5 years ago
Thank you @aoeuh for the patch !
@Mamone, I'll look at https://bugzilla.mozilla.org/show_bug.cgi?id=1566124 next.
Assignee
Comment 45•5 years ago
Thanks Benjamin for the review!
Updated•5 years ago