Closed Bug 373108 Opened 17 years ago Closed 11 years ago

Implement AES Galois Counter Mode (GCM)

Categories

(NSS :: Libraries, enhancement, P2)

3.11.1
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: rrelyea)

References

Details

(Whiteboard: FIPS)

Attachments

(15 files, 5 obsolete files)

161.08 KB, application/pdf
Details
805.35 KB, application/pdf
Details
19.72 KB, text/html
Details
756 bytes, patch
rrelyea
: review+
Details | Diff | Splinter Review
63.20 KB, patch
Details | Diff | Splinter Review
60.16 KB, patch
Details | Diff | Splinter Review
5.31 KB, patch
Details | Diff | Splinter Review
5.95 KB, patch
Details | Diff | Splinter Review
1.21 KB, patch
Details | Diff | Splinter Review
20.11 KB, patch
wtc
: review+
elio.maldonado.batiz
: review+
Details | Diff | Splinter Review
4.56 KB, patch
Details | Diff | Splinter Review
3.37 KB, patch
wtc
: review+
Details | Diff | Splinter Review
47.79 KB, patch
Details | Diff | Splinter Review
97.33 KB, patch
ryan.sleevi
: review+
Details | Diff | Splinter Review
1.38 KB, patch
ryan.sleevi
: review+
Details | Diff | Splinter Review
We need to start with proposing new PKCS #11 mechanisms
for AES Galois Counter Mode (GCM) to the Cryptoki working
group.

The Internet-Draft for Suite B cipher suites for TLS
(search for "draft-rescorla-tls-suiteb") specifies new cipher
suites that use AES in Galois Counter Mode (GCM).
Whiteboard: FIPS
There is an implementation of AES+GCM available for free, in both c code
and x86_64 assembler, which is regarded as very fast.  One report says it's
faster than OpenSSL's implementation.  It's available in two zip files,
one for the AES code, and a second one for the GCM stuff.  
See http://fp.gladman.plus.com/AES/

The major question in my mind is: is the license compatible? 
and if not, ca we get its author to contributed under MPL?
Blocks: FIPS2008
Priority: -- → P4
Target Milestone: --- → 3.13
removed from FIPS2009. will consider for future release.
No longer blocks: FIPS2008
Target Milestone: 3.13 → ---
Priority: P4 → P2
Target Milestone: --- → 3.14
I downloaded this PDF file from
ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-20/pkcs-11v2-20a5d1.pdf

Note that this is a draft of an amendment.  It is not published on
the PKCS #11 website, but can be found with a web search for
"PKCS #11 CKM_AES_GCM".
Note: you can browse PKCS #11 v2.30 draft online at
http://www.cryptsoft.com/pkcs11doc/v230/

I downloaded this PDF file from the "PKCS #11 V2.30 mechanisms part 1"
link on the PKCS #11 website.  The download URL is
ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-30/pkcs-11v2-30m1-d7.pdf

It is newer than PKCS #11 v2.20 Amendment 5 - Draft 1.  If they differ,
I think this document should have a higher precedence.
This adds GCM, CTR, and CTS as new modes to AES. I've implemented the new modes in a generic way, so that the internal AES worker function (which switches between hardware and software AES implementations) now also switches to various modes. This would allow us to pretty easily add GCM, CTR and CTS to any other block algorithm as well.

This patch includes fixes to the binary field multiply. Evidently these corner cases did not arise in binary field ECC (or they did and we just haven't tripped overthem since few compile NSS with binary field ECC).
Assignee: nobody → rrelyea
Status: NEW → ASSIGNED
This patch the interfaces specified in the PKCS #11 v2.3 Draft 7. There are some abiguity in the draft that has been resolved as follows:

AES_CTS - the spec does not describe the flavor of CTS (Kerberos, NIST, or Schneier). Since the spec references the NIST CTS document (which describes all three flavors), I presumed that the NIST version was the one it meant to have implemented.

AES_GCM - The spec implies that you will get a single MAC at the end of the full stream. The way AES_GCM is defined in both NIST and in the IETF, you need to have the MAC with each block you process on decrypt. I've implemented adding the MAC to the end of each block you write. In practice, AES_GCM can only be used in the 'one-shot' configuration (INIT/Encrypt INIT/Decrypt) much like RSA. In that mode both the spec and the implementation is identical.
Attached patch Tests for CTR and GCM (obsolete) — Splinter Review
This addes GCM, CTR, and CTS tests to the blapitest framework.

I've also updated the perftests (including fixing a bug in DSA). I wanted to see how GCM and CTR fair. I'll attach the results in the next post.

The upshot is on intel hardware, AES ECB/CBC are by far the fastest of our algorithms (2G), AES CTR come in around 200 M/sec (about half RC-4), and AES_GCM comes in at 25 M/sec (about twice 3des).

bob
Attachment #658653 - Flags: superreview?(wtc)
Attachment #658653 - Flags: review?(ryan.sleevi)
Attachment #658669 - Flags: superreview?(wtc)
Attachment #658669 - Flags: review?(ryan.sleevi)
Attachment #658671 - Flags: superreview?(wtc)
Attachment #658671 - Flags: review?(ryan.sleevi)
(In reply to Robert Relyea from comment #7)
> AES_GCM - The spec implies that you will get a single MAC at the end of the
> full stream.
>
> The way AES_GCM is defined in both NIST and in the IETF, you
> need to have the MAC with each block you process on decrypt.
> I've implemented adding the MAC to the end of each block you write.

Yes. The IETF document is RFC 5116, and TLS 1.2 (and presumably all other IETF documents that use AEAD ciphers) references RFC 5116 to define the format for TLS's AES-GCM cipher suites.

As I mentioned before, I think there is a problem with the draft PKCS#11 interface specification vs. the NIST/FIPS requirements for GCM implementations. The PKCS#11 draft interface specifies that the initialization vector must be provided by the application, beyond the control of the PKCS#11 token. But, the NIST Special Publication 800-38D says "In order to inhibit an unauthorized party from controlling or influencing the generation of IVs, GCM shall be implemented only within a cryptographic module that meets the requirements of FIPS Pub. 140-2. In particular, the cryptographic boundary of the module shall contain a “generation unit” that produces IVs according to one of the constructions in Sec. 8.2 above."

Maybe I am misreading this, but it implies to me that the PKCS#11 interface must make the IV parameter an *output* parameter, not an *input* parameter.

(In reply to Robert Relyea from comment #7)
> AES_GCM - The spec implies that you will get a single MAC at the end of the
> full stream. The way AES_GCM is defined in both NIST and in the IETF, you
> need to have the MAC with each block you process on decrypt. I've
> implemented adding the MAC to the end of each block you write. In practice,
> AES_GCM can only be used in the 'one-shot' configuration (INIT/Encrypt
> INIT/Decrypt) much like RSA. In that mode both the spec and the
> implementation is identical.

I agree with your interpretation of the spec. However, I am concerned about the performance and implementation requirements that this is likely to impose on libssl. For CBC mode, libssl makes a special effort to only init/finalize the HMAC and encryption a single time each, to minimize PKCS#11 overhead. I think it would be very useful to have an additional boolean parameter bMACPerChunk to the PKCS#11 interface definition that; when true, the token would cause each call to C_Encrypt to output the MAC and each call to C_Decrypt to verify the MAC, and when false, the token would work as you describe. Alternatively, a new function with these semantics, C_DigestEncryptUpdateBlock(), should be added to PKCS#11?
No, actually the IETF AEAD specifies the IV as provided by the application, at least for AES_GCM. NIST is basically silent. The IV must be unique for each block, however. This is specified in both.

The real issue was the decrypt is supposed to return no data if the GCM hash doesn't match. This means you need to include the GCM hash with the decrypt call. That is why the calls need to be one shot.

I remembered you meantioning the IV, so I specifically looked for it in the specs and found nothing preventing the existing API (which is actually required to implement AEAD).

bob

> I agree with your interpretation of the spec. However, I am concerned about the performance 
> and implementation requirements that this is likely to impose on libssl

yes, I thought about allowing something like:

C_EncryptInit()
C_EncryptUpdate()
C_EncryptInit()
C_EncryptUpdate()
C_EncryptInit()
C_EncryptUpdate()
C_EncryptFinal().

This would solve the peformance issue (basically the cost of setting up all the contexts) at the cost of significantly changing the expected definitions of PKCS #11 primitives. As encoded today, we should be able to do:

C_EncryptInit()
C_EncryptUpdate()
C_EncryptUpdate()
C_EncryptUpdate()

As long as we assume IV can simply be monotonically increasing, and that subsequent blocks would not have AAD information. I doubt both assumptions will likely be true, however.

I realized that in order to implement my previous idea, we would end up holding internal contexts over. We could do this for efficiency anyway. So the code would still be:

C_EncryptInit()
C_Encrypt()
C_EncryptInit()
C_Encrypt()

If we are calling on the same session, we can preserve the internal contexts between the C_Encrypt() and C_EncryptInit(). If we notice that the C_EncryptInit matches key, algorithm, and op of our previous call, we can keep reuse the old context (just resetting the GCM, and loading the new IV). This means we could get the same performance without violating the greater PKCS #11 spec.

NOTE: I didn't implement this because I wanted to see if we really did have a performance issue here on our hands. AES GCM is currently twice as fast as 3DES. For clients, there shouldn't be an issue, we may choose simplicity over overall performance. compared to network latency and certificate processing, the symmetric algorithms are no-ops. For servers, the matter is different (The CPU time spent on servicing 100+ SSL connections take away CPU needed to actually perform the requests), and it may well be worth our while to try to cache the contexts.

Anyway I convinced myself we weren't painted in the corner, but we really needed to have code that uses GCM before we can truly figure out where our best bang for the buck is in terms of optimizations.

> Alternatively, a new function with these semantics, C_DigestEncryptUpdateBlock(), 
> should be added to PKCS#11?

The block isn't an issue. The GCM block is included with the Encrypt block, and thus passed to the Encrypt block. In that case simply doing EncryptUpdate/DecryptUpdate is fine. The issue is each block may include AAD data and potentically a different IV. That is what requires the new Init each call.


bob
> Alternatively, a new function with these semantics, C_DigestEncryptUpdateBlock(), 
> should be added to PKCS#11?

Also, if we want to make such radical changes to the PKCS #11 spec, we would need to provide significant resources, like a new spec editter, as well as wait for such a change to get through the group. Using draft mechanisms seems a much more conservative way to go to get what we need implemented.

bob
(In reply to Robert Relyea from comment #11)
> NIST is basically silent.

What is your interpretation of the section of NIST Special Publication 800-38D I quoted in Comment 11?

It seems to be a lot more specific than the similar requirement for CBC mode ("Validation that an implementation of the CBC, CFB, or OFB mode conforms to this recommendation will typically include an examination of the procedures for assuring the unpredictability or uniqueness of the IV"), though I think the intent is the same, so perhaps whatever is considered good enough for CBC mode would be good enough for GCM mode.

> I remembered you meantioning the IV, so I specifically looked for it in the
> specs and found nothing preventing the existing API (which is actually
> required to implement AEAD).

To be clear, my thinking is that the IV should be an output parameter for Encrypt but an input parmaeter for Decrypt.

> C_EncryptInit()
> C_Encrypt()
> C_EncryptInit()
> C_Encrypt()
> 
> If we are calling on the same session, we can preserve the internal contexts
> between the C_Encrypt() and C_EncryptInit(). If we notice that the
> C_EncryptInit matches key, algorithm, and op of our previous call, we can
> keep reuse the old context (just resetting the GCM, and loading the new IV).
> This means we could get the same performance without violating the greater
> PKCS #11 spec.

Yes, that does sound reasonable. There would still be additional overhead over the current approach we're using for CBC mode. But, it is a good idea to try to reduce overhead (especially in Softoken) in general anyway, especially if you already have ideas on how to do it.

FWIW, I think the performance target should be for AES128-GCM cipher suites to be faster (and/or more power efficient) than AES128-CBC-SHA1 cipher suites (if not RC4-SHA1), for TLS. But, we don't have to get there all at once.
(In reply to Brian Smith (:bsmith) from comment #13)
> (In reply to Robert Relyea from comment #11)
> > NIST is basically silent.
> 
> What is your interpretation of the section of NIST Special Publication
> 800-38D I quoted in Comment 11?
> 
> It seems to be a lot more specific than the similar requirement for CBC mode
> ("Validation that an implementation of the CBC, CFB, or OFB mode conforms to
> this recommendation will typically include an examination of the procedures
> for assuring the unpredictability or uniqueness of the IV"), though I think
> the intent is the same, so perhaps whatever is considered good enough for
> CBC mode would be good enough for GCM mode.

I have to agree with Brian's reading of Section 9.1. In particular, it specifically calls out the IV as a critical security parameter:

2. The IV shall be a critical security parameter as defined in FIPS Pub. 140-2
until the authenticated encryption function is invoked with the IV.  Prior to this
invocation, the IV shall be provided the same protection as other critical
security parameters in a module that is validated to the requirements in FIPS Pub.
140-2.


As currently specified in Draft 7 of v2.30, having to supply the IV via pIv means that the parameter must come from outside the module boundary, which seems to run contrary to that. It seems that, to implement what's spec'c in SP 800-38D, PKCS#11 would need to let the IV be supplied similar to the (OAEP, PBKDF2) SALT_SOURCE_TYPEs.

Something like

typedef CK_ULONG CK_AES_GCM_PARAMS_IV_SOURCE_TYPE;
#define CKZ_IV_SPECIFIED 0x000000001 Array of CK_BYTE containing the IV to use.
#define CKZ_IV_DEVICE_RANDOM 0x00000002 The device will generate the IV

typedef struct CK_AES_GCM_PARAMS {
  CK_AES_GCM_PARAMS_IV_SOURCE_TYPE ivSource;
  CK_VOID_PTR ivSourceData;
  CK_ULONG ivSourceDataLen;
  CK_BYTE_PTR pAAD;
  CK_ULONG ulAADLen;
  CK_ULONG ulTagBits;
}


Of course, none of this may be necessary, if NIST has issued forth any implementation guidance that runs contrary to SP 800-38D.

It's worth noting that CNG (which also implements GCM) also *requires* that the IV/nonce be supplied by the caller ( http://msdn.microsoft.com/en-us/library/windows/desktop/cc562981(v=vs.85).aspx ), which also means it's traversing the security boundaries under the Microsoft crypto provider model.
(In reply to Ryan Sleevi from comment #14)
> It's worth noting that CNG (which also implements GCM) also *requires* that
> the IV/nonce be supplied by the caller (
> http://msdn.microsoft.com/en-us/library/windows/desktop/cc562981(v=vs.85).
> aspx ), which also means it's traversing the security boundaries under the
> Microsoft crypto provider model.

Their implementation is FIPS-140 certified, so perhaps my reading is too strict after all:
http://technet.microsoft.com/en-us/library/cc750357.aspx
http://csrc.nist.gov/groups/STM/cavp/documents/aes/aesval.html#1168
Here's Bob's early reply from email, which he suggested I copy into this bug:

> It seems to be a lot more specific than the similar requirement for CBC mode
> ("Validation that an implementation of the CBC, CFB, or OFB mode conforms to
> this recommendation will typically include an examination of the procedures for
> assuring the unpredictability or uniqueness of the IV"), though I think the
> intent is the same, so perhaps whatever is considered good enough for CBC mode
> would be good enough for GCM mode.

Unfortunately that has to be part of the underlying protocol a randomly
generated IV internally won't meet this requirement.

>> I remembered you meantioning the IV, so I specifically looked for it in the
>> specs and found nothing preventing the existing API (which is actually
>> required to implement AEAD).
>
> To be clear, my thinking is that the IV should be an output parameter for
> Encrypt but an input parmaeter for Decrypt.

The problem is that the IV may be specified by implicit parameters in
the protocol itself. It's not a random value, and probably shouldn't be.
(In reply to Brian Smith (:bsmith) from comment #15)
> (In reply to Ryan Sleevi from comment #14)
> > It's worth noting that CNG (which also implements GCM) also *requires* that
> > the IV/nonce be supplied by the caller (
> > http://msdn.microsoft.com/en-us/library/windows/desktop/cc562981(v=vs.85).
> > aspx ), which also means it's traversing the security boundaries under the
> > Microsoft crypto provider model.
> 
> Their implementation is FIPS-140 certified, so perhaps my reading is too
> strict after all:
> http://technet.microsoft.com/en-us/library/cc750357.aspx
> http://csrc.nist.gov/groups/STM/cavp/documents/aes/aesval.html#1168

Their GCM implementation is not certified. Only CCM, ECB, CBC, and CFB8.
(In reply to Brian Smith (:bsmith) from comment #16)
> Here's Bob's early reply from email, which he suggested I copy into this bug:
> 
> > It seems to be a lot more specific than the similar requirement for CBC mode
> > ("Validation that an implementation of the CBC, CFB, or OFB mode conforms to
> > this recommendation will typically include an examination of the procedures for
> > assuring the unpredictability or uniqueness of the IV"), though I think the
> > intent is the same, so perhaps whatever is considered good enough for CBC mode
> > would be good enough for GCM mode.
> 
> Unfortunately that has to be part of the underlying protocol a randomly
> generated IV internally won't meet this requirement.
> 
> >> I remembered you meantioning the IV, so I specifically looked for it in the
> >> specs and found nothing preventing the existing API (which is actually
> >> required to implement AEAD).
> >
> > To be clear, my thinking is that the IV should be an output parameter for
> > Encrypt but an input parmaeter for Decrypt.
> 
> The problem is that the IV may be specified by implicit parameters in
> the protocol itself. It's not a random value, and probably shouldn't be.

It just means that such a protocol wouldn't be usable in a FIPS Approved Mode of Operation then, AIUI. Or they carve out a special exemption for the protocol with separate validation of the implementation - much like was done for the TLS PRFs.

I'm not trying to argue that it makes sense. These things never do ;)
Comment on attachment 658653 [details] [diff] [review]
Add GCM, CTR, and CTS modes to AES

Tomorrow I will convert this patch and the softoken patch
to changelists in the Rietveld code review tool to
facilitate code review.
Ryan, thanks for pointing out the things I overlooked. I see on the AES FIPS validation page at NIST that they explicitly require the implementation to indicate whether the IV generation is internal or external, and if internal, whether a random or deterministic IV is constructed. But, they have validated modules that answered "external." And, the validation documentation for GCM mode shows different tests and requirements for external vs. internal IV generation: http://csrc.nist.gov/groups/STM/cavp/documents/mac/gcmvs.pdf (I also noticed that the security policy for at least one OpenSSL validation had to specify special handling for the case of a power failure for AES-GCM; the requirement to document this is also called out in SP800-38D.)

Perhaps it is better to just implement the current draft PKCS#11 mechanism now (since Bob already wrote the code for it) and then implement a new PKCS#11 mechanism if it is needed for FIPS validation later.
Comment on attachment 658653 [details] [diff] [review]
Add GCM, CTR, and CTS modes to AES

This patch can now be reviewed at
https://chromiumcodereview.appspot.com/10919163.
Comment on attachment 658669 [details] [diff] [review]
Softoken interface for AES_CTR AES_CTS and AES_GCM

This patch can now be reviewed at
https://chromiumcodereview.appspot.com/10917150.
(In reply to Robert Relyea from comment #7)
> Created attachment 658669 [details] [diff] [review]
> Softoken interface for AES_CTR AES_CTS and AES_GCM
> 
> This patch the interfaces specified in the PKCS #11 v2.3 Draft 7. There are
> some abiguity in the draft that has been resolved as follows:
> 
> AES_CTS - the spec does not describe the flavor of CTS (Kerberos, NIST, or
> Schneier). Since the spec references the NIST CTS document (which describes
> all three flavors), I presumed that the NIST version was the one it meant to
> have implemented.
> 
> AES_GCM - The spec implies that you will get a single MAC at the end of the
> full stream. The way AES_GCM is defined in both NIST and in the IETF, you
> need to have the MAC with each block you process on decrypt. I've
> implemented adding the MAC to the end of each block you write. In practice,
> AES_GCM can only be used in the 'one-shot' configuration (INIT/Encrypt
> INIT/Decrypt) much like RSA. In that mode both the spec and the
> implementation is identical.

Having read through the patches, I wanted to double check my understanding of your comment on AES_GCM.

When you say "each block you process on decrypt", you mean "each message" (eg: TLS record), correct? If so, using C_EncryptInit, C_EncryptUpdate, and C_EncryptFinal to collectively en/decrypt a "message" should be a correct interface, since the message will have an auth tag, etc. That is, C_EncryptUpdate can, should, and does handle multiple blocks (as in, bit sequences of size blocksize) correctly.

The concern (if I correctly understand) is that there is key expansion state that, ideally, would be able to be shared between multiple *messages*.

What state would need to be saved? Each message would have an independently reset IV, correct? As far as I can tell, the only inter-message cachable state is

H = E(K, zero_block)
plus any per-key tables we might use, as described in 4.1

Could we not cache those on the hKey CK_OBJECT_HANDLE (the one that will be passed to C_EncryptInit), since they really are specific to the key, and not the encryption operation? It seems the safest/correct approach anyways, since the IV may change between C_EncryptInit calls.

Have I missed more opportunities for caching?
(In reply to Ryan Sleevi from comment #23)
> When you say "each block you process on decrypt", you mean "each message"
> (eg: TLS record), correct? If so, using C_EncryptInit, C_EncryptUpdate, and
> C_EncryptFinal to collectively en/decrypt a "message" should be a correct
> interface, since the message will have an auth tag, etc. That is,
> C_EncryptUpdate can, should, and does handle multiple blocks (as in, bit
> sequences of size blocksize) correctly.

That is correct from a purity standpoint; that's how PKCS#11 is supposed to work. However, libssl itself does not need that constraint; we could easily make libssl call C_EncryptUpdate exactly once per record in TLS 1.1+, if we need to.

> What state would need to be saved? Each message would have an independently
> reset IV, correct? As far as I can tell, the only inter-message cachable
> state is
> 
> H = E(K, zero_block)
> plus any per-key tables we might use, as described in 4.1
> 
> Could we not cache those on the hKey CK_OBJECT_HANDLE (the one that will be
> passed to C_EncryptInit), since they really are specific to the key, and not
> the encryption operation?

Per-key caching is actually done in the FreeBL layer (e.g. AES round keys), IIRC.

> It seems the safest/correct approach anyways,
> since the IV may change between C_EncryptInit calls.
> 
> Have I missed more opportunities for caching?

It may also be the case that C_EncryptInit and/or C_EncryptFinal add too much overhead (for libssl) due to memory allocation and/or locking overhead. If you are dealing with an actual hardware token, then each PKCS#11 call could have high latency. I am not sure which of these concerns, if any, are still relevant to libssl's real-world performance, but libssl was written to minimize the number of C_EncryptInit/C_EncryptFinal calls that are made (there's one call to each function made per handshake, IIRC).
(In reply to Brian Smith (:bsmith) from comment #24)
> (In reply to Ryan Sleevi from comment #23)
> > When you say "each block you process on decrypt", you mean "each message"
> > (eg: TLS record), correct? If so, using C_EncryptInit, C_EncryptUpdate, and
> > C_EncryptFinal to collectively en/decrypt a "message" should be a correct
> > interface, since the message will have an auth tag, etc. That is,
> > C_EncryptUpdate can, should, and does handle multiple blocks (as in, bit
> > sequences of size blocksize) correctly.
> 
> That is correct from a purity standpoint; that's how PKCS#11 is supposed to
> work. However, libssl itself does not need that constraint; we could easily
> make libssl call C_EncryptUpdate exactly once per record in TLS 1.1+, if we
> need to.

I think I may have not been clear enough.

Currently, calling C_Encrypt (or C_EncryptInit), it forces a degree of GCM-specific key expansion, such as the calculation of H.

I originally understood Bob's comments to be a lament that such computation could not be cached between multiple (TLS records). However, Wan-Teh pointed out that the more fundamental concern is that, at the FreeBL layer, it doesn't support streaming operations at all - there is only a single _Encrypt or _Decrypt function, even though there are individual (FreeBL) contexts. As such, it means that the PKCS#11 layer cannot support _EncryptUpdate for CTS/GCM modes, since they both are contingent upon knowing when "the last block" is encountered - and that's only known at C_EncryptFinal.

CBC is the same way, but the softoken code is working around this limitation in freebl by handling the padding itself, which is how it can support C_EncryptUpdate calls for CBC & ECB calls.

I agree that, as far as libssl is concerned, it only needs to support C_Encrypt, since we'll have the full SSL record at the time we call. However, from a PKCS#11 implementation perspective, it seems like FreeBL needs to support streaming AES - just moving the softokn logic into FreeBL.

> 
> > What state would need to be saved? Each message would have an independently
> > reset IV, correct? As far as I can tell, the only inter-message cachable
> > state is
> > 
> > H = E(K, zero_block)
> > plus any per-key tables we might use, as described in 4.1
> > 
> > Could we not cache those on the hKey CK_OBJECT_HANDLE (the one that will be
> > passed to C_EncryptInit), since they really are specific to the key, and not
> > the encryption operation?
> 
> Per-key caching is actually done in the FreeBL layer (e.g. AES round keys),
> IIRC.

Yes, but my point was about GCM-specific functionality - namely, the computation of H, as well as any per-key bit tables that can be used to optimize the field multiplies.

This comment again related to my understanding that "Calling C_Encrypt (or C_EncryptInit+Update+Final) for GCM is more expensive, because it forces redundant operations to occur", which I think is still a fair and accurate statement. I think the 'solution' for this would be to ensure that the softokn layer can cache a key-specific GCM context on the *key* object (eg: without requiring an in-progress GCM operation), so that the IV-independent state can be cached between (SSL) messages.

> 
> > It seems the safest/correct approach anyways,
> > since the IV may change between C_EncryptInit calls.
> > 
> > Have I missed more opportunities for caching?
> 
> It may also be the case that C_EncryptInit and/or C_EncryptFinal add too
> much overhead (for libssl) due to memory allocation and/or locking overhead.
> If you are dealing with an actual hardware token, then each PKCS#11 call
> could have high latency. I am not sure which of these concerns, if any, are
> still relevant to libssl's real-world performance, but libssl was written to
> minimize the number of C_EncryptInit/C_EncryptFinal calls that are made
> (there's one call to each function made per handshake, IIRC).

Did you mean to say "per handshake" or "per record"? Perhaps I misunderstood, but C_EncryptInit/Update/Final is supposed to be "the same" as a simple call to C_Encrypt. I agree, avoiding Init/Final from the libssl layer is good, so that it avoids the PKCS#11 locks.

However, my comments were more focused on the relationship and performance between softoken and freebl, regardless of how it's being called from libssl.
(In reply to Ryan Sleevi from comment #25)
> CBC is the same way, but the softoken code is working around this limitation
> in freebl by handling the padding itself, which is how it can support
> C_EncryptUpdate calls for CBC & ECB calls.
> 
> I agree that, as far as libssl is concerned, it only needs to support
> C_Encrypt, since we'll have the full SSL record at the time we call.
> However, from a PKCS#11 implementation perspective, it seems like FreeBL
> needs to support streaming AES - just moving the softokn logic into FreeBL.

It seems like Softoken could buffer the inputs to C_EncryptUpdate until C_EncryptFinal is called, and then call the FreeBL function. I don't know which approach is better. It seems like adding streaming to FreeBL is probably more efficient. Adding buffering to Softoken makes the actual crypto code (in FreeBL) much simpler, which might be good because it seems like there may be multiple implementations of the AES-GCM code (one in C, and one or more assembler implementations), but such an approach may force some a maximum message size limitation that might not be good for non-TLS uses of AES-GCM. If we knew we would modify libssl to call C_Encrypt once per record, then I'd be inclined to just do whatever is simpler and not worry as much about the efficiency of C_EncryptUpdate as long as it works.

> I think the 'solution' for this would be to ensure that the
> softokn layer can cache a key-specific GCM context on the *key* object (eg:
> without requiring an in-progress GCM operation), so that the IV-independent
> state can be cached between (SSL) messages.

IV-independent state is similar to AES round keys, so why not just cache them right next to the cached round keys (in FreeBL), lazily initializing it on the first AES-GCM operation?

> > still relevant to libssl's real-world performance, but libssl was written to
> > minimize the number of C_EncryptInit/C_EncryptFinal calls that are made
> > (there's one call to each function made per handshake, IIRC).
> 
> Did you mean to say "per handshake" or "per record"? Perhaps I
> misunderstood, but C_EncryptInit/Update/Final is supposed to be "the same"
> as a simple call to C_Encrypt. I agree, avoiding Init/Final from the libssl
> layer is good, so that it avoids the PKCS#11 locks.

We call C_EncryptInit once during the handshake. Then, when encrypting each record, we call C_EncryptUpdate one or more times per record. Then we call C_EncryptFinal when we close the connection. See ssl3_InitPendingContextsPKCS11 (ssl3con.c) and PK11_CipherOp (pk11cxt.c). Note that we depend on C_EncryptUpdate outputing the ciphertext immediately without buffering for stream ciphers and we depend on it never caching more than (block_size - 1) for block ciphers.

Because this strategy won't work for GCM, will will have to modify libssl's approach, perhaps specifically for GCM.

> However, my comments were more focused on the relationship and performance
> between softoken and freebl, regardless of how it's being called from libssl.
(In reply to Brian Smith (:bsmith) from comment #26)
> It seems like Softoken could buffer the inputs to C_EncryptUpdate until
> C_EncryptFinal is called, and then call the FreeBL function.

Sorry: I was wrong; it can't buffer like that:

"11.2 Conventions for functions returning output in a variable-length buffer

[snip]

Cryptographic functions which return output in a variable-length buffer should always return as much output as can be computed from what has been passed in to them thus far. As an example, consider a session which is performing a multiple-part decryption operation with DES in cipher-block chaining mode with PKCS padding. Suppose that, initially, 8 bytes of ciphertext are passed to the C_DecryptUpdate function. The blocksize of DES is 8 bytes, but the PKCS padding makes it unclear at this stage whether the ciphertext was produced from encrypting a 0-byte string, or from encrypting some string of length at least 8 bytes. Hence the call to C_DecryptUpdate should return 0 bytes of plaintext. If a single additional byte of ciphertext is supplied by a subsequent call to C_DecryptUpdate,
then that call should return 8 bytes of plaintext (one full DES block)."

> Note that we depend on C_EncryptUpdate outputing the ciphertext immediately
> without buffering for stream ciphers and we depend on it never caching more
> than (block_size - 1) for block ciphers.

This part relies on the part of the spec quoted above.
(In reply to Brian Smith (:bsmith) from comment #26)
> (In reply to Ryan Sleevi from comment #25)
> > CBC is the same way, but the softoken code is working around this limitation
> > in freebl by handling the padding itself, which is how it can support
> > C_EncryptUpdate calls for CBC & ECB calls.
> > 
> > I agree that, as far as libssl is concerned, it only needs to support
> > C_Encrypt, since we'll have the full SSL record at the time we call.
> > However, from a PKCS#11 implementation perspective, it seems like FreeBL
> > needs to support streaming AES - just moving the softokn logic into FreeBL.
> 
> It seems like Softoken could buffer the inputs to C_EncryptUpdate until
> C_EncryptFinal is called, and then call the FreeBL function. I don't know
> which approach is better. It seems like adding streaming to FreeBL is
> probably more efficient. Adding buffering to Softoken makes the actual
> crypto code (in FreeBL) much simpler, which might be good because it seems
> like there may be multiple implementations of the AES-GCM code (one in C,
> and one or more assembler implementations), but such an approach may force
> some a maximum message size limitation that might not be good for non-TLS
> uses of AES-GCM. If we knew we would modify libssl to call C_Encrypt once
> per record, then I'd be inclined to just do whatever is simpler and not
> worry as much about the efficiency of C_EncryptUpdate as long as it works.

Looks like you found in the spec why this doesn't work.

Because both GenericStreamCipher and GenericBlockCipher depend on previous state (either the internal state or the stream cipher or the CBC-residue-as-the-IV), the Init-once-at-Startup is fine and correct.

For any GenericAEADCipher though, which depends on the explicit nonce, we'll need to call Init-per-record.

However, without modification to FreeBL, C_EncryptUpdate *does not* work as required by PKCS#11 for GCM or CTS.

> 
> > I think the 'solution' for this would be to ensure that the
> > softokn layer can cache a key-specific GCM context on the *key* object (eg:
> > without requiring an in-progress GCM operation), so that the IV-independent
> > state can be cached between (SSL) messages.
> 
> IV-independent state is similar to AES round keys, so why not just cache
> them right next to the cached round keys (in FreeBL), lazily initializing it
> on the first AES-GCM operation?

Correct. That's exactly what I'm suggesting we should do - if we're concerned about the performance of setting up H-et-al (and I think we should be).

But we don't do that, and with how the current GCM code is structured, presently can't do that (unless I've missed something).

> 
> > > still relevant to libssl's real-world performance, but libssl was written to
> > > minimize the number of C_EncryptInit/C_EncryptFinal calls that are made
> > > (there's one call to each function made per handshake, IIRC).
> > 
> > Did you mean to say "per handshake" or "per record"? Perhaps I
> > misunderstood, but C_EncryptInit/Update/Final is supposed to be "the same"
> > as a simple call to C_Encrypt. I agree, avoiding Init/Final from the libssl
> > layer is good, so that it avoids the PKCS#11 locks.
> 
> We call C_EncryptInit once during the handshake. Then, when encrypting each
> record, we call C_EncryptUpdate one or more times per record. Then we call
> C_EncryptFinal when we close the connection. See
> ssl3_InitPendingContextsPKCS11 (ssl3con.c) and PK11_CipherOp (pk11cxt.c).
> Note that we depend on C_EncryptUpdate outputing the ciphertext immediately
> without buffering for stream ciphers and we depend on it never caching more
> than (block_size - 1) for block ciphers.
> 
> Because this strategy won't work for GCM, will will have to modify libssl's
> approach, perhaps specifically for GCM.

Right, for any AEAD cipher suite (that is, anything using the GenericAEADCipher construct), we'll (unfortunately) need to be calling C_Encrypt[Init/Update/Final] or C_Encrypt once-per-record, since each record needs to be authenticated (that is, the additional_data from 6.2 of RFC 5246 needs to be verified).

That we haven't had to do so yet is because softokn is doing the CBC padding itself, and relied on the underlying method being CKM_AES_CBC. Had we used CKM_AES_CBC_PAD, we would have run into this same issue, since _PAD will not ("should not") compute the padding until C_EncryptFinal.

> 
> > However, my comments were more focused on the relationship and performance
> > between softoken and freebl, regardless of how it's being called from libssl.
Standardize on the same format of comment used on the other functions
in this file.
Attachment #661363 - Flags: review?(rrelyea)
Freebl and softoken patches checked in:

Checking in lib/freebl/blapii.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapii.h,v  <--  blapii.h
new revision: 1.5; previous revision: 1.4
done
Checking in lib/freebl/blapit.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapit.h,v  <--  blapit.h
new revision: 1.30; previous revision: 1.29
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/ctr.c,v
done
Checking in lib/freebl/ctr.c;
/cvsroot/mozilla/security/nss/lib/freebl/ctr.c,v  <--  ctr.c
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/ctr.h,v
done
Checking in lib/freebl/ctr.h;
/cvsroot/mozilla/security/nss/lib/freebl/ctr.h,v  <--  ctr.h
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/cts.c,v
done
Checking in lib/freebl/cts.c;
/cvsroot/mozilla/security/nss/lib/freebl/cts.c,v  <--  cts.c
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/cts.h,v
done
Checking in lib/freebl/cts.h;
/cvsroot/mozilla/security/nss/lib/freebl/cts.h,v  <--  cts.h
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/gcm.c,v
done
Checking in lib/freebl/gcm.c;
/cvsroot/mozilla/security/nss/lib/freebl/gcm.c,v  <--  gcm.c
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/gcm.h,v
done
Checking in lib/freebl/gcm.h;
/cvsroot/mozilla/security/nss/lib/freebl/gcm.h,v  <--  gcm.h
initial revision: 1.1
done
Checking in lib/freebl/manifest.mn;
/cvsroot/mozilla/security/nss/lib/freebl/manifest.mn,v  <--  manifest.mn
new revision: 1.65; previous revision: 1.64
done
Checking in lib/freebl/rijndael.c;
/cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v  <--  rijndael.c
new revision: 1.28; previous revision: 1.27
done
Checking in lib/freebl/rijndael.h;
/cvsroot/mozilla/security/nss/lib/freebl/rijndael.h,v  <--  rijndael.h
new revision: 1.13; previous revision: 1.12
done
Checking in lib/freebl/mpi/mp_gf2m-priv.h;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_gf2m-priv.h,v  <--  mp_gf2m-priv.h
new revision: 1.4; previous revision: 1.3
done
Checking in lib/freebl/mpi/mp_gf2m.c;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_gf2m.c,v  <--  mp_gf2m.c
new revision: 1.7; previous revision: 1.6
done
Checking in lib/softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.183; previous revision: 1.182
done
Checking in lib/softoken/pkcs11c.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11c.c,v  <--  pkcs11c.c
new revision: 1.131; previous revision: 1.130
done
Checking in lib/softoken/pkcs11i.h;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v  <--  pkcs11i.h
new revision: 1.61; previous revision: 1.60
done
Checking in lib/softoken/pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.84; previous revision: 1.83
done
Checking in lib/util/pkcs11t.h;
/cvsroot/mozilla/security/nss/lib/util/pkcs11t.h,v  <--  pkcs11t.h
new revision: 1.23; previous revision: 1.22
done
This patch was from:

http://codereview.chromium.org/10919163/
Attachment #658653 - Attachment is obsolete: true
Attachment #658653 - Flags: superreview?(wtc)
Attachment #658653 - Flags: review?(ryan.sleevi)
Patch comes from http://codereview.chromium.org/10917150/
Attachment #658669 - Attachment is obsolete: true
Attachment #658669 - Flags: superreview?(wtc)
Attachment #658669 - Flags: review?(ryan.sleevi)
This patch makes three changes.

1. Rename CK_AES_GCM_PARAMS to CK_GCM_PARAMS and CK_AES_CCM_PARAMS
to CK_CCM_PARAMS, because these are the names used in the v2.30 draft.

2. Remove the GCM code for 64-bit block ciphers. I've checked with
David McGrew about this by email.

3. Do not expose CKM_AES_CTS from the softoken. The CTS code hasn't
been tested, and CTS_DecryptUpdate has at least two bugs.
Attachment #666397 - Flags: review?(rrelyea)
Bob,

I strongly suggest that we not expose CKM_AES_GCM from the
softoken until the NSS team has come to agree on whether the
GCM authentication tag should be returned in C_EncryptUpdate
or C_EncryptFinal.

I think the CKM_AES_GCM specification in PKCS #11 v2.30 draft
is not clear about this point. We should either find the
original author of the CKM_AES_GCM spec, or study existing
implementations. I found only one existing implementation of
CKM_AES_GCM by doing web searches: OpenSolaris.

http://src.opensolaris.org/source/s?refs=aes_encrypt_update&project=onnv

If you browse the related functions, you will see that they
output the GCM authentication tag in the "final" function
rather than the "update" function. (It took me a while to
understand the structure of the OpenSolaris AES GCM code,
but it can be done.)
This is purely a cleanup. The current ctr.c code is correct.
This patch changes it to manipulate ctr->bufPtr in the same
way ghash->bufLen is manipulated in gcm.c. I find it a little
easier to understand and the consistency between the two files
would also be good.
This patch fixes a bug wtc found in the AES_CTS decrypt code, as well and includes a set of tests for AES_CTS. The tests are taken from the kerberos RFC, but have been modified to be CS-1 rather than CS-3.
Comment on attachment 661363 [details] [diff] [review]
Add comment to intel_aes_decrypt_cbc_128 to mark the hardcoded offsets of iv and expandedKey

r+, though I would prefer to have just pure C comments, it's not the only c++ comments in this file, so that makes my objection just a nit.
Attachment #661363 - Flags: review?(rrelyea) → review+
Comment on attachment 666397 [details] [diff] [review]
Rename CK_AES_GCM_PARAMS to CK_GCM_PARAMS, remove GCM code for 64-bit block ciphers, do not expose CKM_AES_CTS

r- for the patch overall because too many things are lumped in it...


1) r+ for the gcm.c changes, which includes both the parameter rename and the removal of the 64 bit GCM.
2) r- for the pkcs11.c and pkcs11c.c removal of CTS.... simply because I have a patch which addresses the cts issues (including tests).
3) r+ for the pkcs11t.h changes for GCM and CCM params and updated comments.

You can check in the r+'s portions of the patch.

bob
Attachment #666397 - Flags: review?(rrelyea) → review-
Comment on attachment 667706 [details] [diff] [review]
Patch for AES_CTS.

cts patch (I thought I had already set the review flag).
Attachment #667706 - Flags: review?(wtc)
Comment on attachment 667706 [details] [diff] [review]
Patch for AES_CTS.

Volunteering a somewhat mechanistic review of this patch.
Attachment #667706 - Flags: review?(emaldona)
Comment on attachment 667706 [details] [diff] [review]
Patch for AES_CTS.

Review of attachment 667706 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. I will review CTS_DecryptUpdate carefully later.
You can check in this patch first. I suggest a small
change below.

::: lib/freebl/cts.c
@@ +115,4 @@
>      if (rv != SECSuccess) {
>  	return SECFailure;
>      }
> +    *outlen = fullblocks; /* AES low level doesn't set outlen */

We should fix the AES low level some time. (I already filed
bug 791875.)

@@ +219,3 @@
>  	/* we know inbuf == outbuf now, inbuf is declared const and can't
>  	 * be the target, so use outbuf for the target here */
> +	memcpy(outbuf+inlen-pad, inbuf+inlen-pad-blocksize, pad);

Nit: -pad-blocksize => -blocksize-pad

This matches the order used on the next line.
Attachment #667706 - Flags: review?(wtc) → review+
Comment on attachment 667706 [details] [diff] [review]
Patch for AES_CTS.

Review of attachment 667706 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/freebl/cts.c
@@ +219,4 @@
>  	/* we know inbuf == outbuf now, inbuf is declared const and can't
>  	 * be the target, so use outbuf for the target here */
> +	memcpy(outbuf+inlen-pad, inbuf+inlen-pad-blocksize, pad);
> +	memcpy(outbuf+inlen-blocksize-pad,     lastBlock,       blocksize);

One more nit: please remove the extra spaces after the commas.
You used them originally to align the arguments to the three
memcpy calls. But you are not doing that any more.
Comment on attachment 667706 [details] [diff] [review]
Patch for AES_CTS.

r+ as far as cts.c goes only. I'm still going through aes-cts-type-1-vectors.txt
Attachment #667706 - Flags: review?(emaldona) → review+
Checked in the changes that Bob approved in comment 40 on the
NSS trunk (NSS 3.14).

Checking in lib/freebl/gcm.c;
/cvsroot/mozilla/security/nss/lib/freebl/gcm.c,v  <--  gcm.c
new revision: 1.2; previous revision: 1.1
done
Checking in lib/util/pkcs11t.h;
/cvsroot/mozilla/security/nss/lib/util/pkcs11t.h,v  <--  pkcs11t.h
new revision: 1.24; previous revision: 1.23
done
Attachment #666397 - Attachment is obsolete: true
r+ also for aes-cts-type-1-vectors.txt. Thank you Bob for your online clarifications on the puzzling rotated byte sequences due to:
# Original Test vectors for AES CTS-3 (Kerberos) having been modified for AES CTS-1 (NIST)
> We should fix the AES low level some time. (I already filed bug 791875.)

To be clear, the 'low level' is the assembler code, which always returns blocksize date. Any real callers (including callers to the AES blapi interface) will get the right thing as this value is set in the C code before the low level AES function is called.

bob
Explanation of patch:

1) The variable 'multi' is used to determine whether or not the algorithm is 'streamable'. C_XXXXUpdate calls will fail to get the context if the context has multi set to PR_FALSE.

2) In one case multi was used as a synonym for RSA. I've updated this case to explicitly look at the new RSA flag.

3) With the above change, all that is now necessary is to set multi to false in the AES_GCM case.

bob
Attachment #670556 - Flags: review?(wtc)
Comment on attachment 670556 [details] [diff] [review]
Patch to turn off C_XXXXUpdate for AES_GCM

Review of attachment 670556 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc tentatively. I think the new 'rsa' field of SFTKSessionContext
is not ideal because you are not setting it to true for the RSA
mechanisms for Sign and Verify. So it is confusing. I suggest a way
to avoid this, but I am not sure if I didn't overlook anything.

::: lib/softoken/pkcs11c.c
@@ +429,3 @@
>      context->cipherInfo = NULL;
>      context->hashInfo = NULL;
>      context->doPad = PR_FALSE;

We should initialize context->maxLen to 0 in this function.

@@ +1007,5 @@
>      if (crv != CKR_OK) return crv;
>  
>      if (!pEncryptedData) {
> +	*pulEncryptedDataLen = context->rsa ? context->maxLen :
> +		ulDataLen + 2 * context->blockSize;

Here we can say
        *pulEncryptedDataLen = context->maxLen ? context->maxLen :
                ulDataLen + 2 * context->blockSize;

@@ -1002,5 @@
>      if (crv != CKR_OK) return crv;
>  
>      if (!pEncryptedData) {
> -	*pulEncryptedDataLen = context->multi ? 
> -		ulDataLen + 2 * context->blockSize : context->maxLen;

context->multi is also set to false for CKM_NETSCAPE_AES_KEY_WRAP.
Can you confirm that CKM_NETSCAPE_AES_KEY_WRAP can't be used with
NSC_Encrypt? We seem to allow that because its flags are
CKF_EN_DE_WR_UN.

::: lib/softoken/pkcs11i.h
@@ +249,4 @@
>  struct SFTKSessionContextStr {
>      SFTKContextType	type;
>      PRBool		multi; 		/* is multipart */
> +    PRBool		rsa; 		/* is rsa */

You are not setting the 'rsa' field to PR_TRUE
in all the places where we use RSA mechanisms.

We can avoid this problem by using 'maxLen' instead.
Attachment #670556 - Flags: review?(wtc) → review+
> Here we can say
>        *pulEncryptedDataLen = context->maxLen ? context->maxLen :
>               ulDataLen + 2 * context->blockSize;

I'm pretty sure RSA isn't the only algorithm that sets maxLen.

> You are not setting the 'rsa' field to PR_TRUE
> in all the places where we use RSA mechanisms.

I think I'd rather fix this instead, rather than overloading maxLen (which is how we got into trouble with multi).


> context->multi is also set to false for CKM_NETSCAPE_AES_KEY_WRAP.
> Can you confirm that CKM_NETSCAPE_AES_KEY_WRAP can't be used with
> NSC_Encrypt? We seem to allow that because its flags are
> CKF_EN_DE_WR_UN.

ulDatalen + 2*context->blockSize is a perfectly acceptable return for the max size an NSC_Encrypt call can make with CKM_NETSCAPE_AES_KEY_WRAP.

bob
Incorporate wtc's comments as follows:

1) set maxLen to '0' in the context initializer.
2) set rsa to true in the other cases (C_Sign/C_Verify/C_VerifyRecover).
Attachment #670556 - Attachment is obsolete: true
Attachment #670894 - Flags: review?(wtc)
Comment on attachment 670894 [details] [diff] [review]
Rev 2: Patch to turn off C_XXXXUpdate for AES_GCM - incorproate wtc's comments.

Review of attachment 670894 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.
Attachment #670894 - Flags: review?(wtc) → review+
Checking in lib/softoken/pkcs11c.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11c.c,v  <--  pkcs11c.c
new revision: 1.132; previous revision: 1.131
done
Checking in lib/softoken/pkcs11i.h;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v  <--  pkcs11i.h
new revision: 1.62; previous revision: 1.61
done
Hello all - 

This patch offers an efficient implementations of AES-GCM. 
The implementation uses Intel's AES and PCLMULQDQ instructions (AES-NI), and is
designed for the current (and future) Intel Core Processors, with the AVX 
instruction set (the 2nd, the 3rd, and the (future) 4th Generation Intel Core).

The algorithms and methods that underlie this code are detailed in references 
[1-4]: 

[1] Shay Gueron, Michael E. Kounavis: Intel® Carry-Less Multiplication 
    Instruction and its Usage for Computing the GCM Mode (Rev. 2.01)
    http://software.intel.com/sites/default/files/article/165685/clmul-wp-rev-2.01-2012-09-21.pdf
[2] S. Gueron, M. E. Kounavis: Efficient Implementation of the Galois Counter
    Mode Using a Carry-less Multiplier and a Fast Reduction Algorithm. 
    Information Processing Letters 110: 549–553 (2010).
[3] S. Gueron: AES Performance on the 2nd Generation Intel® Core™ Processor
    Family (to be posted) (2012).
[4] S. Gueron: Fast GHASH computations for speeding up AES-GCM (to be 
    published) (2012).
     

The patch assumes NSS version 3.14 RC0 

The performance 
===============

The performance of this patch was measured by using the built-in bltest utility
as follows:
bltest -E -m aes_gcm -p 10000 -o ct.txt -g 16 -b 16384 
(measuring authenticated encryption performance for a 16KB buffer, repeated 
10000 times)


                                          NSS 3.14 RC0   This patch   Speedup
----------------------------------------
Performance in GB/sec:
----------------------------------------
Core i7-2600K @3.4GHz ("Sandy Bridge")    0.057 GB/sec    1.17 GB/sec  20.5x
Core i7-3770 @3.4GHz  ("Ivy Bridge")      0.059 GB/sec    1.19 GB/sec  20.2x
----------------------------------------
Performance in Cycles per byte (C/B):
----------------------------------------
Core i7-2600K @3.4GHz ("Sandy Bridge")      55.42 C/B      2.70 C/B    20.5x
Core i7-3770 @3.4GHz  ("Ivy Bridge")        53.67 C/B      2.66 C/B    20.2x
----------------------------------------


Comments:
=========

This implementation is designed for the Intel(R) Core(TM) processors that 
support AES and PCLMULQDQ instructions, as well as the AVX instructions set.
(It also works, with high performance, on AMD Bulldozer processor that has 
these instructions).
Its AES-GCM performance is the fastest TLS authenticated encryption combination 
that is provided by the NSS library.

The patch integrates seamlessly with the existing NSS implementation: it 
selects the appropriate code path by checking the CPUID bits to detect AES-NI, 
PCLMULQDQ, and AVX support. For processors that do not support these 
instructions, the implementation resorts to the original GCM code.

The code was tested on NSS version 3.14 RC0.

Thanks to Wan-Teh Chang, Bob Relyea, Brian Smith, and Eric Rescorla for 
preliminary discussions on the patch and its integration. 


Developers and authors:
************************************************************************
Shay Gueron (1, 2), and Vlad Krasnov (1)

(1) Intel Corporation, Israel Development Center, Haifa, Israel
(2) University of Haifa, Israel
************************************************************************
Copyright(c) 2012, Intel Corp.
Attachment #673021 - Flags: review+
Marked the bug fixed in NSS 3.14.

Shay: thank you very much for contributing a patch to use the Intel's AES
and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX)
architecture. Since NSS 3.14 has been released, I moved your patch to a new
bug (bug 805604).
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 673021 [details] [diff] [review]
Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI), and the Advanced Vector Extension (AVX) architecture.

Moving to review ? 

Thanks for the patch shay. I'll review it shortly.

The two things that have to happen before this bug gets closed, in my opinion is: 1) the tests need to be checked in (they are still waiting on a review), and we should also pick up this patch once it's approved.

I suggest we track all other issues as separate bugs.

bob
Attachment #673021 - Flags: review+ → review?(rrelyea)
OK wtc has moved this patch to another bug (bug 805604). I'll go with that, but I'm going to reopen this bug because the tests are still not in....:(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 673021 [details] [diff] [review]
Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI), and the Advanced Vector Extension (AVX) architecture.

removing review, This patch will be reviewed at bug 805604.
Attachment #673021 - Flags: review?(rrelyea)
Bob, ideally we should open a new bug for the tests. Otherwise a
Bugzilla query for bugs fixed in NSS 3.14 won't return this bug,
and we won't be able to easily tell which part is in NSS 3.14 and
which part is in NSS 3.14.1.

If you want to reduce Bugzilla overhead and just use this bug to
finish the tests, that's fine by me.
Well since the patch is already attached to this bug, I'd prefer to just finish it on this bug.

bob
Comment on attachment 661363 [details] [diff] [review]
Add comment to intel_aes_decrypt_cbc_128 to mark the hardcoded offsets of iv and expandedKey

Patch checked in on the NSS trunk (NSS 3.14.1).

Checking in intel-aes.s;
/cvsroot/mozilla/security/nss/lib/freebl/intel-aes.s,v  <--  intel-aes.s
new revision: 1.8; previous revision: 1.7
done
Comment on attachment 658671 [details] [diff] [review]
Tests for CTR and GCM

Review of attachment 658671 [details] [diff] [review]:
-----------------------------------------------------------------

I will take another pass in the AM, just to make sure I haven't missed anything. This looks like a much simpler change than the one I was thinking of - unless I'm perhaps missing some files.

::: cmd/bltest/blapitest.c
@@ +818,5 @@
>  
>  PRBool
> +is_authCipher(bltestCipherMode mode)
> +{
> +    /* change as needed! */

I'm not sure I fully understand this comment. Isn't all code "change as needed"? :)

@@ +867,2 @@
>  #endif
> +        mode == bltestAES_CBC || mode == bltestAES_CTS || 

unnecessary whitespace @ eol

@@ +1345,5 @@
>      switch (cipherInfo->mode) {
>      case bltestAES_ECB:	    minorMode = NSS_AES;	  break;
>      case bltestAES_CBC:	    minorMode = NSS_AES_CBC;	  break;
> +    case bltestAES_CTS:	    minorMode = NSS_AES_CTS;	  break;
> +    case bltestAES_CTR:	    

unnecessary whitespace @ eol

@@ +1519,4 @@
>      return SECSuccess;
>  }
>  
> +SECStatus

wtc@ fixed this back in 1.71 - http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/cmd/bltest/blapitest.c&rev=1.71&mark=1462#1462

@@ +2080,5 @@
> +    case bltestAES_CTR:
> +    case bltestAES_GCM:
> +	outlen = cipherInfo->input.pBuf.len;
> +	if (cipherInfo->mode == bltestAES_GCM && encrypt) {
> +	    outlen += 16;

nit: Add a comment that this is for the authentication tag.

@@ +2616,4 @@
>      default:
>  	return SECFailure;
>      }
> +    return rv;

I'm not sure I understand the motivation for this change?

@@ +2880,5 @@
>      int index = 0;
>  #endif
>      switch (mode) {
> +    case bltestAES_GCM:
> +	sprintf(filename, "%s/tests/%s/%s%d", testdir, modestr, "aad", j);

Why is this function not using PR_snprintf?

@@ +3602,4 @@
>      }
>      RNG_SystemInfoForRNG();
>  
> +

Unnecessary newline?

@@ +3953,1 @@
>          

extraneous whitespace?

::: cmd/bltest/tests/aes_ctr/aes_ctr_1.txt
@@ +23,5 @@
> +Block #4={
> +Input Block=f0f1f2f3f4f5f6f7f8f9fafbfcfdff02 
> +Output Block=b9e783b30dd7924ff7bc9b97beaa8740 
> +Plaintext=f69f2445df4f9b17ad2b417be66c3710 
> +Ciphertext=4f78a7f6d29809585a97daec58c6b050

What's up with the EOLs here? Is this in the canonical tests?

::: cmd/bltest/tests/aes_gcm/test_source.txt
@@ +1,2 @@
> +#  AppendixB AES Test Vectors
> +#  From "The Galois/Counter Mode of Operation (GCM)", David A McGree & John Viega,

Copy-pasta typo - McGrew

@@ +18,5 @@
> +IV=000000000000000000000000
> +H=66e94bd4ef8a2c3b884cfa59ca342b2e
> +Y0=00000000000000000000000000000001
> +E(K,Y0)=58e2fccefa7e3061367f1d57a4e7455a
> +len(A)||len(C)=00000000000000000000000000000000 

What's up with the EOL whitespace here and on 17?
Comment on attachment 658671 [details] [diff] [review]
Tests for CTR and GCM

Review of attachment 658671 [details] [diff] [review]:
-----------------------------------------------------------------

r-, in that in applying this patch locally, I'm getting coredumps with double frees. I'm trying to get a more meaningful backtrace.

A few further comments below.

::: cmd/bltest/blapitest.c
@@ +867,4 @@
>  #endif
> +        mode == bltestAES_CBC || mode == bltestAES_CTS || 
> +	mode == bltestAES_CTR || mode == bltestAES_GCM ||
> +	mode == bltestCAMELLIA_CBC || mode == bltestSEED_CBC)

It turns out that the indentation here is mixed as well between tabs and spaces. You should use one consistently (I believe the answer is spaces here)

@@ +1338,5 @@
>      PRIntervalTime time1, time2;
> +    unsigned char *params;
> +    int len;
> +    CK_AES_CTR_PARAMS ctrParams;
> +    CK_AES_GCM_PARAMS gcmParams;

In CVS HEAD, these are called CK_GCM_PARAMS and CK_CCM_PARAMS - so this doesn't compile.

@@ +2080,5 @@
> +    case bltestAES_CTR:
> +    case bltestAES_GCM:
> +	outlen = cipherInfo->input.pBuf.len;
> +	if (cipherInfo->mode == bltestAES_GCM && encrypt) {
> +	    outlen += 16;

Should this be AES_BLOCK_SIZE, rather than 16, considering that above in init you set .ulTagBits to be blocklen (AES_BLOCK_SIZE) * 8?

@@ +2434,5 @@
>      if (is_symmkeyCipher(cipherInfo->mode)) {
>          const unsigned char *input = cipherInfo->input.pBuf.data;
> +        unsigned int inputLen = is_authCipher(cipherInfo->mode) ?
> +		 cipherInfo->input.pBuf.len :
> +		 PR_MIN(cipherInfo->input.pBuf.len, 16);

Should this be tab or space aligned? The rest of this function uses spaces for such alignments.

@@ +2439,3 @@
>          unsigned char *output = cipherInfo->output.pBuf.data;
>          unsigned int outputLen = maxLen;
> +	unsigned int totalOutputLen = 0;

The rest of this function is aligned by spaces, but you use a tab here

@@ +2443,5 @@
>          rv = (*cipherInfo->cipher.symmkeyCipher)(cipherInfo->cx,
>                                                   output, &len, outputLen,
>                                                   input, inputLen);
>          CHECKERROR(rv, __LINE__);
> +	totalOutputLen += len;

and here

@@ +2458,2 @@
>          }
> +	cipherInfo->output.pBuf.len = totalOutputLen;

And these two places as well.

@@ +3586,5 @@
>      progName = progName ? progName+1 : argv[0];
>  
> +    rv = NSS_InitializePRErrorTable();
> +    if (rv != SECSuccess) {
> +    	SECU_PrintPRandOSError(progName);

You use a combination of spaces and tabs for aligning this conditional. You should use tab consistently.
Attachment #658671 - Flags: review?(ryan.sleevi) → review-
Comment on attachment 658671 [details] [diff] [review]
Tests for CTR and GCM

Review of attachment 658671 [details] [diff] [review]:
-----------------------------------------------------------------

The source of the double free:

http://mxr.mozilla.org/security/source/security/nss/lib/freebl/rijndael.c?mark=1113-1117,1180-1185#1113

AES_InitContext calls AES_DestroyContext if aes_InitContext fails, while AES_CreateContext calls AES_DestroyContext if AES_InitContext fails. This was introduced by your previous GCM patch.
Attachment #658671 - Attachment is obsolete: true
Attachment #658671 - Flags: superreview?(wtc)
Attachment #705423 - Flags: review?
Attachment #705423 - Flags: review? → review?(ryan.sleevi)
Attachment #705579 - Flags: review?(ryan.sleevi)
Attachment #705579 - Flags: review?(ryan.sleevi) → review+
Test patches are now in, closing bug.
Status: REOPENED → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Comment on attachment 705423 [details] [diff] [review]
Tests for CTR and GCM V2

Review of attachment 705423 [details] [diff] [review]:
-----------------------------------------------------------------

This was r+, as discussed on IRC.

::: cmd/bltest/blapitest.c
@@ +1357,5 @@
>      switch (cipherInfo->mode) {
>      case bltestAES_ECB:	    minorMode = NSS_AES;	  break;
>      case bltestAES_CBC:	    minorMode = NSS_AES_CBC;	  break;
> +    case bltestAES_CTS:	    minorMode = NSS_AES_CTS;	  break;
> +    case bltestAES_CTR:	    

nit: The extra whitespace at the end of the line
Attachment #705423 - Flags: review?(ryan.sleevi) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: