Closed Bug 1629982 Opened 5 years ago Closed 5 years ago

SEC_ERROR_NO_TOKEN when using neqo-client (SSL_HkdfExtract)

Categories

(NSS :: Libraries, defect, P3)

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: grover, Assigned: kjacobs)

Details

Neqo, Mozilla's QUIC/H3 implementation, pulls latest NSS repo and builds it as part of its build process. Recently we have started to see a runtime error:

H3 Client connecting: V6([2601:1c2:1900:71:941b:f26d:48e6:2062]:38984) -> V6([2001:19f0:4:34::1]:4433)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NssError { name: "SEC_ERROR_NO_TOKEN", code: -8127, desc: "The security card or token does not exist, needs to be initialized, or has been removed." }', neqo-transport/src/crypto.rs:253:30

when we use SSL_HkdfExtract().

Using NSS 3.51 RTM causes the problem to go away.

I know it's not great to have a test case that requires a whole nother project but I know there's been some work lately in that area of NSS and wanted to give you an early heads up that something is broken.

I can give build steps for Neqo if you want, or a Docker container, if that's easier.

Thanks for the report. Is this something that shows up in Neqo tests, or only from Firefox?

We haven't seen this in NSS yet, so if there's any way of getting a reproducer (or even a call stack) that would be helpful.

Flags: needinfo?(agrover)

Spot where ERR_NO_TOKEN is being set:

#0  PK11_GetBestSlotMultipleWithAttributes (type=0x7ffc22ea0198, mechanismInfoFlags=0x0, keySize=0x0, mech_count=1, wincx=0x0) at ../../lib/pk11wrap/pk11slot.c:2410
#1  0x000055c63ed55206 in PK11_GetBestSlot (type=16426, wincx=0x0) at ../../lib/pk11wrap/pk11slot.c:2427
#2  0x000055c63ed4efc0 in PK11_DeriveWithTemplate (baseKey=0x55c63fa9cbf0, derive=16426, param=0x7ffc22ea0530, target=16426, operation=268, keySize=0, userAttr=0x0, 
    numAttrs=0, isPerm=0) at ../../lib/pk11wrap/pk11skey.c:1693
#3  0x000055c63ed4ec4f in PK11_Derive (baseKey=0x55c63fa9cbf0, derive=16426, param=0x7ffc22ea0530, target=16426, operation=268, keySize=0)
    at ../../lib/pk11wrap/pk11skey.c:1584
#4  0x000055c63ed797c1 in tls13_HkdfExtract (ikm1=0x55c63fa9b360, ikm2=0x55c63fa9cbf0, baseHash=ssl_hash_sha256, prkp=0x7ffc22ea0850) at ../../lib/ssl/tls13hkdf.c:153
#5  0x000055c63edb54ae in SSLExp_HkdfExtract (version=772, cipherSuite=4865, salt=0x55c63fa9b360, ikm=0x55c63fa9cbf0, keyp=0x7ffc22ea0850)
    at ../../lib/ssl/sslprimitive.c:217
#6  0x000055c63ecf8424 in neqo_crypto::hkdf::SSL_HkdfExtract (version=772, cipher=4865, salt=0x55c63fa9b360 "TC", <incomplete sequence \316>, 
    ikm=0x55c63fa9cbf0 "TC", <incomplete sequence \316>, prk=0x7ffc22ea0850) at neqo-crypto/src/exp.rs:19
#7  0x000055c63ecf7b64 in neqo_crypto::hkdf::extract (version=772, cipher=4865, salt=..., ikm=0x7ffc22ea0b38) at neqo-crypto/src/hkdf.rs:110
#8  0x000055c63ec89cd0 in neqo_transport::crypto::CryptoDxState::new_initial (direction=neqo_transport::crypto::CryptoDxDirection::Write, label=..., dcid=...)
    at neqo-transport/src/crypto.rs:291
#9  0x000055c63ec8e21b in neqo_transport::crypto::CryptoStates::init (self=0x7ffc22ea1700, role=neqo_common::Role::Client, dcid=...) at neqo-transport/src/crypto.rs:613
#10 0x000055c63e973e74 in neqo_transport::connection::Connection::new_client (server_name=..., protocols=..., cid_manager=..., local_addr=..., remote_addr=...)
    at /home/agrover/git/neqo/neqo-transport/src/connection.rs:384
#11 0x000055c63e982981 in neqo_http3::connection_client::Http3Client::new (server_name=..., protocols=..., cid_manager=..., local_addr=..., remote_addr=..., 
    max_table_size=128, max_blocked_streams=128) at /home/agrover/git/neqo/neqo-http3/src/connection_client.rs:54
#12 0x000055c63e97a57d in neqo_client::client (args=0x7ffc22ea9580, socket=..., local_addr=..., remote_addr=..., origin=..., urls=...) at neqo-client/src/main.rs:284
#13 0x000055c63e97d3bc in neqo_client::main () at neqo-client/src/main.rs:450
Flags: needinfo?(agrover)

What is your $LD_LIBRARY_PATH Andy?

One of the things we have to be careful of with this is that we avoid loading libsoftokn3.so from the OS. If you are on a RedHat distro, then NSS will be installed and so /usr/lib/libsoftokn3.so might be loaded ahead of the version of NSS you have built for neqo. NSS does its own dynamic loading. That loading respects the usual search path from the OS (and LD_LIBRARY_PATH), but it isn't possible to hard-code hints about the location of the library into the binary. As a result, we are somewhat at the mercy of the environment. If you have an old NSS lying around, then it will load that and complain that it can't find the method. This method in particular is new enough that it might not be in a general release.

The priority flag is not set for this bug.
:jcj, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjones)

My LD_LIBRARY_PATH isn't set. Yes I'm on Fedora, so Red Hat distro genes. nss-softokn-3.51.0-1.fc32.x86_64 is installed and can't be removed without hosing the system.

You're saying because NSS does its own loading that even though we're building NSS statically into Neqo, we could still get bitten by installed system NSS library versions?

The final data point to reiterate is that going back to NSS 3.51 RTM tag makes this problem go away. Has there been work since that release around dynlib loading that could cause the issue to newly manifest?

I can't see anything since we tagged 3.51 RTM that would have changed this behavior.

kjacobs - can you take a stab at triage for this one?

Flags: needinfo?(jjones) → needinfo?(kjacobs.bugzilla)

Andy, thanks - I have a local neqo build working... Is this something that I could reproduce fairly easily from here (if so, how)? Bisecting will likely be the quickest path, as there are bunch of PKCS11 v3.0 updates in 3.52.

I'll dig into this more tomorrow.

Flags: needinfo?(agrover)

Hi Kevin, wow you are building Neqo! Awesome! Now, you can run ./target/debug/neqo-client https://cloudflare-quic.com/ and it will either work or you will see the error.

As far as bisecting I think you could either:

  • Build NSS separately as described here
  • Keep building NSS as sub-build of Neqo, but change neqo-crypto/build.rs line 104 to build the exact commit you want

Hope this helps, very curious to learn what you find.

Flags: needinfo?(agrover)

BTW I started seeing the error when moving from NSS_3_51_1_RTM to NSS_3_52_BETA1.

Also if going the sub-build route, suggest git clean -fxd between runs to ensure freshness.

Thanks! Unfortunately I'm not having much luck reproducing.

Separate NSS:

kjacobs-44776:neqo kjacobs$ echo $NSS_DIR
/Users/kjacobs/repos/nss
kjacobs-44776:neqo kjacobs$ echo $DYLD_LIBRARY_PATH
/Users/kjacobs/repos/dist/Debug/lib/
for n in {1..25}; do ./target/debug/neqo-client https://cloudflare-quic.com/; done
... <success>

Combined build (with updated neqo-crypto/build.rs):

kjacobs-44776:neqo kjacobs$ cargo build
     ...
    Finished dev [unoptimized + debuginfo] target(s) in 2m 06s
kjacobs-44776:neqo kjacobs$ ls ./target/debug/build
kjacobs-44776:neqo kjacobs$ cd ./target/debug/build/neqo-crypto-71e8b705e7f49c93/out/nss
kjacobs-44776:nss kjacobs$ hg log -r .
changeset:   15581:bb4462a16de8
tag:         NSS_3_52_BETA2
...
kjacobs-44776:nss kjacobs$ cd ~/repos/neqo/
kjacobs-44776:neqo kjacobs$ unset NSS_DIR
kjacobs-44776:neqo kjacobs$ export DYLD_LIBRARY_PATH=/Users/kjacobs/repos/neqo/target/debug/build/neqo-crypto-71e8b705e7f49c93/out/dist/Debug/lib/
for n in {1..25}; do ./target/debug/neqo-client https://cloudflare-quic.com/; done
... <success>

Note, in both cases I occasionally get

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidInput, error: "cannot set a 0 duration timeout" }', src/libcore/result.rs:1188:5

but that's almost certainly unrelated.

Let me know if I might have missed something above. Out of curiosity, have you reproduced on another machine/OS?

I confirmed that the right libs are being loaded by setting the error and returning NULL at the very start of that function, then rebuilding:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NssError { name: "SEC_ERROR_NO_TOKEN", code: -8127, desc: "The security card or token does not exist, needs to be initialized, or has been removed." }', src/libcore/result.rs:1188:5

Without that modification (i.e. with a fresh copy of NSS_3_52_RTM), everything works for me:

H3 Client connecting: V6([2601:1c2:4c80:570:3850:a692:ff2b:fb72]:63613) -> V6([2606:4700:10::6816:826]:443)
READ HEADERS[0]: Ok(([(":status", "200"), ("date", "Fri, 01 May 2020 21:32:55 GMT"), ("content-type", "text/html"), ("content-length", "106072"), ("set-cookie", "__cfduid=dbad89ce516362dad7b443fa89c69a9ee1588368775; expires=Sun, 31-May-20 21:32:55 GMT; path=/; domain=.cloudflare-quic.com; HttpOnly; SameSite=Lax; Secure"), ("expect-ct", "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""), ("server", "cloudflare"), ("cf-ray", "58cc9e2bda4e8cdf-PDX"), ("alt-svc", "h3-27=\":443\"; ma=86400, h3-25=\":443\"; ma=86400, h3-24=\":443\"; ma=86400, h3-23=\":443\"; ma=86400"), ("cf-request-id", "0273c12f6a00008cdf0d011200000001")], false))
READ[0]: 763 bytes
...

Edit: Also can't reproduce on Ubuntu 18.04.

Flags: needinfo?(kjacobs.bugzilla)

Out of curiosity, have you reproduced on another machine/OS?

Another machine yes, different OS no. I'll try that next.

Thanks for your time in looking into this.

Flags: needinfo?(agrover)
Assignee: nobody → kjacobs.bugzilla
Status: NEW → ASSIGNED
Priority: -- → P3

Ooookay. Yeah I'm linking against libsoftokn3.so provided by nss-softokn-3.51.0-1.fc32.x86_64, and this doesn't have CKM_HKDF_DERIVE in the supported mechanisms, leading the call to PK11_DoesMechanism in PK11_SymKeysToSameSlot to return false.

So this should go away once Fedora ships 3.52.

Closing. Thanks again for your time.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(agrover)
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.