Closed Bug 1543874 Opened 4 years ago Closed 4 years ago

Enable external clock for SSL

Categories

(NSS :: Libraries, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mt, Assigned: mt)

Details

Attachments

(1 file)

Right now, libssl calls PR_Now() all over the place. That makes it hard to test scenarios that rely on the clock. We have a few hacks in place in libssl_internals.c that push start times of timers back when things happen, but those are spot fixes to the underlying problem.

Our QUIC stack is externally driven for both I/O and time. The I/O stuff is manageable, but time is much harder. In order for us to drive time externally, we'll need some way to set the value that libssl gets for time.

Basic idea:

Add a now() function pointer to each socket and call that instead of ssl_Time<Foo>(). The cost of a dynamic dispatch should be minimal when compared to a system call.

Provide an API for providing an alternative clock implementation.

Cleanups:

  • Fuzzing code doesn't need to conditionally compile the time functions out. That code can use the new API.

  • Remove all places where we tweak internal timers and instead run the clock forwards. DTLS retransmission is one of those cases that can use this. Remove the various functions that poke at internals.

  • Tweak time-based tests (like ticket expiration) to use the new API.

Addendum: It might not be possible to attach something to sockets. Management of session IDs is one of the heavier users of timers and the sslSessionID functions don't generally have a socket handy. This can be rectified by replicating the timer function on the sid. Similarly, the anti-replay code is global, so that might need some work.

Take note of the questions here...

This adds a new (experimental) API that allows users of libssl to provide their
own clock function. This is primarily of use in testing, but it also enables
our QUIC implementation, which also runs off an external clock.

SSL Sockets (and session IDs, when they are in memory) now have a "now()"
function and void* arg attached to them. By default, this is a function that
calls PR_Now(). These values are copied from the socket to any session ID that
is created from the socket, and to any session ID that is restored from the
session cache.

The ssl_Time() and ssl_TimeUsec() functions have been removed.

As part of this, the experimental SSL_SetupAntiReplay() function had to be
modified to take an external clock (PR_Now() suffices generally). That function
relies on knowing the time, and it doesn't have a socket to work from. To avoid
problems arising from the change in the signature, SSL_SetupAntiReplay is now
removed.

QUESTION: is it worth keeping SSL_SetupAntiReplay()? It's trivial.

There are now three uses of time in the library:

  • The primary source of time runs of these newly added functions. This governs
    session expiry, 0-RTT checks, and related functions.

  • The session cache uses a separate time to manage its locking. This is of type
    PRUint32 in seconds (rather than PRTime in microseconds). In investigating
    this, I found several places where this time in seconds was leaking across to
    the main functions via the lastAccessTime property. That was fixed. The
    cache functions that use time now all call ssl_CacheNow() to get time.

  • DTLS timers run using PRIntervalTime. This is a little annoying and these
    could be made to use the main time source, but that would result in
    conversions between PRTime and PRIntervalTime at the DTLS API. PRIntervalTime
    has a different epoch to PRTime, so this would be a little awkward.

QUESTION: is it worth converting DTLS timers to PRTime rather than
PRIntervalTime?

Only the first of these can be controlled.

Bugs found:

  • Expiration time of resumption tokens was based on the sid->expirationTime,
    which didn't account for the lifetime provided by the server. These are now
    capped by the minimum of ssl_ticket_lifetime and the value the server
    indicates.

    I removed ssl3_sid_timeout, the old limit, because inconsistent lifetimes
    between client and server messed with tests. The client would have a lower
    cap than the server, which prevented testing of the enforcement of server
    limits without jumping through hoops.

  • There was a missing time conversion in tls13_InWindow which made the window
    checks too lenient.

  • lastAccessTime was being set to seconds-since-epoch instead of
    microseconds-since-epoch in a few places.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.45
You need to log in before you can comment on or make changes to this bug.