Open Bug 1549961 Opened 5 years ago Updated 1 year ago

Ensure that remote settings clock skew fetching actually works if your clock is wrong (due to OCSP and other parts of TLS)

Categories

(Firefox :: Remote Settings Client, defect, P5)

68 Branch
defect

Tracking

()

Tracking Status
firefox68 --- affected

People

(Reporter: Gijs, Unassigned)

Details

(This bug report might need to live somewhere else but I don't know where that would be.)

We aim to give people a sensible error about their system clocks being wrong if that's the cause of TLS issues. We do this based on some info we get from periodic checks with the remote settings server.

For the recent armagadd-on stuff I had to do some testing with my system clock set back a few days. One thing I noticed was that a bunch of stuff (notably on AMO) stopped working because of OCSP errors. I haven't looked into the details much but from memory, it was due to the OCSP response being stapled in and not being valid for the right timeframe. This sort of makes sense: if the server has accurate timings and the client does not, things go pear-shaped in terms of when the OCSP response was fetched.
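To make the timing problem concrete, here is a minimal sketch of the kind of validity-window check a client applies to a stapled OCSP response. The function name, the `slop` parameter and all the dates are hypothetical and only for illustration; this is not the actual Firefox/NSS code. The only facts relied on are that an OCSP response carries a thisUpdate and a nextUpdate time and that the client compares them against its own clock.

```python
from datetime import datetime, timedelta, timezone


def ocsp_response_acceptable(this_update, next_update, client_now,
                             slop=timedelta(0)):
    """Return True if client_now lies inside the response's validity window.

    `slop` is a hypothetical tolerance added on both sides of the window.
    """
    return this_update - slop <= client_now <= next_update + slop


# Hypothetical scenario: the server stapled a response produced 6 hours ago
# that stays valid for 4 days.
true_now = datetime(2019, 5, 8, 12, 0, tzinfo=timezone.utc)
this_update = true_now - timedelta(hours=6)
next_update = this_update + timedelta(days=4)

# A client with an accurate clock accepts the response.
accurate = ocsp_response_acceptable(this_update, next_update, true_now)

# A client whose clock is 3 days behind sees a response "from the future".
skewed = ocsp_response_acceptable(
    this_update, next_update, true_now - timedelta(days=3)
)

print(accurate, skewed)
```

With these made-up numbers the accurate client accepts the response and the client that is 3 days behind rejects it, which matches what I saw with the clock set back.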

This got me wondering... it doesn't look like the client-side handling of the fetch request ( https://searchfox.org/mozilla-central/rev/99a2a5a955960b0e58ceade1db1f7652d9db4ba1/services/settings/Utils.jsm#80 ) takes any specific care about these OCSP errors, and I suspect it will simply fail if a future-timed OCSP response is stapled and/or any other timing-specific check fails.

In other words, if a client really does have an out-of-date system clock, I wonder if we would ever find out, because the request to remote settings will also fail.
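A rough sketch of the failure mode, under stated assumptions: the skew is derived from the server's HTTP Date header, and it is only stored when the request succeeds. `record_skew`, `HandshakeError`, the `prefs` dict, its key name and the sign convention (local minus server) are all hypothetical stand-ins, not the real remote settings code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class HandshakeError(Exception):
    """Hypothetical stand-in for a TLS handshake failure."""


def clock_skew_seconds(date_header, local_now):
    """Difference between the local clock and the server's Date header."""
    server_now = parsedate_to_datetime(date_header)
    return (local_now - server_now).total_seconds()


def record_skew(fetch, local_now, prefs):
    """Store the skew only if the fetch succeeds; return whether it did."""
    try:
        headers = fetch()
    except HandshakeError:
        # No response means no Date header, so nothing can be recorded.
        return False
    prefs["clock_skew_seconds"] = clock_skew_seconds(headers["Date"], local_now)
    return True


def working_fetch():
    return {"Date": "Wed, 08 May 2019 12:00:00 GMT"}


def failing_fetch():
    raise HandshakeError("stapled OCSP response is not yet valid")


# The client clock is 3 days behind the server in this made-up scenario.
local_now = datetime(2019, 5, 5, 12, 0, tzinfo=timezone.utc)

prefs_ok, prefs_broken = {}, {}
record_skew(working_fetch, local_now, prefs_ok)
record_skew(failing_fetch, local_now, prefs_broken)

print(prefs_ok, prefs_broken)
```

The second case is the one I'm worried about: the clients that most need the skew value are exactly the ones for which the store stays empty.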

For this bug, I'd like us to do 3 things:

  1. check I'm reading this right. Maybe I'm way off base! :-)
  2. if we're not currently OCSP-stapling on this endpoint and therefore we're "accidentally" fine here, but in general my reasoning is correct and we'd fail if we started doing OCSP-stapling, that we come up with a plan to deal with that.
  3. that we add automated testing/monitoring (to Firefox, or to the server, or both) to check for this situation.

:glasserc, are you the right person to look into this type of thing?

Flags: needinfo?(eglassercamp)

I'm not sure I'm the best person to ask, but I'll do my best.

  1. I think your analysis is correct. It seems like https://searchfox.org/mozilla-central/source/browser/actors/NetErrorChild.jsm#473-485 is consulting the Remote Settings pref. This was added in https://hg.mozilla.org/mozilla-central/rev/d65b48650a68a678f5d4c2aabca7e539e647e3d4. This pref is only set on successful polling of the latest-changes endpoint (https://searchfox.org/mozilla-central/source/services/settings/remote-settings.js#231). If the fetch fails because of a TLS error, I guess we would never set it. I'm guessing we would report this using uptake telemetry as a network error. The absence of this pref would mean that we would never show the nice "your clock is wrong" message. However, as far as I know, this check only happens when connecting fails in content, so other situations would fail regardless of the presence of this pref.

  2. OCSP Stapling seems to be enabled on all Cloudfront distributions, so yes, we're probably affected.

  3. I'm not really sure what to do about it. When this fails, I imagine it happens during the TLS handshake, so we don't even make a request so we don't have a response, so I don't see how we could access the Date/Age headers. Is there a way to tell our network stack to make a request while disregarding a certain specific kind of TLS failure? I don't see anything we can do on the server side since the client doesn't send its timestamp and I don't think we could make Cloudfront serve backdated certs or anything like that. I'm also not sure how to write an automated test for this -- it seems like I'd have to stand up a HTTPS server and generate a future-dated cert. Do we have anything like that available in the test suite?

Flags: needinfo?(eglassercamp)

When you're backdating your time we compare against the build date and should be able to determine clock skew independently of remote settings.

I think this is just a dupe of bug 1491498 (which depends on bug 1486551 which is a bit harder to solve), so I'd say what you're seeing is purely the client's fault.

(In reply to Johann Hofmann [:johannh] from comment #2)

When you're backdating your time we compare against the build date and should be able to determine clock skew independently of remote settings.

But OCSP errors will start appearing after just one or 2 days skew, and the build date can be many weeks or even months ago.
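As a worked example of that gap, assuming the build-date heuristic amounts to "the clock is wrong if it is earlier than the build date" (my simplification; the function name and all dates below are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def clock_behind_build(client_now, build_date):
    """Simplified heuristic: flag the clock only if it predates the build."""
    return client_now < build_date


true_now = datetime(2019, 5, 8, tzinfo=timezone.utc)
build_date = true_now - timedelta(weeks=6)   # build shipped 6 weeks ago
client_now = true_now - timedelta(days=2)    # clock is 2 days behind

# The 2-day skew is enough to upset OCSP but is still after the build date.
detected = clock_behind_build(client_now, build_date)

# Any skew smaller than this goes unnoticed by the heuristic.
blind_spot = true_now - build_date

print(detected, blind_spot.days)
```

So with a 6-week-old build, any clock that is less than 42 days behind passes the build-date check while still being far enough off to trigger OCSP errors.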

I think this is just a dupe of bug 1491498

1491498 is about where we show the error, and showing it for OCSP errors; this bug is about how we fetch the data that we use to determine if the clock is skewed, and that potentially failing due to similar OCSP errors (but in the background, so no errors are displayed). So no, I don't think they're dupes at all...

(In reply to Ethan Glasser-Camp (:glasserc) from comment #1)

I'm not sure I'm the best person to ask, but I'll do my best.

  1. I think your analysis is correct. It seems like https://searchfox.org/mozilla-central/source/browser/actors/NetErrorChild.jsm#473-485 is consulting the Remote Settings pref. This was added in https://hg.mozilla.org/mozilla-central/rev/d65b48650a68a678f5d4c2aabca7e539e647e3d4. This pref is only set on successful polling of the latest-changes endpoint (https://searchfox.org/mozilla-central/source/services/settings/remote-settings.js#231). If the fetch fails because of a TLS error, I guess we would never set it. I'm guessing we would report this using uptake telemetry as a network error. The absence of this pref would mean that we would never show the nice "your clock is wrong" message. However, as far as I know, this check only happens when connecting fails in content, so other situations would fail regardless of the presence of this pref.

Yep.

  2. OCSP Stapling seems to be enabled on all Cloudfront distributions, so yes, we're probably affected.

OK, good to know.

  3. I'm not really sure what to do about it. When this fails, I imagine it happens during the TLS handshake, so we don't even make a request so we don't have a response, so I don't see how we could access the Date/Age headers. Is there a way to tell our network stack to make a request while disregarding a certain specific kind of TLS failure?

I don't know. Dana?

I know for captive portal detection we've started using non-TLS requests to avoid issues. I don't know if that makes sense specifically for clock skew (and signing them separately with a longer-validity, non-public-web-PKI trusted cert or something). I mean, rolling your own crypto is always bad, except when necessary. Not sure "standard" TLS will work for us here...

I don't see anything we can do on the server side since the client doesn't send its timestamp and I don't think we could make Cloudfront serve backdated certs or anything like that.

Yeah, I imagine we have very little control over cloudfront...

I'm also not sure how to write an automated test for this -- it seems like I'd have to stand up a HTTPS server and generate a future-dated cert. Do we have anything like that available in the test suite?

Our test infrastructure has an http server and some weird TLS wrappers and certs for it. So in principle, yes, but when I last had to touch it at this kind of level it was not super straightforward.

Flags: needinfo?(dkeeler)

(In reply to :Gijs (he/him) from comment #3)

I know for captive portal detection we've started using non-TLS requests to avoid issues. I don't know if that makes sense specifically for clock skew (and signing them separately with a longer-validity, non-public-web-PKI trusted cert or something). I mean, rolling your own crypto is always bad, except when necessary. Not sure "standard" TLS will work for us here...

This is something we've thought about for a while, but it seems like a big project with unknown level of reward, so it wasn't really prioritized.

(In reply to :Gijs (he/him) from comment #3)

(In reply to Johann Hofmann [:johannh] from comment #2)

  3. I'm not really sure what to do about it. When this fails, I imagine it happens during the TLS handshake, so we don't even make a request so we don't have a response, so I don't see how we could access the Date/Age headers. Is there a way to tell our network stack to make a request while disregarding a certain specific kind of TLS failure?

I don't know. Dana?

No data or headers are available if the handshake fails.
I don't think we have a way to disable things like OCSP stapling on a per-request basis (and even then, it seems like a bad idea to disable checking revocation information for requests that fetch trusted remote information for the browser).

I know for captive portal detection we've started using non-TLS requests to avoid issues. I don't know if that makes sense specifically for clock skew (and signing them separately with a longer-validity, non-public-web-PKI trusted cert or something). I mean, rolling your own crypto is always bad, except when necessary. Not sure "standard" TLS will work for us here...

There's roughtime, but I don't know what the status of that is.

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #5)

(In reply to :Gijs (he/him) from comment #3)

(In reply to Johann Hofmann [:johannh] from comment #2)

  3. I'm not really sure what to do about it. When this fails, I imagine it happens during the TLS handshake, so we don't even make a request so we don't have a response, so I don't see how we could access the Date/Age headers. Is there a way to tell our network stack to make a request while disregarding a certain specific kind of TLS failure?

I don't know. Dana?

No data or headers are available if the handshake fails.

Yeah, this makes sense.

I don't think we have a way to disable things like OCSP stapling on a per-request basis (and even then, it seems like a bad idea to disable checking revocation information for requests that fetch trusted remote information for the browser).

I agree, I'd only be interested in doing it for the clock skew data.

I know for captive portal detection we've started using non-TLS requests to avoid issues. I don't know if that makes sense specifically for clock skew (and signing them separately with a longer-validity, non-public-web-PKI trusted cert or something). I mean, rolling your own crypto is always bad, except when necessary. Not sure "standard" TLS will work for us here...

There's roughtime, but I don't know what the status of that is.

Looks like there's a Cloudflare implementation and a Rust implementation, in addition to Google's, but the list of available servers looks limited. I expect implementing something like that is probably a significant project.

It sounds like we don't have cheap (in the "engineering time" sense) options to do much about this...

Since this ticket is about certificate management, I'm moving it to Core/PSM.

Component: Remote Settings Client → Security: PSM
Product: Firefox → Core

This isn't a PSM bug. As filed, the purpose of this bug is to ensure the remote settings clock skew detection feature works if the client clock is wrong enough such that we can't verify TLS information. This isn't going to be fixed by changing how TLS works - we need to change how remote settings works.

Component: Security: PSM → Remote Settings Client
Product: Core → Firefox

The priority flag is not set for this bug.
:leplatrem, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mathieu)

I don't see any «Priority flag», I guess it's the «Importance» field ;) https://bugzilla.mozilla.org/show_bug.cgi?id=1548506

Flags: needinfo?(mathieu)
Priority: -- → P5
Severity: normal → S3