508633 - Unresponsive OCSP server should not kill page load

johnath (Reporter)

Description

16 years ago

[Filing as PSM because I think this is a product issue, not an NSS one, but copying some NSS folks as well, because they'll know for sure]

We had a situation in bug 508408 where an unresponsive OCSP responder for https://addons.mozilla.org was causing page load to hang for a few minutes before timing out. Rerouting the OCSP requests to localhost (which immediately refused the connection) let the page load instantly, albeit downgraded to DV, as expected from bug 405139.

It is my opinion that an unresponsive OCSP responder should be treated like one that refuses the connection: EV status should be withdrawn, but the connection should proceed. Indeed, in a bug I am having no luck finding, I seem to recall Kai implementing precisely that behaviour, with a timeout of something like 15s. But it would seem that fix did not do what was intended, or that we've discovered a similar situation not covered by the original patch.
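The policy proposed above treats a timed-out responder exactly like one that refuses the connection. It can be sketched as a small decision function. This is a hypothetical Python model, not PSM code. The enum names and the `require_ocsp` flag (standing in for an "OCSP response required" option) are assumptions made for illustration.

```python
from enum import Enum, auto


class OcspOutcome(Enum):
    """Hypothetical result of one OCSP lookup."""
    GOOD = auto()
    REVOKED = auto()
    CONNECTION_REFUSED = auto()
    TIMED_OUT = auto()


class SecurityUi(Enum):
    """Hypothetical security indicator shown for the page."""
    EV = auto()
    DV = auto()
    BLOCKED = auto()


def decide(outcome: OcspOutcome, is_ev_cert: bool, require_ocsp: bool = False) -> SecurityUi:
    """Soft-fail sketch: an unreachable responder costs EV status, not the page load."""
    if outcome is OcspOutcome.REVOKED:
        return SecurityUi.BLOCKED
    if outcome is OcspOutcome.GOOD:
        return SecurityUi.EV if is_ev_cert else SecurityUi.DV
    # Refused and timed-out responders fall through to the same branch on purpose.
    if require_ocsp:
        return SecurityUi.BLOCKED  # hard-fail only when an OCSP response is mandatory
    return SecurityUi.DV  # keep loading, but without EV indicators
```

The key property is that `TIMED_OUT` and `CONNECTION_REFUSED` take the same branch, so a silent responder can never produce a worse result than a refusing one.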

Nelson Bolyard (seldom reads bugmail)

Comment 1

16 years ago

Mike, although the NSS library uses locks and is quite capable of operating simultaneously on multiple threads, I am told that PSM single-threads all SSL connections. Perhaps that is the problem. Suppose multiple requests are being made to the same server, each one waits out a long timeout, and those timeouts are serialized rather than running concurrently ... If NSS is not obeying the configured socket timeouts, that's an NSS bug. Otherwise ...

Kai Engert [:KaiE:]

Comment 2

16 years ago

Mozilla itself "single-threads" all of its networking, too :-) It uses a single dispatcher thread for all networking, which dispatches non-blocking I/O calls serially... PSM uses the same strategy on its own separate thread. Sometimes NSS decides to block a non-blocking read/write call... when it wants to perform OCSP. Yes, at that moment, PSM will not be able to perform any other read/write calls until the situation has been resolved and the non-SSL OCSP request has completed. During that time, all attempts to read/write on an SSL socket initiated by the Mozilla networking layer will be temporarily rejected with error code WOULDBLOCK.
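The behavior described here can be modeled as one I/O thread guarded by an "OCSP in flight" flag. While the flag is set, every read on every SSL socket is bounced with WOULDBLOCK, whether or not that socket has anything to do with the pending OCSP request. The class and method names are hypothetical. This sketch models the behavior described, not PSM's implementation.

```python
import errno
import threading


class SslIoThread:
    """Hypothetical model of a single SSL I/O thread that NSS can stall for OCSP."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ocsp_in_flight = False

    def begin_ocsp(self) -> None:
        # NSS has turned a non-blocking call into a blocking OCSP fetch.
        with self._lock:
            self._ocsp_in_flight = True

    def end_ocsp(self) -> None:
        # The OCSP fetch finished (or timed out); the thread is free again.
        with self._lock:
            self._ocsp_in_flight = False

    def try_read(self, sock_id: int) -> int:
        """Return 0 on success, or errno.EWOULDBLOCK while any OCSP fetch holds the thread."""
        with self._lock:
            if self._ocsp_in_flight:
                return errno.EWOULDBLOCK  # rejected regardless of which socket asked
        return 0  # stand-in for a real non-blocking read on sock_id
```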

Kai Engert [:KaiE:]

Comment 3

16 years ago

Johnathan, yes, we had reduced the timeout, and that was then declared acceptable/fixed. We need to reproduce the AMO situation and debug it to understand what's happening.

Wan-Teh Chang

Comment 4

16 years ago

Kai, the difference is that Mozilla waits for all pending IO operations simultaneously, whereas PSM issues and waits for pending SSL IO operations one at a time.
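A simulation makes the difference concrete. N fetches to a responder that never answers cost roughly N timeouts when awaited one at a time, but roughly one timeout when awaited together. This Python asyncio sketch uses hypothetical function names and a simulated responder (an event that is never set) instead of real network I/O.

```python
import asyncio
import time


async def hung_ocsp_fetch(timeout: float) -> str:
    """Simulate a responder that accepts the request but never answers."""
    never = asyncio.Event()  # nothing ever sets this event
    try:
        await asyncio.wait_for(never.wait(), timeout)
        return "response"
    except asyncio.TimeoutError:
        return "timeout"


async def serialized(n: int, timeout: float) -> float:
    """PSM-style: each wait starts only after the previous one has ended."""
    start = time.monotonic()
    for _ in range(n):
        await hung_ocsp_fetch(timeout)
    return time.monotonic() - start


async def concurrent(n: int, timeout: float) -> float:
    """Mozilla-style: all pending waits overlap."""
    start = time.monotonic()
    await asyncio.gather(*(hung_ocsp_fetch(timeout) for _ in range(n)))
    return time.monotonic() - start


if __name__ == "__main__":
    n, t = 5, 0.2
    print(f"serialized: {asyncio.run(serialized(n, t)):.2f}s")  # roughly n * t
    print(f"concurrent: {asyncio.run(concurrent(n, t)):.2f}s")  # roughly t
```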

Kai Engert [:KaiE:]

Comment 5

16 years ago

(In reply to comment #4)
> Kai, the difference is that Mozilla waits for all pending IO
> operations simultaneously, whereas PSM issues and waits for
> pending SSL IO operations one at a time.

Agreed, that's true. I found my old patch and have attached it to bug 511393 for review.

matthew zeier [:mrz]

Comment 6

16 years ago

(In reply to comment #3)
> We need to reproduce the AMO situation and debug it to understand what's
> happening.

Appears trivial to reproduce: set up an OCSP responder that either has :80 firewalled off or accepts a connection and never responds. In fact, I reproduced this just last night in production!
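The "accepts a connection and never responds" variant can be approximated locally with a plain TCP listener that accepts and then stays silent. This Python sketch does not speak OCSP, and the request bytes are a placeholder, not a real OCSP request. It binds to port 0 (an OS-chosen port) instead of :80 so it runs without privileges. The function names are hypothetical. The firewalled-port variant, where packets are silently dropped, is not modeled here.

```python
import socket
import threading


def start_silent_responder(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Listen on an ephemeral port, accept connections, and never send a byte."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: the OS picks a free port (stand-in for :80)
    srv.listen()
    held = []  # keep accepted sockets open so clients see silence, not a reset

    def accept_forever() -> None:
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe(host: str, port: int, timeout: float) -> str:
    """Send a placeholder request and report whether any reply arrives in time."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"POST / HTTP/1.0\r\nContent-Length: 0\r\n\r\n")
        try:
            return "response" if s.recv(1) else "closed"
        except socket.timeout:
            return "timeout"
```

Routing a test browser's OCSP traffic to such a listener, much like the localhost rerouting in the original report, should exercise the hang rather than the fast-refusal path.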

Kai Engert [:KaiE:]

Updated

15 years ago

Depends on: 511393

Whiteboard: [psm-fatal]

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 9

14 years ago

I think there are several issues:

* When an OCSP request times out, we should continue to let the page load, even if it is EV, if the option to require an OCSP response hasn't been set; in the case of EV, this should downgrade to DV indications. However, this isn't working; the timeout causes the page to fail to load.
* When a page has N HTTPS subresources that use the same OCSP responder, and that OCSP responder doesn't respond, we may make up to N requests to that OCSP responder, each of which will time out. This is different from the normal case where we get a response, because then we make only one OCSP request and cache the result. To resolve this, we would need to cache an indication that the OCSP responder is down and avoid making OCSP requests to it for the current page and/or for a certain time period.
* It may be the case that the 15s timeout isn't functioning correctly.
* 15s may be too long a timeout.
* Bug 511393 causes all SSL traffic to be serialized behind the current OCSP request, even when that traffic doesn't depend on that OCSP request.

Is there anything I missed? Let's make this bug about the first issue, and then I'll file separate bugs about the others.
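The second issue suggests remembering that a responder is down. Here is a minimal Python sketch, assuming the "certain time period" variant: a per-responder time-to-live rather than a per-page scope. The class name, `ttl_seconds`, and the injectable clock are hypothetical and exist only for illustration and testing.

```python
import time
from typing import Callable, Dict


class ResponderFailureCache:
    """Hypothetical negative cache: remember OCSP responders that recently timed out."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._down_until: Dict[str, float] = {}

    def record_timeout(self, responder_url: str) -> None:
        # Mark the responder as down until ttl_seconds from now.
        self._down_until[responder_url] = self._clock() + self._ttl

    def should_query(self, responder_url: str) -> bool:
        deadline = self._down_until.get(responder_url)
        if deadline is None:
            return True  # no recorded failure
        if self._clock() >= deadline:
            del self._down_until[responder_url]  # entry expired: allow a fresh attempt
            return True
        return False  # considered down: skip the request and soft-fail immediately
```

With N subresources sharing one silent responder, only the first lookup waits out the timeout. The remaining N-1 skip the request until the entry expires.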

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

14 years ago

Depends on: 674147

Kai Engert [:KaiE:]

Comment 10

13 years ago

reassign bug owner. mass-update-kaie-20120918

Assignee: kaie → nobody

Stefan Fleiter (:sfleiter)

Updated

13 years ago

Blocks: 803582

Dana Keeler (she/her) [:keeler]

Comment 11

9 years ago

We currently time out after 2 seconds for DV and 10 seconds for EV. I don't think there's anything else to do here.
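With the timeouts stated here, the worst-case stall is bounded by simple arithmetic. This Python sketch is illustrative only. The constant and function names are hypothetical, and it assumes every fetch runs to its full timeout with no caching of failures.

```python
# Values from this comment; the names are hypothetical.
DV_OCSP_TIMEOUT_SECONDS = 2.0
EV_OCSP_TIMEOUT_SECONDS = 10.0


def ocsp_timeout(is_ev_cert: bool) -> float:
    """Pick the per-fetch OCSP timeout for the certificate type."""
    return EV_OCSP_TIMEOUT_SECONDS if is_ev_cert else DV_OCSP_TIMEOUT_SECONDS


def worst_case_stall(num_fetches: int, is_ev_cert: bool, serialized: bool) -> float:
    """Upper bound on waiting when every fetch hits an unresponsive responder."""
    per_fetch = ocsp_timeout(is_ev_cert)
    # Serialized waits add up; overlapping waits cost about one timeout.
    return per_fetch * num_fetches if serialized else per_fetch
```

For example, five serialized DV lookups could stall for up to 10 seconds, and five serialized EV lookups for up to 50 seconds. Five EV lookups awaited concurrently would stall for about 10 seconds.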

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

Categories: Core :: Security: PSM, defect

People: Reporter: johnath; Assignee: Unassigned

References: Blocks 1 open bug

Details: Whiteboard: [psm-fatal]

Security: public