Closed Bug 1344890 Opened 7 years ago Closed 7 years ago

h2 tls 1.3 early data problem with https://echo.filippo.io/foo

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox53 --- unaffected
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

Attachments

(1 file, 1 obsolete file)

It was reported to me that this site https://echo.filippo.io/foo lost the lock indicator but loaded fine when doing h2 early data with tls 1.3

I can't reproduce that. But when I am there and do a hard refresh (ctrl-shift-r) the reload just spins and timesout and retries successfully (presumably without early data). we should track down both issues.

we're on the cusp of deploying, so some urgency makes sense here.. nick do you want to start, can pass to dragana and then as timezones dictate?
Flags: needinfo?(hurley)
I have seen it lose the lock after the long timeout associated with the hard refresh.

obv this is the latest nightly with the early data pref on.
Alright, taking a look. I can repro, but nothing particularly suspicious in the logs that I'm seeing on first glance. Getting a debug build ready to run through rr so I can go back and forth to try to see where we get off the rails.
Flags: needinfo?(hurley)
thanks
FYI, that origin will currently fail to resume most of the times, as we have not deployed widely the session ticket key sharing code. That *should* change in a day or two. In the meantime you'll get 0-RTT enabled tickets, but they will be rejected. If it succeeds you'll see a "Cf-0rtt-Unique" header. In my staging environment I see the HTTPS indicator (and all TLS UI in the dialog) disappear as soon as I hard reload, whatever the URL.

I am actually curious about the nature of the long timeout, as it does not match any of the three scenarios: successful resumption; failed resumption with discarded early data; fallback to 1.2, with failed handshake. Any more pointers as to what might cause it?
I can't reproduce if I disable H/2, or if early data is not used. However, if I reproduce with H/2 on, turn it off, and then hard-reload I enter a weird state, where the indicator is empty, but the dialog is empty (as opposed as to saying "connection is unencrypted" as it does in normal reproductions).
I'm seeing it figure out quickly that the resumption fails (just a few ms after sending early data)

2017-03-07 02:39:09.246000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::EnsureNPNComplete [this=0F74A480] early data not accepted
2017-03-07 02:39:09.246000 UTC - [Socket Thread]: I/nsHttp Http2Session::Finish0RTT 0AA45800 aRestart=1 aAlpnChanged=0

but it doesn't get restarted for a long time.. don't know why at this time

2017-03-07 02:40:02.790000 UTC - [Main Thread]: V/nsHttp nsHttpConnectionMgr::RescheduleTransaction [trans=014D9400 0]
(In reply to Patrick McManus [:mcmanus] from comment #6)
> I'm seeing it figure out quickly that the resumption fails (just a few ms
> after sending early data)
> 
> 2017-03-07 02:39:09.246000 UTC - [Socket Thread]: V/nsHttp
> nsHttpConnection::EnsureNPNComplete [this=0F74A480] early data not accepted
> 2017-03-07 02:39:09.246000 UTC - [Socket Thread]: I/nsHttp
> Http2Session::Finish0RTT 0AA45800 aRestart=1 aAlpnChanged=0
> 
> but it doesn't get restarted for a long time.. don't know why at this time
> 
> 2017-03-07 02:40:02.790000 UTC - [Main Thread]: V/nsHttp
> nsHttpConnectionMgr::RescheduleTransaction [trans=014D9400 0]

I think the RescheduleTransaction is a red herring - that only happens when we change the priority of a channel, and it doesn't appear in my nspr log for my repro at all. (I also don't see how RescheduleTransaction actually forces a send).

I tried forcing ResumeSend (via SetWriteCallbacks) in Finish0RTT for the restart case yesterday to no avail (it was getting late in my working day, and was the best thing I could think of with the information available).
agree about rescheduleTransaction - not relevant
in the rr recording I have of this, we write out 396 bytes as early data (magic, some priority setup, some h2 HEADERS frame(s)). Then we find out it was rejected (Session Ticket failure probably) and then we PR_Write() it again under the full session 150ms later. PR_Write returns 396.

so that all seems right at that level. I do not know (yet) whether PSM and NSS actually turned that into bytes on the wire yet.
I think we just don't check the poll for reader - having lost the event state of a stream outstanding when we replayed it from a buffer.. the read seems to be kicked in 60 seconds later by a PING being scheduled coincidentally.
update: so I can fix the big delay - it was a state machine problem of not being level triggered for READ after resending the failed early data. I'll consider this a secondary problem and post the fix in a separate bug.

The core problem with the lack of a security indicator remains.. it is somewhat intermittent (but more common than not).
(In reply to Patrick McManus [:mcmanus] from comment #11)
> update: so I can fix the big delay - it was a state machine problem of not
> being level triggered for READ after resending the failed early data. I'll
> consider this a secondary problem and post the fix in a separate bug.
> 
> The core problem with the lack of a security indicator remains.. it is
> somewhat intermittent (but more common than not).

Wow, this is all going well over my lunch break! :)

Is it just a case of a missing ResumeRecv after the replay? (I had the same delay issue with accepted early data on my first attempt.)

The missing lock icon *is* still concerning. Is there maybe something in psm or nss that doesn't get properly initialized in the h2 replay case? (Otherwise, I can't imagine replayed early data having any bearing on the lock icon.)
Depends on: 1345240
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #12)

> 
> Is it just a case of a missing ResumeRecv after the replay? (I had the same
> delay issue with accepted early data on my first attempt.)
> 

pretty much
this testcase is nice (right now) as it gives a mix of pass/fail on the early data depending on the session ticket.. that all seems ok now
when there is no lock nsSecureBrowserUIImpl.cpp returns very early because the info argument cannot be QI'd to nsITransportSecurityInfo .. presumably something to do with the early state of things being in the early data phase.

It is normally that structure that determines STATE_IS_SECURE.

maybe that helps you nick or david (ni'd).. I gotta afk soon to host a birthday party.
Flags: needinfo?(dkeeler)
specifically GetSecurityStateFromSecurityInfoAndRequest()
(In reply to Patrick McManus [:mcmanus] from comment #15)

> the info argument cannot be QI'd to nsITransportSecurityInfo .. presumably

well, its null :)..
when the channel calls mTransaction->SecurityInfo() from OnStartRequest it is returning null..

this is because mTransaction::ReadSegments does not set the securityInfo or mConnected during 0RTT. This can be fixed up in transaction::Finish0RTT I believe. Patch forthcoming
Flags: needinfo?(dkeeler) → needinfo?(dd.mozilla)
Attachment #8845036 - Flags: review?(dd.mozilla)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
(In reply to Patrick McManus [:mcmanus] from comment #18)
> when the channel calls mTransaction->SecurityInfo() from OnStartRequest it
> is returning null..
> 
> this is because mTransaction::ReadSegments does not set the securityInfo or
> mConnected during 0RTT. This can be fixed up in transaction::Finish0RTT I
> believe. Patch forthcoming

In the h1 case ReadSegments gets called on more time after tls handshake finishes and sets this variables (I was looking into this when I did h1-0rtt). H2 does not :(
Flags: needinfo?(dd.mozilla)
Comment on attachment 8845036 [details] [diff] [review]
need to set nsHttpTransaction.mSecurityInfo in Finish0RTT

Review of attachment 8845036 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +2177,5 @@
>          }
> +    } else {
> +        // this is code that was skipped in ::ReadSegments while in 0RTT
> +        mConnected = true;
> +        mConnection->GetSecurityInfo(getter_AddRefs(mSecurityInfo));

in h1 case this will be called 2 times, but that is not a problem.
Attachment #8845036 - Flags: review?(dd.mozilla) → review+
yeah - we both just made an assumption there.

I'll update my patch to consider mConnected=true in finish0rtt
(In reply to Patrick McManus [:mcmanus] from comment #23)
> yeah - we both just made an assumption there.
> 
> I'll update my patch to consider mConnected=true in finish0rtt

it needs check in ReadSegments, not Finish0RTT? 

ReadSegments is called after.
Attachment #8845036 - Attachment is obsolete: true
Comment on attachment 8845041 [details] [diff] [review]
need to set nsHttpTransaction.mSecurityInfo in Finish0RTT

Review of attachment 8845041 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +2174,5 @@
>              seekable->Seek(nsISeekableStream::NS_SEEK_SET, 0);
>          } else {
>              return NS_ERROR_FAILURE;
>          }
> +    } else if (!mConnected) {

ReadSegments already has a check. (I wrote a comment without checking)

This one will never be false. It will always be mConnected==false. But it does not hurt.
Attachment #8845041 - Flags: review?(dd.mozilla) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fd44a0d6351f6bebaa94477e64f89b5f58bb796
Bug 1344890 - need to set nsHttpTransaction.mSecurityInfo in Finish0RTT r=dragana
https://hg.mozilla.org/mozilla-central/rev/0fd44a0d6351
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Blocks: 1322373
Comment on attachment 8845041 [details] [diff] [review]
need to set nsHttpTransaction.mSecurityInfo in Finish0RTT

Approval Request Comment
[Feature/Bug causing the regression]: 1322373
[User impact if declined]: https:// using tls 1.3 0rtt and early data may lose security indicator
[Is this code covered by automated tests?]: exercised but not validated
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: 1345240
[Is the change risky?]: no
[Why is the change risky/not risky?]: very targetted to this case - 2 lines
[String changes made/needed]: none
Attachment #8845041 - Flags: approval-mozilla-aurora?
Comment on attachment 8845041 [details] [diff] [review]
need to set nsHttpTransaction.mSecurityInfo in Finish0RTT

Fix a tls 1.3 early data issue. Aurora54+.
Attachment #8845041 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: