Support TCP fastopen

RESOLVED FIXED in mozilla55

Status

()

RESOLVED FIXED
4 years ago
a year ago

People

(Reporter: valentin, Assigned: dragana)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Trunk
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 disabled, firefox56 disabled, firefox57 disabled)

Details

(Whiteboard: [necko-active][parity-edge][parity-webkit][parity-blink])

Attachments

(6 attachments, 23 obsolete attachments)

13.37 KB, application/octet-stream
Details
12.13 KB, patch
mcmanus
: review+
Details | Diff | Splinter Review
2.79 KB, patch
dragana
: review+
Details | Diff | Splinter Review
2.33 KB, patch
dragana
: review+
Details | Diff | Splinter Review
89.20 KB, patch
dragana
: review+
Details | Diff | Splinter Review
31.53 KB, patch
dragana
: review+
Details | Diff | Splinter Review
(Reporter)

Description

4 years ago
TCP Fast Open is a feature that allows saving a round-trip when establishing a connection. [1] [2]

This was implemented in the linux 3.6 kernel. We only intend to enable this for TLS connections.

Using this feature requires some changes to the way we establish a connection.
connect() + send()/write() is replaced by a call to sendto() [1]

This needs an implementation of PSMSendTo in nsNSSIOLayer.cpp and probably needs implementations of sendto for the lower layer methods as well.

[1] http://lwn.net/Articles/508865/
[2] https://tools.ietf.org/html/rfc7413
This sounds like more of a necko bug.
Component: Security: PSM → Networking
Whiteboard: [necko-backlog]
tcp fast open will be a feature of windows 10 anniversary update - pushed for free to the win10 userbase. No actual date for that seems to be available, but I've heard anything from August to October 2016. So soon.

That's going to massively change the number of tcp-fo eligible clients. Servers with modern linux kernels already have it. So the potential impact is large.

There are concerns about middleboxes. Some form of happy eyeballs should play a role.
Whiteboard: [necko-backlog] → [necko-next]

Comment 3

3 years ago
https://developer.microsoft.com/en-us/microsoft-edge/platform/changelog/desktop/14352/
https://developer.apple.com/videos/play/wwdc2015/719/
Google currently supports it with Chromium on Linux and is working on support for OS X and Windows.

This could be interesting with 0-RTT in TLS 1.3 (bug 1057463).
OS: Unspecified → All
Hardware: Unspecified → All
Whiteboard: [necko-next] → [necko-next][parity-edge][parity-webkit][parity-blink]
Version: unspecified → Trunk
Duplicate of this bug: 1297042
(Assignee)

Comment 5

3 years ago
I am looking into this.
(Assignee)

Updated

3 years ago
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Whiteboard: [necko-next][parity-edge][parity-webkit][parity-blink] → [necko-active][parity-edge][parity-webkit][parity-blink]
(Assignee)

Comment 14

3 years ago
Posted patch bug_1188435_nspr_v1.patch (obsolete) — Splinter Review
Linux use SendTo to send data during TCP Fast Open.
OSX uses connectx
and Windows uses connectex.

I made the interface all platform to be the same as on Linux. So necko will use PR_SendTo to send data during TCP Fast Open.
Attachment #8800720 - Flags: review?(ted)
(Assignee)

Comment 15

3 years ago
Posted patch bug_1188435_nss_v1.patch (obsolete) — Splinter Review
to make use of TCP Fast Open we need to use PR_SendTo instead of PR_Send. Therefore nss needs to know addr it wants to send data to. The nss change is mainly about way to supply addr and how to deal with connect() specific error (e.g. PR_IN_PROGRESS is not a real error for nss).
Attachment #8800722 - Flags: review?(dkeeler)
(Assignee)

Comment 16

3 years ago
Posted patch bug_1188435_v2.patch (obsolete) — Splinter Review
Attachment #8800726 - Flags: review?(mcmanus)
Comment on attachment 8800722 [details] [diff] [review]
bug_1188435_nss_v1.patch

Review of attachment 8800722 [details] [diff] [review]:
-----------------------------------------------------------------

This should be split into two patches (and maybe even two bugs), since this involves changes to NSS itself. The NSS parts would probably best be reviewed by :ekr or :mt.

::: netwerk/socket/nsISSLSocketControl.idl
@@ +59,5 @@
>      readonly attribute bool earlyDataAccepted;
>  
>      /* When 0RTT is performed, PR_Write will not drive the handshake forward.
>       * It must be forced by calling this function.
> +     * When the Fast Open is used, the socket is not connected yet and we will

documentation nit: I would be more up-front about the effect of having vs. not having aAddr. Maybe something like: "If aAddr is not null, SendTo will be used to send data. This is used in TCP Fast Open. If aAddr is null, Send will be used.

@@ +66,2 @@
>       */
> +    void driveHandshake([const] in PRNetAddrPtr aAddr);

I think this needs to be annotated as [noscript] now (due to the native type PRNetAddrPtr)

::: security/manager/ssl/nsNSSIOLayer.cpp
@@ +353,2 @@
>  NS_IMETHODIMP
> +nsNSSSocketInfo::DriveHandshake(const PRNetAddr *addr)

nit: 'const PRNetAddr* addr'

@@ +1220,5 @@
>  
>  int32_t
>  checkHandshake(int32_t bytesTransfered, bool wasReading,
> +               PRFileDesc* ssl_layer_fd, nsNSSSocketInfo* socketInfo,
> +               PRBool fastOpen)

nit: bool instead of PRBool

@@ +1253,5 @@
>      if (handleHandshakeResultNow) {
> +      if ((PR_WOULD_BLOCK_ERROR == err) ||
> +          (fastOpen && ((PR_IN_PROGRESS_ERROR == err) ||
> +          (PR_IS_CONNECTED_ERROR == err) ||
> +          (PR_NOT_CONNECTED_ERROR == err)))) {

Do these errors need to match the errors in DriveHandshake?

@@ +1308,5 @@
>      // (erroneously) calls an I/O function (PR_Send/PR_Recv/etc.) again on
>      // this socket. Note that we use the original error because if we use
>      // PR_CONNECT_RESET_ERROR, we'll repeated try to reconnect.
> +    if (originalError != PR_WOULD_BLOCK_ERROR &&
> +        (!fastOpen || (originalError != PR_IN_PROGRESS_ERROR &&

It seems like we should define a helper function to enumerate these errors in the fast open case.

@@ +1508,5 @@
> +  return checkHandshake(bytesWritten, false, fd, socketInfo, false);
> +}
> +
> +static int32_t
> +PSMSendto(PRFileDesc *fd, const void *buf, PRInt32 amount,

nit: * to the left

@@ +1514,5 @@
> +{
> +  MOZ_ASSERT(!flags);
> +  nsNSSShutDownPreventionLock locker;
> +  nsNSSSocketInfo* socketInfo = getSocketInfoIfRunning(fd, writing, locker);
> +  if (!socketInfo)

nit: braces around conditionals
Attachment #8800722 - Flags: review?(dkeeler)
(Assignee)

Comment 18

3 years ago
Posted patch bug_1188435_psm_v1.patch (obsolete) — Splinter Review
Attachment #8800722 - Attachment is obsolete: true
Attachment #8801059 - Flags: review?(dkeeler)
(Assignee)

Comment 19

3 years ago
Posted patch bug_1188435_nss_v1.patch (obsolete) — Splinter Review
This is nss part.
The main change is that we need to  use PR_SendTo during Fast Open and we need an address for that. Also some errors are not the real errors, e.g. PR_IN_PROGRESS.

This currently does not work with tls1.3 early-data. We will fix that in a separate bug.
Attachment #8801060 - Flags: review?(ekr)
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1305985
(Dragana, rather put any NSS and NSPR patches to their own respective bugs, for better release tracking, thanks.)
(Assignee)

Comment 22

3 years ago
(In reply to Honza Bambas (:mayhemer) from comment #21)
> (Dragana, rather put any NSS and NSPR patches to their own respective bugs,
> for better release tracking, thanks.)

I wanted to split them later into different bugs. First I wanted to get this all running :)

I will move them.
(Assignee)

Updated

3 years ago
Depends on: 1310197
(Assignee)

Updated

3 years ago
Depends on: 1310201
(Assignee)

Updated

3 years ago
Attachment #8800720 - Attachment is obsolete: true
Attachment #8800720 - Flags: review?(ted)
(Assignee)

Comment 23

3 years ago
Comment on attachment 8801060 [details] [diff] [review]
bug_1188435_nss_v1.patch

Moving to a separate bug.
Attachment #8801060 - Attachment is obsolete: true
Attachment #8801060 - Flags: review?(ekr)
(Assignee)

Comment 24

3 years ago
Comment on attachment 8801059 [details] [diff] [review]
bug_1188435_psm_v1.patch

I am dropping the review for now. Ekr suggested not to change nss. I have a idea how can be done.
Attachment #8801059 - Flags: review?(dkeeler)
(Assignee)

Updated

3 years ago
Attachment #8800726 - Flags: review?(mcmanus)
(Assignee)

Comment 26

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
So I moved FastOpen into a separate layer just above socket layer (under nss).

ekr prefer not to change nss. This actually makes the code easier. The FastOpen layer will take care to separate errors from send and connect.
Attachment #8800726 - Attachment is obsolete: true
Attachment #8801059 - Attachment is obsolete: true
Attachment #8803348 - Flags: review?(mcmanus)
I haven't done the full review yet, but at a glance this looks more to my liking too. Thanks for doing it.

What was the corresponding nss bug? Can it be closed as invalid?
(Assignee)

Comment 28

2 years ago
(In reply to Patrick McManus [:mcmanus] from comment #27)
> I haven't done the full review yet, but at a glance this looks more to my
> liking too. Thanks for doing it.
> 
> What was the corresponding nss bug? Can it be closed as invalid?

Bug 1310201.
It is close as invalid now.
(Assignee)

Comment 29

2 years ago
we could do backup connection inside TCPFastOpen layer completely (I have left it where it is now), That would simplify code. But then we will have this in 2 places. For FastOpen it is better to have it in TCPFastOpen layer an for other cases it is better outside of it. Actually, maybe all cases are better to do it in the TCPFastOpen layer.
Comment on attachment 8803348 [details] [diff] [review]
bug_1188435_v3.patch

Review of attachment 8803348 [details] [diff] [review]:
-----------------------------------------------------------------

this is pretty cool! a couple big picture issues, a few nits, and the early data thing should be resolved.. (though I think a patch on top of this one in the same bug might be a reasonable way to do that)

::: modules/libpref/init/all.js
@@ +1584,5 @@
>  
>  pref("network.http.tcp_keepalive.long_lived_connections", true);
>  pref("network.http.tcp_keepalive.long_lived_idle_time", 600);
>  
> +#ifdef XP_WIN32

why?

::: netwerk/base/nsIFastOpen.idl
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsISupports.idl"

is there some reason this needs to be xpcom with all its machinery instead of just an abstract C++ class?

@@ +5,5 @@
> +
> +#include "nsISupports.idl"
> +
> +/**
> +  * This is an interface for Fast Open.

mention TCP - rfc 7413.. basically what it does and that this interface might just be doing some stuff in the OS

@@ +12,5 @@
> +  * cancel it if the Fast Open socket was successfully connected
> +  * (FastOpenConnected is called with error=false).
> +  **/
> +
> +[scriptable, uuid(893b07ea-fdf0-44af-8fde-8a02fdf926c9)]

builtinclass instead of scriptable.. then you can skip the noscript attributes

@@ +15,5 @@
> +
> +[scriptable, uuid(893b07ea-fdf0-44af-8fde-8a02fdf926c9)]
> +interface nsIFastOpen : nsISupports
> +{
> +  [noscript] void canDoFastOpen(out boolean canDo);

I think this can be a readonly and infallible bool attribute.. maybe called enabled

::: netwerk/base/nsSocketTransport2.cpp
@@ +1645,5 @@
>      bool tryAgain = false;
> +    if (mFDFastOpenInProgress &&
> +        ((mCondition == NS_ERROR_CONNECTION_REFUSED) ||
> +         (mCondition == NS_ERROR_NET_TIMEOUT))) {
> +        // Blocking TCP Fast Open using iptables can be configured to reset the

I understand what the code does but not what the comment is saying.. what does iptables have to do with this? (other than as a hand test..)

::: netwerk/base/nsTemporaryFileInputStream.cpp
@@ +2,5 @@
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "nsTemporaryFileInputStream.h"

I don't understand why the diffs to this file are needed at all. any ideas? at worst I would think you could just add a using mozilla;

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +315,5 @@
>          goto npnComplete;
>  
>      rv = ssl->GetNegotiatedNPN(negotiatedNPN);
> +    // Fast Open does not work well with TLS Early Data. TODO: dragana
> +    // fix this.

:)

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +668,5 @@
> +    LOG(("nsHttpConnectionMgr::RemoveIdleConnection %p conn=%p",
> +         this, conn));
> +
> +    if (!conn->ConnectionInfo())
> +        return NS_ERROR_UNEXPECTED;

braces

@@ +674,5 @@
> +    nsConnectionEntry *ent = LookupConnectionEntry(conn->ConnectionInfo(),
> +                                                   conn, nullptr);
> +
> +    if (!ent || !ent->mIdleConns.RemoveElement(conn))
> +        return NS_ERROR_UNEXPECTED;

braces

@@ +2877,5 @@
>      // step 3
>      if (!specificEnt) {
>          RefPtr<nsHttpConnectionInfo> clone(specificCI->Clone());
>          specificEnt = new nsConnectionEntry(clone);
> +        specificEnt->mUseFastOpen = gHttpHandler->UseFastOpen();

should the cloning ctor do that?

@@ +3330,5 @@
> +        (out == mBackupStreamOut)) {
> +        MOZ_ASSERT(mTransaction->IsNullTransaction());
> +        // Fast Open has been used and primary connection has not been
> +        // connected yet. Probably some syn packet got lost or middleboxes
> +        // interfered.

I think we need more comments that basically say what is happening next.. the backup (non-TFO) connection has succeeded and is going into the idle pool.. the primary (TFO) connection is aborted and the transaction is removed from that and put in the pending queue.. we hope they find each other :)

It took me a while to realize this was the flow.

@@ +3427,5 @@
> +    if (mConnectionUsingFastOpen) {
> +        mSocketTransport = nullptr;
> +        mConnectionUsingFastOpen = nullptr;
> +    }
> +    LOG(("DDDDD %x %d", aError, mEnt->mUseFastOpen));

:)

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +1312,5 @@
>      }
>  
> +
> +
> +

stray whitespace
Attachment #8803348 - Flags: review?(mcmanus)
(Assignee)

Comment 31

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
I have added some assertions as well.

TLS-0RTT part will take some time to setup a test server, TLS-0RTT server was a hack and  need to get it running again.
Attachment #8803348 - Attachment is obsolete: true
Attachment #8805588 - Flags: review?(mcmanus)
(Assignee)

Comment 33

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
Attachment #8805588 - Attachment is obsolete: true
Attachment #8805588 - Flags: review?(mcmanus)
Attachment #8808082 - Flags: review?(mcmanus)
(Assignee)

Comment 34

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
rebased and It was to restrictive on one assertion :)
Attachment #8808082 - Attachment is obsolete: true
Attachment #8808082 - Flags: review?(mcmanus)
Attachment #8813296 - Flags: review?(mcmanus)
(Assignee)

Comment 35

2 years ago
Posted patch bug_1188435_part2.patch (obsolete) — Splinter Review
This patch makes tls 0rtt work with TFO. It is not tested with h2/0rtt (not implemented yet). It is probably just working, but I will wait with a review flag until tested.
(Assignee)

Updated

2 years ago
Blocks: 1320268
(Assignee)

Comment 36

2 years ago
An observation:

This will happen never or really rarely. I just want to write it down somewhere.

A server needs to explicitly activate tfo by setting a tcp option. So theoretically it is possible that on a host with 2 servers one server has it turn on and the other does not, or that server changes the option. I believe we will ever see this really on the Internet. I am writing anyway in case it happens...

If the client host has a tfo cookie for a server host but the server does not have tfo activated, FF will send some data on the syn packet, this data will be retransmitted by the tcp stack upon arrival of synack. The socket is polled as ready only after the retransmitted packets is acked. (at least on linux, IW is not change and it should be 10 on versions >3.0). So the handshake last for 2RTT(from FF perspective) and depending on rtt our halfopenconn's backup timer could trigger and close this fine working connection.
(Assignee)

Comment 37

2 years ago
Posted patch bug_1188435_part2.patch (obsolete) — Splinter Review
Attachment #8814318 - Attachment is obsolete: true
(Assignee)

Comment 38

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
I have added a logic for turning off tfo if it fails too often.
Attachment #8813296 - Attachment is obsolete: true
Attachment #8813296 - Flags: review?(mcmanus)
Attachment #8814871 - Flags: review?(mcmanus)
(Assignee)

Comment 39

2 years ago
Posted patch bug_1188435_part2.patch (obsolete) — Splinter Review
rebased.
Attachment #8814384 - Attachment is obsolete: true
I've rebased the patch and am actively testing and reviewing. (I'll upload the rebase after my review is done.. don't want to overwrite partial comments - but they're all nits at the moment.) Did want to add an awesome packet cap tho.
Posted file tfo.2sessions
a packet capture of 2 tcp streams.. the first where we learn the tfo cookie and the second where its used along with the Client Hello - to facebook. nice!
Posted patch 0001-rebased-patch.patch (obsolete) — Splinter Review
rebased dd patch
I'm sorry for having kept this bug waiting forever. I have tried to sit down and review this patch a few times but it turns out that I don't actually know most of the NSPR code well enough to do a thorough review without also spending a bunch of time reading the existing NSPR code. In light of that, I'd like to propose that you simply have another networking peer review the NSPR bits, and I will rubber-stamp it.

The only thing I would like to know is what impact this has on our supported platforms. Will the code continue to work on OS versions that don't support fastopen? Will this raise the minimum required version of any OS or glibc or anything like that?
Comment on attachment 8852599 [details] [diff] [review]
0001-rebased-patch.patch

Review of attachment 8852599 [details] [diff] [review]:
-----------------------------------------------------------------

reset the "disabler" upon cleared captive portal event

need to address the incompatibility with 0rtt-tls.. its ok to do so in a separate (high priority) patch. perhaps by capturing nss 0tt output in the nspr layer already between socket and nss

need some telemetry on fallback, success, disablement rates etc..

confirm devtools timing still make sense

add resp header indicator esp for suceeded (addl patch ok)

::: modules/libpref/init/all.js
@@ +4605,5 @@
>  #if defined(XP_UNIX) && !defined(XP_MACOSX)
>  pref("network.tcp.keepalive.probe_count", 4);
>  #endif
>  
> +#ifndef XP_WIN32

drop the preprocessor

::: netwerk/base/TCPFastOpenLayer.h
@@ +11,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +nsresult AttachTCPFastOpenIOLayer(PRFileDesc *fd);

more comments on where this layer is in the stack and its purpose in life

::: netwerk/base/nsSyncStreamListener.cpp
@@ +12,5 @@
>  nsSyncStreamListener::Init()
>  {
>      return NS_NewPipe(getter_AddRefs(mPipeIn),
>                        getter_AddRefs(mPipeOut),
> +                      mozilla::net::nsIOService::gDefaultSegmentSize,

wtf

::: netwerk/base/nsTemporaryFileInputStream.cpp
@@ +12,5 @@
>  using namespace mozilla;
>  using namespace mozilla::ipc;
>  
>  typedef mozilla::ipc::FileDescriptor::PlatformHandleType FileHandleType;
> +using namespace mozilla;

wtf?

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +3153,5 @@
>  
>      gHttpHandler->ConnMgr()->RecvdConnect();
>  
>      CancelBackupTimer();
> +    if (mConnectionUsingFastOpen && mUsingFastOpen) {

comment here about why the primary stream does not call onoutputstreamready when it is using tfo

@@ +3269,5 @@
> +        mConnectionUsingFastOpen->Transaction()->RestartOnFastOpenError();
> +    }
> +    if (mConnectionUsingFastOpen) {
> +        mSocketTransport = nullptr;
> +        mConnectionUsingFastOpen = nullptr;

consider renaming to connectionNegotiatingFastOpen

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1723,5 @@
> +    if (PREF_CHANGED(TCP_FAST_OPEN_FAILURE_LIMIT)) {
> +        rv = prefs->GetIntPref(TCP_FAST_OPEN_FAILURE_LIMIT, &val);
> +        if (NS_SUCCEEDED(rv)) {
> +            if (val < 0) {
> +                val =0;

whitespace
Comment on attachment 8814871 [details] [diff] [review]
bug_1188435_v3.patch

see comments on the rebased patch
Attachment #8814871 - Flags: review?(mcmanus)
Comment on attachment 8852599 [details] [diff] [review]
0001-rebased-patch.patch

Review of attachment 8852599 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/TCPFastOpenLayer.cpp
@@ +78,5 @@
> +    PR_SetError(PR_WOULD_BLOCK_ERROR, 0);
> +    return -1;
> +  case TCPFastOpenSecret::WAITING_FOR_FIRST_SEND:
> +    {
> +      PRInt32 rv = (fd->lower->methods->sendto)(fd->lower, buf, amount, flags,

need to check sendto() to see if it is the invalid ptr in case someone is using an nspr different than what we are updating.. maybe check that in the startup routine rather than here?
(Assignee)

Comment 47

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
rebased multiple times :)
We all work on the same file this days.
Attachment #8814871 - Attachment is obsolete: true
Attachment #8852599 - Attachment is obsolete: true
Attachment #8854957 - Flags: review+
(Assignee)

Comment 48

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
Attachment #8854957 - Attachment is obsolete: true
Attachment #8854991 - Flags: review+
(Assignee)

Comment 49

2 years ago
Posted patch bug_1188435_part2.patch (obsolete) — Splinter Review
Attachment #8814884 - Attachment is obsolete: true
Attachment #8854992 - Flags: review+
(Assignee)

Comment 50

2 years ago
Posted patch bug_1188435_part3.patch (obsolete) — Splinter Review
Make TFO+0RTT+H2 work.

(0RTT+H2 did not work when I first made TFO)
Attachment #8855398 - Flags: review?(mcmanus)
(Assignee)

Comment 51

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
just miss-ordered variable initialization.
Attachment #8854991 - Attachment is obsolete: true
Attachment #8855417 - Flags: review+
(Assignee)

Comment 52

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=86757964a8cf1ffd0c0d8edf93ddcc66e4bc7ee8&selectedJob=89500156

We have a problem with test_http2.js, test_imutable.js ... all tests using nodejs.

I have tried to reproduce it locally but no luck at all. The status of failing channel is 0x804B000D which is NS_ERROR_CONNECTION_REFUSED. We explicitly retry without TFO on this error. I have force this error to see if it is going to retry properly and it does. I do not know what is happening, does nodejs server restarts or just crash? From try logs I do not see anything.
the server does not restart, so if it is crashing the retry won't help.

of course that begs the question of why it is crashing - normally it sticks a node stack in stderr and that shows up in the log..

[task 2017-04-07T10:47:29.264139Z] 10:47:29     INFO -  Process stderr
[task 2017-04-07T10:47:29.264344Z] 10:47:29     INFO -  _http_outgoing.js:344
[task 2017-04-07T10:47:29.264656Z] 10:47:29     INFO -      throw new Error('`value` required in setHeader("' + name + '", value).');
[task 2017-04-07T10:47:29.266687Z] 10:47:29     INFO -      ^
[task 2017-04-07T10:47:29.282367Z] 10:47:29     INFO -  Error: `value` required in setHeader("x-client-port", value).
[task 2017-04-07T10:47:29.282811Z] 10:47:29     INFO -      at ServerResponse.OutgoingMessage.setHeader (_http_outgoing.js:344:11)
[task 2017-04-07T10:47:29.285323Z] 10:47:29     INFO -      at Server.handleRequest (/home/worker/workspace/build/tests/xpcshell/moz-http2/moz-http2.js:798:9)
[task 2017-04-07T10:47:29.285755Z] 10:47:29     INFO -      at emitTwo (events.js:87:13)
[task 2017-04-07T10:47:29.288453Z] 10:47:29     INFO -      at Server.emit (events.js:172:7)
[task 2017-04-07T10:47:29.288796Z] 10:47:29     INFO -      at emitTwo (events.js:87:13)
[task 2017-04-07T10:47:29.291425Z] 10:47:29     INFO -      at Server.emit (events.js:172:7)
[task 2017-04-07T10:47:29.292761Z] 10:47:29     INFO -      at HTTPParser.parserOnIncoming [as onIncoming] (_http_server.js:527:12)
[task 2017-04-07T10:47:29.293150Z] 10:47:29     INFO -      at HTTPParser.parserOnHeadersComplete (_http_common.js:88:23)
[task 2017-04-07T10:47:29.293381Z] 10:47:29     INFO -  INFO | Result summary:
[task 2017-04-07T10:47:29.293625Z] 10:47:29     INFO -  INFO | Passed: 379
Comment on attachment 8855398 [details] [diff] [review]
bug_1188435_part3.patch

Review of attachment 8855398 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +2271,3 @@
>          DontReuse();
>          mUsingSpdyVersion = 0;
> +        mSpdySession->Finish0RTT(true, true);

I would feel better if you nullchecked mSpdySession
Attachment #8855398 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 55

2 years ago
The test that is failing is: netwerk/test/unit/test_origin.js
It is failing even if I disable fast open. I am investigating...
thought - we fail joinConnection() if the handshake isn't complete.. test_origin contains both ORIGIN and 7540 coalescing tests.

I don't think that ORIGIN tests would fail, because you can't consume a h2 level byte from the server without the handshake being complete, right?

But plausibly some 7540 case could fail.. just brainstorming
oh - but you said it failed even if tfo was disabled :(
(Assignee)

Comment 58

2 years ago
it even failed with only nspr part, so now I have pulled the latest m.-c. and i am doing a clobber build.
This is only local testing, but the error is the same:
 0:00.73 LOG: MainThread INFO _http_outgoing.js:333

 0:00.73 LOG: MainThread INFO     throw new Error('`value` required in setHeader("' + name + '", value).');
(Assignee)

Comment 59

2 years ago
Posted patch bug_1188435_part4.patch (obsolete) — Splinter Review
Attachment #8856987 - Flags: review?(mcmanus)
Comment on attachment 8856987 [details] [diff] [review]
bug_1188435_part4.patch

Review of attachment 8856987 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/TCPFastOpenLayer.cpp
@@ +370,5 @@
>        result = PR_IS_CONNECTED_ERROR;
>      } else {
>        result = PR_GetError();
>      }
> +    if (tfoFd->lower->methods->sendto == (PRSendtoFN)tfoFd->lower->methods->reserved_fn_0) {

log and comment!
Attachment #8856987 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 61

2 years ago
(In reply to Dragana Damjanovic [:dragana] from comment #58)
> it even failed with only nspr part, so now I have pulled the latest m.-c.
> and i am doing a clobber build.
> This is only local testing, but the error is the same:
>  0:00.73 LOG: MainThread INFO _http_outgoing.js:333
> 
>  0:00.73 LOG: MainThread INFO     throw new Error('`value` required in
> setHeader("' + name + '", value).');

I updated nodejs not it is working properly alpn was missing.

I have pushed it to try to see if it is working.
(Assignee)

Comment 62

2 years ago
Posted patch bug_1188435_part5.patch (obsolete) — Splinter Review
This patch fixes test that were failing. It fixes http2 in case of net-reset (ssl parameter were not set on the new socket).

I have tested this: 1) it succeeds, 2) gets net-reset and 3) times out.

I have tested it with h1 and h2 and h2-0rtt.
Attachment #8857609 - Flags: review?(mcmanus)
(Assignee)

Comment 63

2 years ago
and the last test: h1+0RTT
Attachment #8857609 - Attachment is obsolete: true
Attachment #8857609 - Flags: review?(mcmanus)
Attachment #8857665 - Flags: review?(mcmanus)
Attachment #8857665 - Flags: review?(mcmanus) → review+
(Assignee)

Updated

2 years ago
Blocks: 1352273, 1352274
(Assignee)

Comment 65

2 years ago
(In reply to Dragana Damjanovic [:dragana] from comment #64)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3d49dc8ed0f51dfb832dcd9bcb6c2a5d4cc113b8
> 
> this is a try run it this bug and its followup bugs.

The failing windows test is an error in bug 1310197 that has been fixed. (Bug 1310197 is already in m.-c.).

This is a try run with fix for bug 1310197 and this bug and it followups:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b86f10b9f5fc78e5387c2f51ceaf61910dd29518
(Assignee)

Comment 66

2 years ago
Posted patch bug_1188435_v3.patch (obsolete) — Splinter Review
Rebased.
Attachment #8855417 - Attachment is obsolete: true
Attachment #8861388 - Flags: review+
(Assignee)

Comment 67

2 years ago
Posted patch bug_1188435_part2.patch (obsolete) — Splinter Review
Rebased
Attachment #8854992 - Attachment is obsolete: true
Attachment #8861390 - Flags: review+
(Assignee)

Comment 68

2 years ago
Rebased and address a comment.
Attachment #8855398 - Attachment is obsolete: true
Attachment #8861392 - Flags: review+
(Assignee)

Comment 69

2 years ago
Added a comment and a log message.
Attachment #8856987 - Attachment is obsolete: true
Attachment #8861394 - Flags: review+

Comment 70

2 years ago
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5a9277de75b3
Implement fast open necko part. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c71363497a8
Make TLS 0RTT work with TLS Fast Open. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee570e804c7f
Make TFO work with 0RTT and H2 work. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/772917804aa4
Check if sendto is implemented. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/1426ecc8209d
Fix http2 restart on net-reset and fix activeConn count. r=mcmanus
backed out for test failures in https://treeherder.mozilla.org/logviewer.html#?job_id=94051101&repo=mozilla-inbound
Flags: needinfo?(dd.mozilla)

Comment 72

2 years ago
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40ebe09b4678
Backed out changeset 1426ecc8209d 
https://hg.mozilla.org/integration/mozilla-inbound/rev/8dee197fe7c1
Backed out changeset 772917804aa4 
https://hg.mozilla.org/integration/mozilla-inbound/rev/568c8aec3caa
Backed out changeset ee570e804c7f 
https://hg.mozilla.org/integration/mozilla-inbound/rev/f59a72936f72
Backed out changeset 3c71363497a8 
https://hg.mozilla.org/integration/mozilla-inbound/rev/fbac633e3a8f
Backed out changeset 5a9277de75b3 for test failures in test_devtools_serviceworker_interception.html | undefined assertion name
Depends on: 1359938
Depends on: 1359847
(Assignee)

Updated

2 years ago
Blocks: 1360515
(Assignee)

Updated

2 years ago
Flags: needinfo?(dd.mozilla)
(Assignee)

Comment 73

2 years ago
Rebased
Attachment #8861388 - Attachment is obsolete: true
Attachment #8864051 - Flags: review+
(Assignee)

Comment 74

2 years ago
Rebased.
Attachment #8861390 - Attachment is obsolete: true
Attachment #8864052 - Flags: review+

Comment 75

2 years ago
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/680a2f0d0b26
Implement fast open necko part. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/033337b35af3
Make TLS 0RTT work with TLS Fast Open. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/8adfceab18a5
Make TFO work with 0RTT and H2 work. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/e46c77c61346
Check if sendto is implemented. r=mcmanus
https://hg.mozilla.org/integration/mozilla-inbound/rev/8fc8a0557ca0
Fix http2 restart on net-reset and fix activeConn count. r=mcmanus

Updated

2 years ago
Depends on: 1362959
(Assignee)

Updated

2 years ago
Depends on: 1362984
(Assignee)

Updated

2 years ago
Depends on: 1363034
Hi Patrick, Dragana, we've noticed Nightly crash rates have worsened since this patch landed on Friday. Given Project Dawn, we are monitoring Nightly quality and stability more closely. We'd like to propose a pref-off/backouts in situations like these going forward. Thoughts? 

Is our hunch that this might be to blame for the Nightly crash spike valid?
Flags: needinfo?(mcmanus)
Flags: needinfo?(dd.mozilla)
(Assignee)

Comment 78

2 years ago
(In reply to Ritu Kothari (:ritu) from comment #77)
> Hi Patrick, Dragana, we've noticed Nightly crash rates have worsened since
> this patch landed on Friday. Given Project Dawn, we are monitoring Nightly
> quality and stability more closely. We'd like to propose a pref-off/backouts
> in situations like these going forward. Thoughts? 
> 
> Is our hunch that this might be to blame for the Nightly crash spike valid?

Yes, this patch is to blame for nightly crashes. I am working on fixing them.

There is a pref to turn this feature off, but I would like to have some more time to test it and fix problems. It is not easy to find all problem  if I am the only one running it.
Flags: needinfo?(dd.mozilla)
I think as long as the fix stream is active, the bottleneck here would be more data so I would like to leave this live. This feature really requires a lot of diverse networks to get feedback on. And the fix stream is quite active.

I do agree that if we aren't making daily progress, or are facing a beta deadline, it should be pref'd off. I would expect to be down to a tail of issues shortly.
Flags: needinfo?(mcmanus)
Thanks Dragana, Patrick. I like that idea and understand your reasoning for leaving it on. If we don't feel there is enough progress made, we can come back to the idea of turning this off.

Updated

2 years ago
Depends on: 1363421
Depends on: 1363798
Duplicate of this bug: 1360332
Duplicate of this bug: 1363376
Duplicate of this bug: 1363798
Honza, why are you duping crashes to this bug?  They'll still need fixing before re-enabling the pref, so marking them duplicates seems rather weird?
Flags: needinfo?(honzab.moz)
(In reply to Julien Cristau [:jcristau] from comment #84)
> Honza, why are you duping crashes to this bug?  They'll still need fixing
> before re-enabling the pref, so marking them duplicates seems rather weird?

Probably because I thought this bug is already reopened, backed out and that the crashes cannot be avoided just by disabling it.  But I first had to make sure, right.

Reopening and making sure those are blocking.  Thanks.
Flags: needinfo?(honzab.moz)
Depends on: 1360332
Depends on: 1363376
(Assignee)

Updated

2 years ago
Depends on: 1364371
(Assignee)

Updated

2 years ago
Depends on: 1364372
(Assignee)

Updated

2 years ago
Depends on: 1362831
(Assignee)

Updated

2 years ago
Depends on: 1364785
Turned off 5/31 in bug 1367393.
status-firefox55: fixed → disabled
status-firefox56: --- → affected
(Assignee)

Updated

2 years ago
Depends on: 1372897
(Assignee)

Updated

2 years ago
Depends on: 1369509
Depends on: 1381256
status-firefox56: affected → disabled
You need to log in before you can comment on or make changes to this bug.