Closed Bug 1486046 Opened 6 years ago Closed 6 years ago

Allow restart of pending h2 streams on sudden net-reset of the (used) connection

Categories

(Core :: Networking: HTTP, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox63 --- affected

People

(Reporter: mayhemer, Unassigned)

References

Details

It happened few times that I navigated to an arbitrary h2 served page and all its css was broken - not loaded.

Log examination shows that for some streams short after we have sent their request data (the HEADERS frame) the connection has been closed with NET_RESET.  Some streams didn't even have a chance to live that long to send their requests.

All of them (all GETs) are closed with ABORT instead of NET_RESET because of the code at [1] called from [2].

For h1 we do restart transactions that are GET and didn't receive data on connection close with NET_RESET [3].  We should IMO do the same for h2.

Nick, what do you think?

[1] https://searchfox.org/mozilla-central/rev/f2ac80ab7dbde5400a3400d463e07331194dec94/netwerk/protocol/http/Http2Session.cpp#170
[2] https://searchfox.org/mozilla-central/rev/f2ac80ab7dbde5400a3400d463e07331194dec94/netwerk/protocol/http/Http2Session.cpp#3694
[3] https://searchfox.org/mozilla-central/rev/c45b9755593ce6cc91f558b21e544f5165752de7/netwerk/protocol/http/nsHttpTransaction.cpp#1133
Flags: needinfo?(hurley)
That would not be compliant with 7540 - see https://tools.ietf.org/html/rfc7540#section-5.4.3

Now, admittedly, the section linked to from there for more information (https://tools.ietf.org/html/rfc7540#section-8.1.4) makes it a bit less clear, as that talks specifically about non-idempotent requests. A strict reading of section 5.4.3, though, includes idempotent requests in the set of things that can't be retried. (It's also fun because, as we've found out through the predictor, even GETs can affect server-side state in some cases, and therefore not be strictly idempotent. The real world sucks.)

I'm inclined to go with a strict reading of 5.4.3 here unless this becomes a major issue in a lot of places. Behaviour like this can happen with h1, too, so it's not like this is an entirely new failure mode that is h2-specific, it's just that h2 has expanded (in a few cases) the conditions that may make this happen.

Another thing to take a look at is *why* we're getting NET_RESET - are we closing the connection, or is the server closing the connection? Whoever is closing the connection should be sending a GOAWAY frame, so if the server is shutting down the connection without doing so, that points to a server bug, and we should reach out to whoever is responsible. If we're shutting down without sending GOAWAY, that's an us-bug, and we should fix it :) If there's a middlebox shutting down the connection then, well, I hate middleboxes.
Flags: needinfo?(hurley)
Thanks Nick!  The log I am looking at is from Apr 2018 and there is no nsSocketTransport logging.  From what I can see on nsHttp module it seems like the server sent a part of a response on one of the streams and then sent a RST immediately after.  I don't know how to easily find out if GOAWAY has been sent by the server to us, but it's definitely not logged between the last dataframe and the reset.  I can only see we do send it from our side to the server before flushing and closing the conn (surprisingly).

I can see this happen only occasionally, but when it does the page is pretty broken.  F5 helps, obviously.

Based on your comment I'm closing this as WONTFIX with a good rationale.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Yeah, that sounds like a broken server if it looks like there's a partial response. You'd see RecvGoAway in the log if the server sent us one.

And yes, we do generate a GOAWAY even in cases when it won't ever go anywhere. A bit silly perhaps, but harmless (and proof that we're exceedingly well-behaved and polite! :-D)
(In reply to Nicholas Hurley [:nwgh][:hurley] from comment #3)
> Yeah, that sounds like a broken server if it looks like there's a partial
> response. You'd see RecvGoAway in the log if the server sent us one.

No RecvGoAway in for the session, so it was a server bug.
You need to log in before you can comment on or make changes to this bug.