1269055 - RFE - auto retry failed POST

Reporter

Description

•

9 years ago

Attached file rfc2616-8.2.4.pl — Details

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0 Build ID: 20160309193552 Steps to reproduce: See the attached perl script that starts a simple webserver on port 8080 that will render a page triggering the problem on POSTing via form action. Simply run the script, then browse to http://localhost:8080 and then click the 'Maximum Go' button. You can safely send a TERM signal to the perl process running the script after you are done testing. Actual results: On Firefox 46+ for desktop users the user will see a 'connection reset' page. Expected results: See https://tools.ietf.org/html/rfc2616#section-8.2.4 When requests terminate unexpectedly on requests like those you accomplish with POST form action, the default action should be to automatically retry the request. On Google Chrome the page will reload successfully when using the same provided script.

Thomas A. Baugh

Reporter

Comment 1

•

9 years ago

Note that even though the bug submission form automatically appended the user agent I filed this bug with, etc., the builds I am finding this on are exclusively 46.0 - 49.0a1 (2016-04-29). Version 45 works just fine on these requests.

Loic

Updated

•

9 years ago

Component: Untriaged → Networking: HTTP

Product: Firefox → Core

Loic

Comment 2

•

9 years ago

Do you have a live testcase?

Flags: needinfo?(thomas.baugh)

Thomas A. Baugh

Reporter

Comment 3

•

9 years ago

Are you asking me to run the perl script I attached to the original comment? The provided script should start up a perl HTTP server process on port 8080 that you can immediately browse to and view the problem by clicking on the html page the server process is setup to render. That is the test case. Please see the attachment to the first comment, let me know if there's still something else needed here.

Dragana Damjanovic [:dragana]

Assignee

Comment 4

•

9 years ago

I am tempting to set this as WONTFIX. The POST method is not safe to retry, I am not sure we should retry it automatically. ff45 retries automatically without informing users that a POST method is retried. The retry is happening after a TCP reset but we cannot know if the server actually has consumed our data. The change that cause this mismatch at: http://hg.mozilla.org/mozilla-central/annotate/tip/netwerk/protocol/http/nsHttpTransaction.cpp#l979

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 5

•

9 years ago

This is the "double pizza" problem. The most relevant text is probably https://tools.ietf.org/html/rfc7230#section-6.3.1 "6.3.1. Retrying Requests Connections can be closed at any time, with or without intention. Implementations ought to anticipate the need to recover from asynchronous close events. When an inbound connection is closed prematurely, a client MAY open a new connection and automatically retransmit an aborted sequence of requests if all of those requests have idempotent methods (Section 4.2.2 of [RFC7231]). A proxy MUST NOT automatically retry non-idempotent requests. A user agent MUST NOT automatically retry a request with a non- idempotent method unless it has some means to know that the request semantics are actually idempotent, regardless of the method, or some means to detect that the original request was never applied. For example, a user agent that knows (through design or configuration) that a POST request to a given resource is safe can repeat that request automatically. Likewise, a user agent designed specifically to operate on a version control repository might be able to recover from partial failure conditions by checking the target resource revision(s) after a failed connection, reverting or fixing any changes that were partially applied, and then automatically retrying the requests that failed." The text says MUST NOT retry non-idempotent (such as POST) - but then gives lots of wiggle room. And we do use that wiggle room from time to time, as does Chrome. In general we won't retry posts if there are the first transaction on a connection - we might do so later under the presumption that a closed persistent connection is the issue. But again, its interesting that chrome takes on the risk here of resending. I certainly think that's the trend (and booting unqiueness to the application layer completely) What interests me most is that comment 1 suggests a change in behavior between 45 and 46.. that sounds vaguely familiar to me but I can't put my finger on it.

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Updated

•

9 years ago

Summary: RFC 2616 8.2.4 not obeyed on POST connection resets → RFE - auto retry failed POST

Dragana Damjanovic [:dragana]

Assignee

Comment 6

•

9 years ago

(In reply to Patrick McManus [:mcmanus] from comment #5) > > What interests me most is that comment 1 suggests a change in behavior > between 45 and 46.. that sounds vaguely familiar to me but I can't put my > finger on it. Bug 1236277 There were no check before and you ask me to add one :)

Patrick McManus [:mcmanus]

Comment 7

•

9 years ago

ah. thanks. that means the reporter must be reusing a pconn somehow. what do you think of adding || !resuedConn back into the mix. It seems compatible

Dragana Damjanovic [:dragana]

Assignee

Comment 8

•

9 years ago

(In reply to Patrick McManus [:mcmanus] from comment #7) > ah. thanks. that means the reporter must be reusing a pconn somehow. > > what do you think of adding || !resuedConn back into the mix. It seems > compatible I will add that, so that in this case behaves as before plus retry if it is safe.

Dragana Damjanovic [:dragana]

Assignee

Updated

•

9 years ago

Assignee: nobody → dd.mozilla

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Whiteboard: [necko-active]

Dragana Damjanovic [:dragana]

Assignee

Updated

•

9 years ago

Blocks: 1236277

Keywords: regression

Dragana Damjanovic [:dragana]

Assignee

Comment 10

•

9 years ago

Attached patch bug_1269055.patch — Details — Splinter Review

Attachment #8748112 - Flags: review?(mcmanus)

Dragana Damjanovic [:dragana]

Assignee

Comment 11

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=261d00589467

Patrick McManus [:mcmanus]

Comment 12

•

9 years ago

Comment on attachment 8748112 [details] [diff] [review] bug_1269055.patch Review of attachment 8748112 [details] [diff] [review]: ----------------------------------------------------------------- uplifts too. thanks!

Attachment #8748112 - Flags: review?(mcmanus) → review+

Dragana Damjanovic [:dragana]

Assignee

Updated

•

9 years ago

Keywords: checkin-needed

Dragana Damjanovic [:dragana]

Assignee

Updated

•

9 years ago

status-firefox46: --- → affected

status-firefox47: --- → affected

status-firefox48: --- → affected

status-firefox49: --- → affected

Dragana Damjanovic [:dragana]

Assignee

Comment 13

•

9 years ago

Comment on attachment 8748112 [details] [diff] [review] bug_1269055.patch Approval Request Comment [Feature/regressing bug #]: regression from bug 1236277. [User impact if declined]: if a connection breaks we use to try again if the connection is reused. We decided not to do it for a not-safe requests (non-idempotent method), like POST (1236277). But some applications rely on firefox retrying also POST. See bug 1269482. This will be rarely triggered. [Describe test coverage new/current, TreeHerder]: there is a test attached in this bug [Risks and why]: very low, reverting the old behavior. [String/UUID change made/needed]: none

Attachment #8748112 - Flags: approval-mozilla-release?

Attachment #8748112 - Flags: approval-mozilla-beta?

Attachment #8748112 - Flags: approval-mozilla-aurora?

Loic

Updated

•

9 years ago

tracking-firefox46: --- → ?

tracking-firefox47: --- → ?

tracking-firefox48: --- → ?

tracking-firefox49: --- → ?

Pulsebot

Comment 14

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/454eb0d09444

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 15

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/454eb0d09444

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox49: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla49

Liz Henry (:lizzard) (relman/hg->git project)

Comment 16

•

9 years ago

Comment on attachment 8748112 [details] [diff] [review] bug_1269055.patch Fix for regression from 46, includes tests.

Attachment #8748112 - Flags: approval-mozilla-beta?

Attachment #8748112 - Flags: approval-mozilla-beta+

Attachment #8748112 - Flags: approval-mozilla-aurora?

Attachment #8748112 - Flags: approval-mozilla-aurora+

Liz Henry (:lizzard) (relman/hg->git project)

Comment 17

•

9 years ago

Tracking for 46+. I can keep an eye on it but it seems unlikely this will be fixed before 47 comes out (June 7th)

tracking-firefox46: ? → +

tracking-firefox47: ? → +

tracking-firefox48: ? → +

tracking-firefox49: ? → +

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 18

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/314eb11784ea

status-firefox48: affected → fixed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 19

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/aa9c3d613c8c

status-firefox47: affected → fixed

Patrick McManus [:mcmanus]

Comment 20

•

9 years ago

This impacts nginx pretty severely - it would be nice to put it on release in case there was a ride along

Ondrej Brablc

Comment 22

•

9 years ago

The duplicate bug 1268775 "Cannot establish connection when using HTTP/2 and POST after idle time" gives more information about relation of this bug to nginx releases. It works fine with nginx 1.9.12, and does not work with 1.10 (and bug 1268775 mentions even 1.9.15 does not work). Just adding the comment for people hit by the same problem.

Xidorn Quan [:xidorn] UTC+10

Updated

•

9 years ago

Updated

•

9 years ago

Keywords: dev-doc-needed, site-compat

Kohei Yoshino

Updated

•

9 years ago

Comment 28

•

9 years ago

Liz: Given 7 duplicates and low fix risk, I believe we should fix this ASAP with Firefox 46.0.2. One of the affected sites reported in Bug 1269170 is a very (the most?) popular Flash game in Japan, and many people are complaining about the issue on Twitter.

Flags: needinfo?(lhenry)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 29

•

9 years ago

This was suggested as a ridealong if we had another dot release a couple of weeks ago, but I didn't realize the problem was bigger than one game, until now. Thanks for bringing it back to my attention! I can look at the duplicates and talk about this tomorrow morning. So, this seems related to the april 26th ngnix 1.10 release? Or is it a combination of that and something new we introduced in 46?

Dan Stillman

Comment 30

•

9 years ago

No, this was a major change in POST behavior in Firefox 46, and it can happen with any HTTP server. Among other things, if the browser makes at least two requests to a server, and waits longer than the server's keep-alive timeout (which could easily be only a few seconds or less) between them, if the request after the delay is a POST, it will fail. As far as I know, this is a change from how Firefox has handled POST requests for many years, and quite possibly forever. There's some discussion in comment 5 about what the correct behavior really should be, but given the history of this behavior this is a pretty serious regression in Firefox 46 that would affect many websites.

Mahks

Comment 31

•

9 years ago

I suspect this bug is causing random crashes of gmail as well.

Patrick McManus [:mcmanus]

Comment 33

•

9 years ago

(In reply to Mahks from comment #31) > I suspect this bug is causing random crashes of gmail as well. I'm not sure what you mean by crash - but assuming you mean firefox crashing then that's not a possible impact of this bug. liz, the nginx thing was unfortunate timing as their implementation hit this squarely on the nose.. other implementations are more of an unusual race condition. The behavior restored by the patch is actually non-standard, and in general most well functioning sites don't rely on it.. but clearly a few do.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 34

•

9 years ago

What I would like is some way to measure the impact and how widespread this problem is, but going by the number of duplicate bugs a lot of people are running into it. This may drive a release now rather than waiting for June 7 and Firefox 47. Patrick, if this is from non standard configurations, is there anything you can advise as a workaround for server admins?

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

9 years ago

tracking-firefox46: + → blocking

Flags: needinfo?(lhenry)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 35

•

9 years ago

One more question. People have said that the issue is now fixed in beta 47. Can we reproduce the problem and make sure to test it on a release build? Do we know for sure this is a good fix? Thanks.

Patrick McManus [:mcmanus]

Comment 36

•

9 years ago

this would have been perfect ride along material. oh well. Given that I know dot releases (well all releases) annoy the userbase, I wouldn't cut a release for this at this stage. well designed sites are going to be robust to this anyhow - so if there are a handful of high profile sites with the issue we can evangelize an easy fix for them: "Adjust your server side persistent connection timeout to be about 5 seconds longer than the firefox value. Firefox uses 115 seconds for HTTP/1.x and 180 seconds for HTTP/2" does that help? I can't think of any useful telemetry here.

Patrick McManus [:mcmanus]

Comment 37

•

9 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35) > One more question. People have said that the issue is now fixed in beta 47. > Can we reproduce the problem and make sure to test it on a release build? Do > we know for sure this is a good fix? Thanks. a reporter verified the fix over here https://bugzilla.mozilla.org/show_bug.cgi?id=1269170#c59

Liz Henry (:lizzard) (relman/hg->git project)

Comment 38

•

9 years ago

Thank you Patrick, that's super helpful. I'll respond in the other bugs to let people know the workaround. I can also find it in support pages and answer there.

Bob Pullen

Comment 39

•

9 years ago

Lithium community software is subject to this problem. A lot of high profile companies use it. I'll aim to feedback Patrick's comments to the developers - http://www.lithium.com/why-lithium/customer-success/

Kohei Yoshino

Comment 40

•

9 years ago

(In reply to Patrick McManus [:mcmanus] from comment #36) > well designed sites are going to be robust to this anyhow - so if there are > a handful of high profile sites with the issue we can evangelize an easy fix > for them: > > "Adjust your server side persistent connection timeout to be about 5 seconds > longer than the firefox value. Firefox uses 115 seconds for HTTP/1.x and 180 > seconds for HTTP/2" I'll post a site compatibility doc for this shortly.

Kohei Yoshino

Comment 41

•

9 years ago

Here we go: https://www.fxsitecompat.com/en-CA/docs/2016/post-request-fails-on-certain-sites-showing-connection-reset-page/

Severity: normal → major

Keywords: dev-doc-needed → dev-doc-complete, testcase

Kohei Yoshino

Comment 42

•

9 years ago

And tweeted, both in English and Japanese: https://twitter.com/FxSiteCompat/status/733836118987268097 https://twitter.com/FxSiteCompat/status/733838037742956545 https://twitter.com/FxSiteCompat/status/733840279904628736

nick

Comment 43

•

9 years ago

Thanks, this fix works for me! Just want to mention that you have to tune linux TCP/IP stack for large timeouts as well. For example, I had to comment out this setting in sysctl.conf: #net.ipv4.tcp_keepalive_time = 120

MarkRH

Comment 46

•

9 years ago

I wonder if this is the reason I am getting Secure Connection Failed messages on Wordpress.org when taking awhile (more than 3 minutes always) to post a reply. Created a post there: https://wordpress.org/support/topic/this-forum-generating-secure-connection-failed-when-post?replies=45 Doesn't happen with 47beta7. Does with 46.0.1.

Loic

Comment 47

•

9 years ago

(In reply to MarkRH from comment #46) > I wonder if this is the reason I am getting Secure Connection Failed > messages on Wordpress.org when taking awhile (more than 3 minutes always) to > post a reply. Created a post there: > https://wordpress.org/support/topic/this-forum-generating-secure-connection- > failed-when-post?replies=45 > > Doesn't happen with 47beta7. Does with 46.0.1. It's bug 1271301 and probably fixed by this current bug fix.

henrik

Comment 50

•

9 years ago

It's pretty absurd you just assume "Oh it's a duplicate" and close the bug without any reply then do snide remarks like "well designed sites are going to be robust to this anyhow". My bug that I filed, https://bugzilla.mozilla.org/show_bug.cgi?id=1270434 is NOT a timeout issue because it errors out faster than 115s into the request. Which you would know if you'd taken the time to read through the nginx log I posted.

henrik

Comment 51

•

9 years ago

(In reply to henrik from comment #50) > It's pretty absurd you just assume "Oh it's a duplicate" and close the bug > without any reply then do snide remarks like "well designed sites are going > to be robust to this anyhow". > > My bug that I filed, https://bugzilla.mozilla.org/show_bug.cgi?id=1270434 is > NOT a timeout issue because it errors out faster than 115s into the request. > Which you would know if you'd taken the time to read through the nginx log I > posted. Correction, https://bugzilla.mozilla.org/show_bug.cgi?id=1274606 – I'm getting so much e-mail from all the linked bugs now that I lost track.

Patrick McManus [:mcmanus]

Comment 52

•

9 years ago

(In reply to henrik from comment #51) > > Correction, https://bugzilla.mozilla.org/show_bug.cgi?id=1274606 – I'm > getting so much e-mail from all the linked bugs now that I lost track. that bug, especially as logged ff46 and nginx 1.10 and POST, reads exactly like a dup of this bug (which is fixed on 47). The issue isn't dismissed, its simply likely the same one and needs only to be tracked once. comment 2 in that bug asked you to help confirm that by testing beta-47 which you haven't replied to. We're still interested in that even though the bug was marked dup'd - I apologize if there was confusion over that.

henrik

Comment 53

•

9 years ago

I have replied to it. I've just gotten a bit annoyed at the statement above about well designed sites; please show me what I've done wrong and I'll fix it. It's a damn well designed site if I may say so myself and we've even taken care to use the best-practise SSL guidance, staying up to date on even this **** HTTP2 protocol that has caused 2 days downtime for some browsers for us. So don't say it's not a "well designed site". I know you sent a comment and excuse me if I'm trying to be the CEO of the company as well as be your testing staff. I answered your query with another one: how do I test FF on Android if it's in beta?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 54

•

9 years ago

Comment on attachment 8748112 [details] [diff] [review] bug_1269055.patch Clearing the m-r flag as 46 will EOL in a week and this fix is already in Beta47.

Attachment #8748112 - Flags: approval-mozilla-release?

Liz Henry (:lizzard) (relman/hg->git project)

Comment 55

•

9 years ago

Hi henrik. If you would like to test with Firefox for Android (beta) you can download it from the Play Store (search for Firefox Beta). I know this took a while to fix, and some back and forth with site admins. The bugmail can certainly get to be a lot. If you test and there are still problems, you can let us know and we can reopen the closed bug(s) if there's still work to do. Thanks for your help, it is appreciated. Sounds to me like this bug is wontfix for 46 and fixed in 47.

status-firefox46: affected → wontfix

what2use

Comment 56

•

9 years ago

I have tested this on two different computers with Firefox 47 that came out today. The bug is fixed

rfc2616-8.2.4.pl 9 years ago Thomas A. Baugh 1.22 KB, application/x-perl		Details
bug_1269055.patch 9 years ago Dragana Damjanovic [:dragana] 2.31 KB, patch	mcmanus : review+ lizzard : approval-mozilla-aurora+ lizzard : approval-mozilla-beta+	Details \| Diff \| Splinter Review