Make sure not to retry socketTransaction if nsHttpConnectionMgr cancels it

RESOLVED FIXED in Firefox -esr52

Status

()

defect
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: dragana, Assigned: dragana)

Tracking

55 Branch
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr5255+ fixed, firefox53+ wontfix, firefox54+ fixed, firefox55+ fixed)

Details

(Whiteboard: [necko-active])

Attachments

(3 attachments, 3 obsolete attachments)

Bug 1354796 is fix, but its fix can cause bug 878792 to come back.
To be sure that does not happen we need to make sure that we do not retry socketTransaction if nsHttpConnectionMgr cancels socketTransaction with TIMEOUT.

If socket or the Poll loop does it, the socketTransport should retry.
Whiteboard: [necko-active[
Whiteboard: [necko-active[ → [necko-active]
Blocks: 1354796
Posted patch bug_1364189.patch (obsolete) — Splinter Review
Attachment #8866936 - Flags: review?(mcmanus)
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch

Review of attachment 8866936 [details] [diff] [review]:
-----------------------------------------------------------------

forgive me, but I think I want to reconsider a solution brought up on the call that I steered us away from (my bad):

which is simply removing NS_ERROR_NET_TIMEOUT from the list of whitelisted conditions ("can only recover from these errors") in RecoverFromError().

That's what this patch effectively does for the one path we're looking at, but reading the code I figure if any other codepath manages to do something similar it will have a the same bug. I don't think I fully understood that until I did the review.

I'll r+ this if you still want to go with it, and I'll also r+ an alternate 1 line patch as above. Again, sorry for steering away from that.

::: netwerk/base/nsISocketTransport.idl
@@ +262,5 @@
>                            in long keepaliveRetryInterval);
>  
>      [noscript] void setFastOpenCallback(in TCPFastOpenPtr aFastOpen);
> +
> +    [noscript] void doNotRetryToConnect();

I don't think you need the [noscript]

::: netwerk/base/nsSocketTransport2.cpp
@@ +1699,5 @@
>      SOCKET_LOG(("nsSocketTransport::RecoverFromError [this=%p state=%x cond=%" PRIx32 "]\n",
>                  this, mState, static_cast<uint32_t>(mCondition)));
>  
> +    if (mDoNotRetryToConnect) {
> +        return false;

log

@@ +3490,5 @@
>  
> +NS_IMETHODIMP
> +nsSocketTransport::DoNotRetryToConnect()
> +{
> +    mDoNotRetryToConnect = true;

log
Attachment #8866936 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #3)
> Comment on attachment 8866936 [details] [diff] [review]
> bug_1364189.patch
> 
> Review of attachment 8866936 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> forgive me, but I think I want to reconsider a solution brought up on the
> call that I steered us away from (my bad):
> 
> which is simply removing NS_ERROR_NET_TIMEOUT from the list of whitelisted
> conditions ("can only recover from these errors") in RecoverFromError().
> 
> That's what this patch effectively does for the one path we're looking at,
> but reading the code I figure if any other codepath manages to do something
> similar it will have a the same bug. I don't think I fully understood that
> until I did the review.
> 
> I'll r+ this if you still want to go with it, and I'll also r+ an alternate
> 1 line patch as above. Again, sorry for steering away from that.
> 

This is the only place we call NS_ERROR_NET_TIMEOUT!
On the call I suggested something a bit different: If we call nsSocketTransport::Close with error NS_ERROR_NET_TIMEOUT we should set the flag mDoNotRetryToConnect. That will cover all future calls to Close(NS_ERROR_NET_TIMEOUT);

I would suggest to do this last suggestion from me for uplift and we can do yours to ride the train. What do you think? We always get surprised when we change something in error handling :)
Flags: needinfo?(mcmanus)
(In reply to Dragana Damjanovic [:dragana] from comment #4)
>
> 
> This is the only place we call NS_ERROR_NET_TIMEOUT!

probably, but I actually couldn't convince myself of that.. there were some paths with "rv" being passed along

> On the call I suggested something a bit different: If we call
> nsSocketTransport::Close with error NS_ERROR_NET_TIMEOUT we should set the
> flag mDoNotRetryToConnect. That will cover all future calls to
> Close(NS_ERROR_NET_TIMEOUT);
> 
> I would suggest to do this last suggestion from me for uplift and we can do
> yours to ride the train. What do you think? We always get surprised when we
> change something in error handling :)

if you think that's best. I'm ok with it.
Flags: needinfo?(mcmanus)
Posted patch bug_1364189_v2.patch (obsolete) — Splinter Review
I forgot to add a log.
Attachment #8867675 - Attachment is obsolete: true
Attachment #8867682 - Flags: review?(mcmanus)
Rebased for beta 54.
(In reply to Dragana Damjanovic [:dragana] from comment #9)
> Created attachment 8867683 [details] [diff] [review]
> bug_1364189_beta.patch
> 
> Rebased for beta 54.

Try run for beta patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83b5cbf6849c636310b969a7e7f9f4eacc7db2ce
[Tracking Requested - why for this release]: This is a sev1 blocker on Fennec for entering a new device into carrier test. If it could make it into 53.0.3 we would have a very happy partner.
If we get a fix landed quickly and it's verified we could think about getting this into 53.0.3. 
I don't have a definite driver for 53.0.3 fennec yet (maybe bug 1362969, but that isn't certain yet). 

Mike, is this something we would test in 55 and 54 first? is it meant to land on 55 at all?
Who can reproduce and verify the problem and do we need any other exploratory testing? I don't have a good picture of the impact on current users, or the risks, and would like to know more there before we plan a fennec dot release. Can they test with beta 54 instead of with 53 for now?
Flags: needinfo?(mozilla)
Yes, I would want to test there first. Need to get this reviewed and landed ASAP. I'll email folks.

As far as impact goes, see:

https://bugzilla.mozilla.org/show_bug.cgi?id=1354796

that bug is the fix we did on central, so it describes the problem there.

And no, this bug is only for beta and release.
Flags: needinfo?(mozilla)
(In reply to Mike Kaply [:mkaply] from comment #13)
> Yes, I would want to test there first. Need to get this reviewed and landed
> ASAP. I'll email folks.
> 
> As far as impact goes, see:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1354796
> 
> that bug is the fix we did on central, so it describes the problem there.
> 
> And no, this bug is only for beta and release.

to fix problem described in  bug 1354796 we need to land this as well as patch from 1354796 to beta and release. The patch from bug 1354796 resolves the problem and this one resolves a regression that bug 1354796 will cause. The regression is describe in bug 878792.

so we need both bugs in beta and release.
Comment on attachment 8867683 [details] [diff] [review]
bug_1364189_beta.patch

Approval Request Comment
[Feature/Bug causing the regression]: This bug will resolve a regression that bug 1354796 will cause once it lands. The regression is described in bug 878792.
[User impact if declined]: In some situation some sockets will stay open for ever. This will heppen very very rarely but inpact when its heppens is not good.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]: This is only a partial fix that is needed for bug 1354796
[Is the change risky?]: not high, rather low
[Why is the change risky/not risky?]: SocketTransport will retry to connect with another IP address only for 5 errors. Two of them is TIMEOUT(proxy and not proxy TIMEOUT) and other 3 are only called by OS (e.g. UNKNOWN_HOST). This patch only forbids that we retry to connect if socket is closed by necko, and not canceled by OS(if it is cancel by OS we will still retry it). Current code never closes socketTransport with TIMEOUT (or any other 3 errors). When bug 1354796 lands there will be one case when we close socketTransport with TIMEOUT. we do not want to retry it in case it is closed by necko and not os because that call to nsSocketTransport::Close will close also listener to the socket and therefore the socket will never be closed. 
[String changes made/needed]:none
Attachment #8867683 - Flags: approval-mozilla-beta?
Track 54+/55+ as regression (bug 878792) brought back by bug 1354796.
Posted image Clipboard01.jpg (obsolete) —
how come his has not been fixed since firefox 52?
It is on release and causing data loss and no one here seems to bother to fix this?
Attachment #8869127 - Flags: feedback?
Attachment #8869127 - Flags: approval-mozilla-release?
Comment on attachment 8869127 [details]
Clipboard01.jpg

Please don't abuse the approval flags like this.
Attachment #8869127 - Attachment is obsolete: true
Attachment #8869127 - Flags: feedback?
Attachment #8869127 - Flags: approval-mozilla-release?
Comment on attachment 8867683 [details] [diff] [review]
bug_1364189_beta.patch

Fix a regression (bug 878792) brought back by bug 1354796. Beta54+. Should be in 54 beta 10.
Attachment #8867683 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch

Review of attachment 8866936 [details] [diff] [review]:
-----------------------------------------------------------------

I believe this patch has been overtaken by another one.. flag it if I'm wrong.
Attachment #8866936 - Flags: review+
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch

yes.
Attachment #8866936 - Attachment is obsolete: true
I'm a bit lost.  Bug 1354796 has landed on m-c, bringing back bug 878792.  But this has landed only on beta.  What is the plan?

Also, bug 1354796 fixes a regression present on ESR52 (same cause).  There are some demands to uplift 1354796 as well to ESR52.  This one would have to go with it.

Any objections against to uplifting this bug to ESR52?
(In reply to Honza Bambas (:mayhemer) from comment #23)
> I'm a bit lost.  Bug 1354796 has landed on m-c, bringing back bug 878792. 
> But this has landed only on beta.  What is the plan?
> 
> Also, bug 1354796 fixes a regression present on ESR52 (same cause).  There
> are some demands to uplift 1354796 as well to ESR52.  This one would have to
> go with it.
> 
> Any objections against to uplifting this bug to ESR52?

this one needs a review for m.-c. The code is different because of TFO. Do you feel comfortable reviewing it?

I think it is safe to uplift it to ESR52.
(In reply to Dragana Damjanovic [:dragana] from comment #24)
> (In reply to Honza Bambas (:mayhemer) from comment #23)
> > I'm a bit lost.  Bug 1354796 has landed on m-c, bringing back bug 878792. 
> > But this has landed only on beta.  What is the plan?
> > 
> > Also, bug 1354796 fixes a regression present on ESR52 (same cause).  There
> > are some demands to uplift 1354796 as well to ESR52.  This one would have to
> > go with it.
> > 
> > Any objections against to uplifting this bug to ESR52?
> 
> this one needs a review for m.-c. The code is different because of TFO. Do
> you feel comfortable reviewing it?

Not really :)  but if there is no one else...

> 
> I think it is safe to uplift it to ESR52.

OK, thanks!
Comment on attachment 8867683 [details] [diff] [review]
bug_1364189_beta.patch

comment 15
Attachment #8867683 - Flags: approval-mozilla-esr52?
I will ping Patrick.
Patrick can you review the m.-c. patch?


(Honza, thanks!)
Flags: needinfo?(mcmanus)
Attachment #8867682 - Flags: review?(mcmanus) → review+
Flags: needinfo?(mcmanus)
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c73ef4420951
Make sure not to retry socketTransaction if nsHttpConnectionMgr cancels it. r=mcmanus
https://hg.mozilla.org/mozilla-central/rev/c73ef4420951
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
At this point, I don't think we are going to fix this in 53. The 54 release is coming up in mid-June.
seems that the esr patch doesn't apply

merging netwerk/protocol/http/nsHttpConnectionMgr.h
 warning: conflicts while merging netwerk/base/nsSocketTransport2.cpp! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/base/nsSocketTransport2.h! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/protocol/http/nsHttpConnectionMgr.cpp! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/protocol/http/nsHttpConnectionMgr.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Flags: needinfo?(dd.mozilla)
(In reply to Sylvestre Ledru [:sylvestre] from comment #31)
> seems that the esr patch doesn't apply
> 
> merging netwerk/protocol/http/nsHttpConnectionMgr.h
>  warning: conflicts while merging netwerk/base/nsSocketTransport2.cpp!
> (edit, then use 'hg resolve --mark')
> warning: conflicts while merging netwerk/base/nsSocketTransport2.h! (edit,
> then use 'hg resolve --mark')
> warning: conflicts while merging
> netwerk/protocol/http/nsHttpConnectionMgr.cpp! (edit, then use 'hg resolve
> --mark')
> warning: conflicts while merging
> netwerk/protocol/http/nsHttpConnectionMgr.h! (edit, then use 'hg resolve
> --mark')
> abort: unresolved conflicts, can't continue

have used bug_1364189_beta.patch? It looks like you have used the wrong one. You should use bug_1364189_beta.patch, this one does not change anything in nsHttpConnectionMgr.cpp or nsHttpConnectionMgr.h
Flags: needinfo?(dd.mozilla)
This is rebased version for esr52

Also see comment 15.
Attachment #8872366 - Flags: review+
Attachment #8872366 - Flags: approval-mozilla-esr52?
Attachment #8867683 - Flags: approval-mozilla-esr52?
This fix is needed to fix bug 1354796. Since we don't have QA verification on that bug, let's uplift this in ESR52.3. Also see https://bugzilla.mozilla.org/show_bug.cgi?id=1354796#c86
Comment on attachment 8872366 [details] [diff] [review]
bug_1364189_esr52.patch

Bug 1354796 has been verified on 55, let's uplift this to ESR52
Attachment #8872366 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.