Closed
Bug 1364189
Opened 8 years ago
Closed 8 years ago
Make sure not to retry socketTransaction if nsHttpConnectionMgr cancels it
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
People
(Reporter: dragana, Assigned: dragana)
References
Details
(Whiteboard: [necko-active])
Attachments
(3 files, 3 obsolete files)
|
8.51 KB,
patch
|
mcmanus
:
review+
|
Details | Diff | Splinter Review |
|
2.86 KB,
patch
|
mcmanus
:
review+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
|
2.82 KB,
patch
|
dragana
:
review+
ritu
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
Bug 1354796 is fix, but its fix can cause bug 878792 to come back.
To be sure that does not happen we need to make sure that we do not retry socketTransaction if nsHttpConnectionMgr cancels socketTransaction with TIMEOUT.
If socket or the Poll loop does it, the socketTransport should retry.
Updated•8 years ago
|
Whiteboard: [necko-active[
Updated•8 years ago
|
Whiteboard: [necko-active[ → [necko-active]
| Assignee | ||
Comment 1•8 years ago
|
||
Attachment #8866936 -
Flags: review?(mcmanus)
| Assignee | ||
Comment 2•8 years ago
|
||
Comment 3•8 years ago
|
||
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch
Review of attachment 8866936 [details] [diff] [review]:
-----------------------------------------------------------------
forgive me, but I think I want to reconsider a solution brought up on the call that I steered us away from (my bad):
which is simply removing NS_ERROR_NET_TIMEOUT from the list of whitelisted conditions ("can only recover from these errors") in RecoverFromError().
That's what this patch effectively does for the one path we're looking at, but reading the code I figure if any other codepath manages to do something similar it will have a the same bug. I don't think I fully understood that until I did the review.
I'll r+ this if you still want to go with it, and I'll also r+ an alternate 1 line patch as above. Again, sorry for steering away from that.
::: netwerk/base/nsISocketTransport.idl
@@ +262,5 @@
> in long keepaliveRetryInterval);
>
> [noscript] void setFastOpenCallback(in TCPFastOpenPtr aFastOpen);
> +
> + [noscript] void doNotRetryToConnect();
I don't think you need the [noscript]
::: netwerk/base/nsSocketTransport2.cpp
@@ +1699,5 @@
> SOCKET_LOG(("nsSocketTransport::RecoverFromError [this=%p state=%x cond=%" PRIx32 "]\n",
> this, mState, static_cast<uint32_t>(mCondition)));
>
> + if (mDoNotRetryToConnect) {
> + return false;
log
@@ +3490,5 @@
>
> +NS_IMETHODIMP
> +nsSocketTransport::DoNotRetryToConnect()
> +{
> + mDoNotRetryToConnect = true;
log
Attachment #8866936 -
Flags: review?(mcmanus) → review+
| Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #3)
> Comment on attachment 8866936 [details] [diff] [review]
> bug_1364189.patch
>
> Review of attachment 8866936 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> forgive me, but I think I want to reconsider a solution brought up on the
> call that I steered us away from (my bad):
>
> which is simply removing NS_ERROR_NET_TIMEOUT from the list of whitelisted
> conditions ("can only recover from these errors") in RecoverFromError().
>
> That's what this patch effectively does for the one path we're looking at,
> but reading the code I figure if any other codepath manages to do something
> similar it will have a the same bug. I don't think I fully understood that
> until I did the review.
>
> I'll r+ this if you still want to go with it, and I'll also r+ an alternate
> 1 line patch as above. Again, sorry for steering away from that.
>
This is the only place we call NS_ERROR_NET_TIMEOUT!
On the call I suggested something a bit different: If we call nsSocketTransport::Close with error NS_ERROR_NET_TIMEOUT we should set the flag mDoNotRetryToConnect. That will cover all future calls to Close(NS_ERROR_NET_TIMEOUT);
I would suggest to do this last suggestion from me for uplift and we can do yours to ride the train. What do you think? We always get surprised when we change something in error handling :)
Flags: needinfo?(mcmanus)
Comment 5•8 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #4)
>
>
> This is the only place we call NS_ERROR_NET_TIMEOUT!
probably, but I actually couldn't convince myself of that.. there were some paths with "rv" being passed along
> On the call I suggested something a bit different: If we call
> nsSocketTransport::Close with error NS_ERROR_NET_TIMEOUT we should set the
> flag mDoNotRetryToConnect. That will cover all future calls to
> Close(NS_ERROR_NET_TIMEOUT);
>
> I would suggest to do this last suggestion from me for uplift and we can do
> yours to ride the train. What do you think? We always get surprised when we
> change something in error handling :)
if you think that's best. I'm ok with it.
Flags: needinfo?(mcmanus)
| Assignee | ||
Comment 6•8 years ago
|
||
| Assignee | ||
Comment 7•8 years ago
|
||
| Assignee | ||
Comment 8•8 years ago
|
||
I forgot to add a log.
Attachment #8867675 -
Attachment is obsolete: true
Attachment #8867682 -
Flags: review?(mcmanus)
| Assignee | ||
Comment 9•8 years ago
|
||
Rebased for beta 54.
| Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #9)
> Created attachment 8867683 [details] [diff] [review]
> bug_1364189_beta.patch
>
> Rebased for beta 54.
Try run for beta patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=83b5cbf6849c636310b969a7e7f9f4eacc7db2ce
Comment 11•8 years ago
|
||
[Tracking Requested - why for this release]: This is a sev1 blocker on Fennec for entering a new device into carrier test. If it could make it into 53.0.3 we would have a very happy partner.
tracking-firefox53:
--- → ?
Comment 12•8 years ago
|
||
If we get a fix landed quickly and it's verified we could think about getting this into 53.0.3.
I don't have a definite driver for 53.0.3 fennec yet (maybe bug 1362969, but that isn't certain yet).
Mike, is this something we would test in 55 and 54 first? is it meant to land on 55 at all?
Who can reproduce and verify the problem and do we need any other exploratory testing? I don't have a good picture of the impact on current users, or the risks, and would like to know more there before we plan a fennec dot release. Can they test with beta 54 instead of with 53 for now?
status-firefox53:
--- → affected
status-firefox54:
--- → ?
status-firefox55:
--- → ?
tracking-firefox54:
--- → ?
tracking-firefox55:
--- → ?
Flags: needinfo?(mozilla)
Comment 13•8 years ago
|
||
Yes, I would want to test there first. Need to get this reviewed and landed ASAP. I'll email folks.
As far as impact goes, see:
https://bugzilla.mozilla.org/show_bug.cgi?id=1354796
that bug is the fix we did on central, so it describes the problem there.
And no, this bug is only for beta and release.
Flags: needinfo?(mozilla)
Updated•8 years ago
|
Attachment #8867683 -
Flags: review+
| Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Mike Kaply [:mkaply] from comment #13)
> Yes, I would want to test there first. Need to get this reviewed and landed
> ASAP. I'll email folks.
>
> As far as impact goes, see:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1354796
>
> that bug is the fix we did on central, so it describes the problem there.
>
> And no, this bug is only for beta and release.
to fix problem described in bug 1354796 we need to land this as well as patch from 1354796 to beta and release. The patch from bug 1354796 resolves the problem and this one resolves a regression that bug 1354796 will cause. The regression is describe in bug 878792.
so we need both bugs in beta and release.
| Assignee | ||
Comment 15•8 years ago
|
||
Comment on attachment 8867683 [details] [diff] [review]
bug_1364189_beta.patch
Approval Request Comment
[Feature/Bug causing the regression]: This bug will resolve a regression that bug 1354796 will cause once it lands. The regression is described in bug 878792.
[User impact if declined]: In some situation some sockets will stay open for ever. This will heppen very very rarely but inpact when its heppens is not good.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]:
[List of other uplifts needed for the feature/fix]: This is only a partial fix that is needed for bug 1354796
[Is the change risky?]: not high, rather low
[Why is the change risky/not risky?]: SocketTransport will retry to connect with another IP address only for 5 errors. Two of them is TIMEOUT(proxy and not proxy TIMEOUT) and other 3 are only called by OS (e.g. UNKNOWN_HOST). This patch only forbids that we retry to connect if socket is closed by necko, and not canceled by OS(if it is cancel by OS we will still retry it). Current code never closes socketTransport with TIMEOUT (or any other 3 errors). When bug 1354796 lands there will be one case when we close socketTransport with TIMEOUT. we do not want to retry it in case it is closed by necko and not os because that call to nsSocketTransport::Close will close also listener to the socket and therefore the socket will never be closed.
[String changes made/needed]:none
Attachment #8867683 -
Flags: approval-mozilla-beta?
Comment 17•8 years ago
|
||
how come his has not been fixed since firefox 52?
It is on release and causing data loss and no one here seems to bother to fix this?
Attachment #8869127 -
Flags: feedback?
Attachment #8869127 -
Flags: approval-mozilla-release?
Comment 18•8 years ago
|
||
Comment on attachment 8869127 [details]
Clipboard01.jpg
Please don't abuse the approval flags like this.
Attachment #8869127 -
Attachment is obsolete: true
Attachment #8869127 -
Flags: feedback?
Attachment #8869127 -
Flags: approval-mozilla-release?
Comment 19•8 years ago
|
||
Comment on attachment 8867683 [details] [diff] [review]
bug_1364189_beta.patch
Fix a regression (bug 878792) brought back by bug 1354796. Beta54+. Should be in 54 beta 10.
Attachment #8867683 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 20•8 years ago
|
||
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch
Review of attachment 8866936 [details] [diff] [review]:
-----------------------------------------------------------------
I believe this patch has been overtaken by another one.. flag it if I'm wrong.
Attachment #8866936 -
Flags: review+
| Assignee | ||
Comment 21•8 years ago
|
||
Comment on attachment 8866936 [details] [diff] [review]
bug_1364189.patch
yes.
Attachment #8866936 -
Attachment is obsolete: true
Comment 22•8 years ago
|
||
| bugherder uplift | ||
Comment 23•8 years ago
|
||
I'm a bit lost. Bug 1354796 has landed on m-c, bringing back bug 878792. But this has landed only on beta. What is the plan?
Also, bug 1354796 fixes a regression present on ESR52 (same cause). There are some demands to uplift 1354796 as well to ESR52. This one would have to go with it.
Any objections against to uplifting this bug to ESR52?
| Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #23)
> I'm a bit lost. Bug 1354796 has landed on m-c, bringing back bug 878792.
> But this has landed only on beta. What is the plan?
>
> Also, bug 1354796 fixes a regression present on ESR52 (same cause). There
> are some demands to uplift 1354796 as well to ESR52. This one would have to
> go with it.
>
> Any objections against to uplifting this bug to ESR52?
this one needs a review for m.-c. The code is different because of TFO. Do you feel comfortable reviewing it?
I think it is safe to uplift it to ESR52.
Comment 25•8 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #24)
> (In reply to Honza Bambas (:mayhemer) from comment #23)
> > I'm a bit lost. Bug 1354796 has landed on m-c, bringing back bug 878792.
> > But this has landed only on beta. What is the plan?
> >
> > Also, bug 1354796 fixes a regression present on ESR52 (same cause). There
> > are some demands to uplift 1354796 as well to ESR52. This one would have to
> > go with it.
> >
> > Any objections against to uplifting this bug to ESR52?
>
> this one needs a review for m.-c. The code is different because of TFO. Do
> you feel comfortable reviewing it?
Not really :) but if there is no one else...
>
> I think it is safe to uplift it to ESR52.
OK, thanks!
Comment 26•8 years ago
|
||
Attachment #8867683 -
Flags: approval-mozilla-esr52?
| Assignee | ||
Comment 27•8 years ago
|
||
I will ping Patrick.
Patrick can you review the m.-c. patch?
(Honza, thanks!)
Flags: needinfo?(mcmanus)
Updated•8 years ago
|
Attachment #8867682 -
Flags: review?(mcmanus) → review+
Updated•8 years ago
|
Flags: needinfo?(mcmanus)
Comment 28•8 years ago
|
||
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c73ef4420951
Make sure not to retry socketTransaction if nsHttpConnectionMgr cancels it. r=mcmanus
Comment 29•8 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 30•8 years ago
|
||
At this point, I don't think we are going to fix this in 53. The 54 release is coming up in mid-June.
status-firefox-esr52:
--- → affected
tracking-firefox-esr52:
--- → ?
Comment 31•8 years ago
|
||
seems that the esr patch doesn't apply
merging netwerk/protocol/http/nsHttpConnectionMgr.h
warning: conflicts while merging netwerk/base/nsSocketTransport2.cpp! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/base/nsSocketTransport2.h! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/protocol/http/nsHttpConnectionMgr.cpp! (edit, then use 'hg resolve --mark')
warning: conflicts while merging netwerk/protocol/http/nsHttpConnectionMgr.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Flags: needinfo?(dd.mozilla)
| Assignee | ||
Comment 32•8 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #31)
> seems that the esr patch doesn't apply
>
> merging netwerk/protocol/http/nsHttpConnectionMgr.h
> warning: conflicts while merging netwerk/base/nsSocketTransport2.cpp!
> (edit, then use 'hg resolve --mark')
> warning: conflicts while merging netwerk/base/nsSocketTransport2.h! (edit,
> then use 'hg resolve --mark')
> warning: conflicts while merging
> netwerk/protocol/http/nsHttpConnectionMgr.cpp! (edit, then use 'hg resolve
> --mark')
> warning: conflicts while merging
> netwerk/protocol/http/nsHttpConnectionMgr.h! (edit, then use 'hg resolve
> --mark')
> abort: unresolved conflicts, can't continue
have used bug_1364189_beta.patch? It looks like you have used the wrong one. You should use bug_1364189_beta.patch, this one does not change anything in nsHttpConnectionMgr.cpp or nsHttpConnectionMgr.h
Flags: needinfo?(dd.mozilla)
| Assignee | ||
Comment 33•8 years ago
|
||
This is rebased version for esr52
Also see comment 15.
Attachment #8872366 -
Flags: review+
Attachment #8872366 -
Flags: approval-mozilla-esr52?
| Assignee | ||
Updated•8 years ago
|
Attachment #8867683 -
Flags: approval-mozilla-esr52?
This fix is needed to fix bug 1354796. Since we don't have QA verification on that bug, let's uplift this in ESR52.3. Also see https://bugzilla.mozilla.org/show_bug.cgi?id=1354796#c86
Comment on attachment 8872366 [details] [diff] [review]
bug_1364189_esr52.patch
Bug 1354796 has been verified on 55, let's uplift this to ESR52
Attachment #8872366 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 36•8 years ago
|
||
| bugherder uplift | ||
You need to log in
before you can comment on or make changes to this bug.
Description
•