Closed Bug 1539766 Opened 5 years ago Closed 5 years ago

e10s flow control could cause the canceled channel suspended forever

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- wontfix
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: kershaw, Assigned: CuveeHsu)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(1 file)

When I was investigating the failed tests caused by retargeting (bug 1505493), I found this issue.
Please see the full log at [1] and search the http channel with the address 0x7f14f37ff000 and you can see this channel is not finished (OnStopRequest is not called).

Here is what happened:

  1. The channel is suspended due to flow control [2].
  2. At the same time, the channel is canceled in child process [3].
  3. The channel is canceled at [4].
  4. Since |HttpChannelChild::mCanceled| is true [5], SendBytesRead [6] will not be called.
  5. This means that the channel will not be resumed.

I think this still might happen without retargeting. Maybe it's just the retargeting that make this problem easier to reproduce.

[1] https://taskcluster-artifacts.net/bMAOFkkYRAGIHC67I6MXrQ/0/public/logs/live_backing.log
[2] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/protocol/http/HttpChannelParent.cpp#1617
[3] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/protocol/http/HttpChannelChild.cpp#2307
[4] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/protocol/http/HttpChannelParent.cpp#812
[5] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/protocol/http/HttpChannelChild.cpp#804
[6] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/protocol/http/HttpChannelChild.cpp#871

I think this should be a P1 bug. Junior, could you take a look? Thanks.

Flags: needinfo?(juhsu)
Priority: -- → P1

Thanks a lot for finding this, Kershaw!
Of course I'll take a look. Beforehand, I thought the cancel should finish the channel. It looks like it won't.

And thanks for the processed logan log, Honza.

Flags: needinfo?(juhsu)

I think this still might happen without retargeting. Maybe it's just the retargeting that make this problem easier to reproduce.

Yes bug 1515809 already addressed that we have some intermittent without retargeting

Summary: e10s flow control could cause the channel suspended forever → e10s flow control could cause the canceled channel suspended forever
Assignee: nobody → juhsu
Blocks: 1524154
Pushed by juhsu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dcb5b147d6b3
handle the cancel case for e10s bp r=kershaw
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

We plan to nominate this for uplift to beta/release to fix the problem.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment on attachment 9054349 [details]
Bug 1539766 - handle the cancel case for e10s bp

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1280629
  • User impact if declined: Users may leaks >1MB when cancel a http request during the suspension triggered by e10s back pressure. The suspension happen frequently when we have high speed internet (especially intranet) and request >1MB file.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1280629
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): No big logic change
  • String changes made/needed: No
Attachment #9054349 - Flags: approval-mozilla-release?
Attachment #9054349 - Flags: approval-mozilla-beta?
Regressed by: 1280629

Junior, to the question "List of other uplifts needed:' you put Bug 1280629 which is the regressor and landed in 63, did you mean to indicate another bug or was it a copy/paste error and no other uplift is needed for this uplift?

Flags: needinfo?(juhsu)

It’s a paste error. No other bugs for this uplift

Flags: needinfo?(juhsu)

Comment on attachment 9054349 [details]
Bug 1539766 - handle the cancel case for e10s bp

Fix for a major leak, approved for 67 beta 10, thanks.

Attachment #9054349 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED

Selena, you mention uplifting to release. I don't have plans (yet) for another 66 dot release.
Can this fix wait for 67? It looks like potentially a good improvement, but it's not a new problem in 66, so I don't think it needs to be a driver for 66.0.4.

Flags: needinfo?(sdeckelmann)

Comment on attachment 9054349 [details]
Bug 1539766 - handle the cancel case for e10s bp

(Steal the ni?)
Given close to next release, it's fine to not have another dot release for this.

Flags: needinfo?(sdeckelmann)
Attachment #9054349 - Flags: approval-mozilla-release?
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.