1280629 - e10s breaks necko back pressure

Reporter

Description

9 years ago

Normal single-process data flow is that the socket thread fills a pipe and the target thread drains that pipe in onDataAvailable. If the pipe fills up, the network transaction associated with it also pauses, and eventually (after a few buffers in between fill up) the server stops sending data. The major use case for this is channel.Suspend(), and that continues to work fine in e10s. However, this also comes into play simply when the channel code is janky and doesn't get to run for a while to drain the pipe filled by the socket thread (at least in single process).

In e10s, if the parent process runs quickly it will consume data out of that pipe and push it into IPDL, where it essentially lives in the event queue. That frees the socket thread to keep downloading more data and keep putting it in the event queue no matter how big the event queue gets: there is no back pressure on the server. If the event queue runs quickly this is not a problem, and we can probably live with it for a little while if the only issue is jank (network timescales for back pressure are pretty big compared to CPU timescales). However, Quantum DOM considers pausing (or severely degrading) run queues for backgrounded tabs, etc. That means the queue can grow unbounded without creating any back pressure; in the case of a large download or something similar, that will massively increase buffering requirements. We can fix this with some kind of window/ack across the IPDL channel, and it can certainly be fixed before there are multiple Quantum run queues.
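One way to picture the "window/ack across the IPDL channel" idea is a minimal sketch using a byte-credit scheme (an assumption; the bug does not specify the protocol). The parent may have at most `window` bytes sent to the child but not yet acknowledged, and the child acknowledges bytes only after its event target has actually processed them. When the credit runs out, the parent stops draining the socket-side pipe, so that pipe fills and the existing single-process back pressure takes over. `CreditWindowSender` and `ChildReceiver` are hypothetical names, not Necko or IPDL APIs.

```python
from collections import deque


class CreditWindowSender:
    """Parent side (hypothetical): forwards data to the child only while the
    bytes sent but not yet acknowledged fit inside a fixed window."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.in_flight = 0  # bytes sent to the child, not yet acked

    def can_send(self, size: int) -> bool:
        return self.in_flight + size <= self.window

    def send(self, size: int) -> None:
        # When this would overflow the window, the caller should stop reading
        # from the socket-side pipe so that pipe fills and the transaction pauses.
        if not self.can_send(size):
            raise RuntimeError("window exhausted")
        self.in_flight += size

    def on_ack(self, size: int) -> None:
        # The child reports bytes it has finished processing; reopen the window.
        self.in_flight -= size


class ChildReceiver:
    """Child side (hypothetical): queues incoming chunks and acks each one
    only after its event target has actually processed it."""

    def __init__(self, sender: CreditWindowSender) -> None:
        self.sender = sender
        self.queue = deque()  # sizes of chunks waiting in the event queue

    def receive(self, size: int) -> None:
        self.queue.append(size)

    def process_one(self) -> None:
        size = self.queue.popleft()
        self.sender.on_ack(size)  # in real IPC this would be an ack message


if __name__ == "__main__":
    chunk = 32 * 1024
    sender = CreditWindowSender(window=2 * chunk)
    child = ChildReceiver(sender)
    while sender.can_send(chunk):   # child is stalled: nothing gets processed
        sender.send(chunk)
        child.receive(chunk)
    print(sender.in_flight)          # 65536: parent stops at the window
    child.process_one()              # child catches up by one chunk
    print(sender.can_send(chunk))    # True
```

The point of acking on *processing* rather than on *receipt* is that a paused or slow run queue in the child then directly stalls the parent, which is exactly the back pressure that is lost when data just piles up in the event queue.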

Patrick McManus [:mcmanus]

Reporter

Updated

9 years ago

Whiteboard: [necko-next][necko-quantum]

Ben Kelly [:bkelly, not reviewing]

Comment 1

9 years ago

This will affect the fetch body stream as well. I'd like to be able to let JS code stop reading the body and have that apply back pressure all the way to the necko socket.

Patrick McManus [:mcmanus]

Reporter

Comment 2

9 years ago

So if the JS calls channel suspend, that should work fine for the channel in most cases. This bug is more about the lack of back pressure from a stalled event queue (particularly in a multi-queue model). Suspending the channel is a possibility in that case too, but it's a lot harder to imagine how the scheduler would know which channel to suspend :)

Jim Mathies [:jimm]

Updated

9 years ago

tracking-e10s: --- → -

Honza Bambas (:mayhemer)

Comment 3

8 years ago

Plan for a patch would be:
- on the child process, keep track of the size of the data pending to be processed on the HttpChannelChild event target (main thread or the retarget thread)
- on reaching a certain amount, suspend the parent channel
- on dropping to a certain (smaller :)) amount, resume the channel again
- don't forget to resume it when the channel is canceled from the child side
- send the suspend/resume with high priority

Notes: the size of the pipe on the parent process between a connection and a transaction is 32k x 24 = 0.75MB.
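The plan above amounts to a high/low watermark (hysteresis) controller on the child. Here is a minimal sketch, assuming hypothetical callbacks `send_suspend` and `send_resume` that stand in for the high-priority IPC messages to the parent channel. `PendingDataTracker` and its method names are illustrative only, and any threshold values are placeholders rather than numbers from the patch. The parent-side pipe figure from the note is included as a constant: 32 KiB x 24 segments = 786,432 bytes = 0.75 MiB.

```python
from typing import Callable

# Parent-side connection->transaction pipe size quoted in the note above.
PARENT_PIPE_BYTES = 32 * 1024 * 24  # 786432 bytes = 0.75 MiB


class PendingDataTracker:
    """Child side (hypothetical): suspend the parent channel when too much
    data is queued on the event target, resume once it drains below a
    lower mark. Two thresholds avoid flapping between suspend and resume."""

    def __init__(self, high: int, low: int,
                 send_suspend: Callable[[], None],
                 send_resume: Callable[[], None]) -> None:
        if low >= high:
            raise ValueError("low watermark must be below high watermark")
        self.high = high
        self.low = low
        self.pending = 0
        self.suspended = False
        self._send_suspend = send_suspend
        self._send_resume = send_resume

    def on_data_queued(self, size: int) -> None:
        # Hypothetical hook: data for onDataAvailable was posted to the target.
        self.pending += size
        if not self.suspended and self.pending >= self.high:
            self.suspended = True
            self._send_suspend()

    def on_data_consumed(self, size: int) -> None:
        # Hypothetical hook: the event target finished processing the data.
        self.pending -= size
        if self.suspended and self.pending <= self.low:
            self.suspended = False
            self._send_resume()

    def on_cancel(self) -> None:
        # Never leave the parent suspended after a child-side cancel.
        if self.suspended:
            self.suspended = False
            self._send_resume()
        self.pending = 0
```

For example, with `high=100` and `low=20`, queuing 60 bytes twice triggers one suspend; consuming 60 leaves 60 pending (no resume yet), and consuming another 50 drops to 10 and triggers the resume.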

Timothy Nikkel (:tnikkel)

Updated

8 years ago

Blocks: 1352642

Patrick McManus [:mcmanus]

Reporter

Comment 5

8 years ago

Gary, is this something you're willing to look at? It's related to https://bugzilla.mozilla.org/show_bug.cgi?id=1355782

Flags: needinfo?(xeonchen)

Gary Chen [:xeonchen]

Comment 6

8 years ago

(In reply to Patrick McManus [:mcmanus] from comment #5)
> gary is this something you're willing to look at related to
> https://bugzilla.mozilla.org/show_bug.cgi?id=1355782

I'm not very familiar with this part right now, but I'm happy to take this :)

Assignee: nobody → xeonchen

Flags: needinfo?(xeonchen)

Honza Bambas (:mayhemer)

Updated

8 years ago

Depends on: 1367861

Gary Chen [:xeonchen]

Comment 7

8 years ago

Honza, I guess this bug needs to be put on hold until bug 1367861 is resolved?

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 8

8 years ago

(In reply to Gary Chen [:xeonchen] (needinfo plz) from comment #7)
> Honza, I guess this bug needs to be put on hold until bug 1367861 is
> resolved?

Yes, it's directly dependent on it.

Flags: needinfo?(honzab.moz)

Firefox Bug Husbandry Bot

Comment 9

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P2

Shian-Yow Wu [:swu]

Comment 10

7 years ago

Moving to P3 as this bug depends on bug 1367861 which is P3.

Priority: P2 → P3

Shian-Yow Wu [:swu]

Comment 11

7 years ago

Moving back to P2, as bug 1367861 is now prioritized.

Shian-Yow Wu [:swu]

Updated

7 years ago

Priority: P3 → P2

Shian-Yow Wu [:swu]

Comment 12

7 years ago

:wiwang will work on it.

Assignee: xeonchen → wiwang

Jason Duell

Updated

7 years ago

Assignee: wiwang → nobody

Honza Bambas (:mayhemer)

Updated

7 years ago

Assignee: nobody → honzab.moz

Honza Bambas (:mayhemer)

Comment 13

7 years ago

Jason, since I can't easily find Junior's Bugzilla account, I'm assigning this to you tentatively. Please reassign it for me, thanks.

Assignee: honzab.moz → jduell.mcbugs

Will Wang (ex-moco)

Comment 14

7 years ago

Honza, FYI, Junior's account will be re-enabled very soon :) (should be around 7/16)

Bug 1280629 - Part 1: Suspend the http channel if the child process is not able to consume on time (7 years ago, Junior [inactive], 46 bytes, text/x-phabricator-request; dragana: review+)
Bug 1280629 - Part 2: telemetry of e10 back pressure suspension rate (7 years ago, Junior [inactive], 46 bytes, text/x-phabricator-request; dragana: review+)