Closed Bug 1386746 Opened 3 years ago Closed 2 years ago

HTTP response throttling: experiment with limiting the amount of data we read during the short don't-block-read period

Categories

(Core :: Networking: HTTP, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: mayhemer, Assigned: mayhemer)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file, 6 obsolete files)

I'm afraid that throttling doesn't have a quick short-time impact (what we mainly want from it).  The socket pipe we use is .75MB.  During the 100ms interval we allow socket reading it's very likely we read it all out.  The server can then send .75MB of data which will fill the pipe again.  Question is how fast and how much bandwidth this will actually eat.

Hence, when a transaction is allowed to read for those 100ms, it should also read only a limited amount of the pipe-buffered data.  Then, it should stop reading again and wait for the wake-up signal after 900ms.
P3 for now, but when we have tools to assess and verify any impact of this patching, I may raise it again.
Priority: -- → P3
Whiteboard: [necko-backlog]
If there is time for 57, this would be great to experiment with.

The throttling now works as:
- all active transactions are tracked
- there is a set of conditions we evaluate at [1] that say 'throttle or not' for each transaction that read data
- then, we have an altering timer firing in 900ms and 100ms that switches nsHttpConnectionMgr::mThrottlingInhibitsReading
- during the 900ms interval mThrottlingInhibitsReading is TRUE and we don't allow transactions that evaluate at [1] as 'do throttle' to read data from its socket
- when the 100ms interval starts we switch mThrottlingInhibitsReading to FALSE and wake all previously stopped transactions and allow them now to read (we don't even evaluate the conditions in [1] during this period)
- when the 900ms interval starts, we again switch mThrottlingInhibitsReading to TRUE and the cycle repeats

During the 100ms interval when we allow reading (regardless if the transaction should or should not be throttled) of any amount of data from the socket.  That should IMO be limited, though.

The implementation should be following: when we are in the "don't stop reading" 100ms interval (mThrottlingInhibitsReading == FALSE) while a transaction falls to the 'stop reading' classification (i.e. when [1] would say 'do stop reading' when mThrottlingInhibitsReading == TRUE), the amount of data allowed to read from the socket has to be limited.  The limit must be preferable and killable.  Let's start with reading only 16 or 32k and see what effect it has.  We could then also shorten the intervals so that recovery may be a bit faster.


[1] https://dxr.mozilla.org/mozilla-central/rev/db7f19e26e571ae1dd309f5d2f387b06ba670c30/netwerk/protocol/http/nsHttpConnectionMgr.cpp#3153
Priority: P3 → P2
Assignee: nobody → amchung
Whiteboard: [necko-backlog] → [necko-active]
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P2 → P1
Priority: P1 → P2
Whiteboard: [necko-active]
Attached patch implementation (obsolete) — Splinter Review
Hi Honza,
I have finished to test the limit of throttle.resume-for and throttle.suspend-for, and the testing steps as below:
1. Found the script which is set to throttlable.
2. Changed throttle.resume-for and throttle.suspend-for.
3. Loaded to cnn.com
4. Used the script which can verify the loading performance to confirm performance.

[Testing result]
+-----+---------------------+----------------------+------------------------------------------------+------------+
|     | throttle.resume-for | throttle.suspend-for | The script set to throttlable                  | total time |
+-----+                     +                      +------------------------------------------------+            +
|     |                     |                      | http://cdn.krxd.net/controltag?confid=ITb_4eqO |            |
+-----+---------------------+----------------------+------------------------------------------------+------------+
| 0k  | 25                  | 975                  | 1238                                           | 10938      |
+-----+                     +                      +------------------------------------------------+------------+
| 16k |                     |                      | 1342                                           | 10357      |
+-----+                     +                      +------------------------------------------------+------------+
| 32k |                     |                      | 1261                                           | 8918       |
+-----+---------------------+----------------------+------------------------------------------------+------------+
| 0k  | 50                  | 950                  | 1410                                           | 10685      |
+-----+                     +                      +------------------------------------------------+------------+
| 16k |                     |                      | 1302                                           | 11815      |
+-----+                     +                      +------------------------------------------------+------------+
| 32k |                     |                      | 1408                                           | 9063       |
+-----+---------------------+----------------------+------------------------------------------------+------------+
| 0k  | 75                  | 925                  | 1355                                           | 10353      |
+-----+                     +                      +------------------------------------------------+------------+
| 16k |                     |                      | 1290                                           | 9151       |
+-----+                     +                      +------------------------------------------------+------------+
| 32k |                     |                      | 1344                                           | 8818       |
+-----+---------------------+----------------------+------------------------------------------------+------------+
| 0k  | 100                 | 900                  | 1242                                           | 11212      |
+-----+                     +                      +------------------------------------------------+------------+
| 16k |                     |                      | 1533                                           | 11190      |
+-----+                     +                      +------------------------------------------------+------------+
| 32k |                     |                      | 1645                                           | 11383      |
+-----+---------------------+----------------------+------------------------------------------------+------------+

[Conclusion]
1. When mThrottlingInhibitsReading == TRUE and ShuldStopReading = true, channel still reads data indeed can improve performance.
2. i. throttle.resume-for=25ms and throttle.suspend-for=925, ii. channel reads 32k. the above combinations is the best performance.

Would you give me your suggestions?
Thanks!
Attachment #8914656 - Flags: feedback?(honzab.moz)
Comment on attachment 8914656 [details] [diff] [review]
implementation

Review of attachment 8914656 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you Amy, but it's not what I wanted.  It's actually quite the opposite.

I don't understand how exactly the test was performed - what is exactly purpose of the http://cdn.krxd.net/controltag?confid=ITb_4eqO script?

To be honest, I think I will write a patch faster than explain how exactly I want to fix this.  So I will take the bug and fix it.
Attachment #8914656 - Flags: feedback?(honzab.moz) → feedback-
Assignee: amchung → honzab.moz
Status: NEW → ASSIGNED
Whiteboard: [necko-triaged]
Attached patch wip1 (obsolete) — Splinter Review
Attachment #8914656 - Attachment is obsolete: true
Attached patch v1 (obsolete) — Splinter Review
- instead of alternating 900/100ms interval to allow reading there is a per-transaction read limit of ~8k that is reset by a single timer tick every ~500ms
- condition whether a trans should throttle (former "stop reading", now "limit reading") or not remains unchanged
- when found to obligate response throttling it sets the 8k read limit which when depleted makes the transaction (as before) return WOULD_BLOCK ; only call to ResumeReading by the throttle timer resets the limit and re-enables reading again (similar to how it was before)
- more than half of the patch is a change to preferences and respective member and local vars names:
  - suspend-for and resume-for go away
  - introduced read-limit-bytes (8k) and read-interval-ms (500ms)
  - resume-background-in -> hold-time-ms
  - time-window -> max-time-ms


Tested on my synthetic cdp-tests suite, with 4 downloads we have >50% overall load time and individual resource load time win, with 1 download we have ~20% win.

We could also start tweaking the max-time to be a bit lower, so that download speeds recover sooner after the last important byte for the active page goes through, since this kind of throttling is way more aggressive (and effective) than before.
Attachment #8917082 - Attachment is obsolete: true
Attachment #8917372 - Flags: review?(mcmanus)
thanks for this

(In reply to Honza Bambas (:mayhemer) from comment #7)
> Created attachment 8917372 [details] [diff] [review]
>
> 
> We could also start tweaking the max-time to be a bit lower, so that
> download speeds recover sooner after the last important byte for the active
> page goes through, since this kind of throttling is way more aggressive (and
> effective) than before.

two things before we go ahead.

1] let's do what you suggest above

and 2] please create a more thorough evaluation plan..
Attachment #8917372 - Flags: review?(mcmanus)
few notes from the necko meeting:
- throttle only for some short since the navigation start (active?)
- shorten the background resume delay when throbber is done
(In reply to Patrick McManus [:mcmanus] from comment #9)
> and 2] please create a more thorough evaluation plan..

thanks for the feedback.  I just need to understand what you exactly want me to do about an evaluation plan.  thanks
Attached patch v1 -> v2 wip1 (obsolete) — Splinter Review
(In reply to Honza Bambas (:mayhemer) from comment #11)
> (In reply to Patrick McManus [:mcmanus] from comment #9)
> > and 2] please create a more thorough evaluation plan..
> 
> thanks for the feedback.  I just need to understand what you exactly want me
> to do about an evaluation plan.  thanks

forgot to ni?...
Flags: needinfo?(mcmanus)
(In reply to Honza Bambas (:mayhemer) from comment #11)
> (In reply to Patrick McManus [:mcmanus] from comment #9)
> > and 2] please create a more thorough evaluation plan..
> 
> thanks for the feedback.  I just need to understand what you exactly want me
> to do about an evaluation plan.  thanks

build a convincing story about how we measure this to know
a] is it regressing
b] is it better

lab probably isn't convincing here as we don't know what the real event triggers are.
Flags: needinfo?(mcmanus)
The evaluation plan could be as follows:

This patch should have some small influence on first render time and some larger on overall page load time for content being loaded in tabs that are active all the time between navigation start and load completion while some background http response throttling happens.

We could use a shield pref study here.

Obvious is TIME_TO_NON_BLANK_PAINT_MS which fits the "active all the time" condition.

Then I'd *clone* TIME_TO_LOAD_EVENT_START_MS that will accumulate the same way.  That probe has a good distribution so that it should be easy to spot imp/reg.

Similar option is TIME_TO_DOM_CONTENT_LOADED_START_MS.


Since the situation when something is being throttled in background will happen rarely, I'm a bit afraid that any differences will be lost in the noise.  Thinking of capturing these only when throttling was in effect during the time span (a bit harder to implement).


(Note that TOTAL_CONTENT_PAGE_LOAD_TIME taken from nsDocShell::EndPageLoad is too flat (not sure if that probe should go through some revision..)
Attached patch v2 (obsolete) — Splinter Review
- hold-time-ms and max-time-ms made more independent:
  - hold-time-ms only affects background transactions that are limited reading (including downloads) and is started every time the last transaction for the active tab has finished
  - max-time-ms effects the same set of transactions and is prolonged on every newly activated transaction (for any tab) and on every byte received by an active tab transaction
- when any of the times is in effect, background transactions are kept limited
- this allows keeping background trans throttled when we are loading the active page and there is a number of gaps between requests but resume very quickly when the load is finished

the values are chosen arbitrarily, based on some local testing experience.

suggestions in comment 10, will be, if, implemented in a followup (to be filed) since with such short timeouts I don't think we need them, and I personally turned to be more against them (I simply want the page in front of me to load AFAP, and don't care about the download, which would slow down w/o throttling anyway thanks narrowing the bandwidth by page resource loads - I observe nearly the same speeds but huge page load performance difference!).  other reason is that it's somewhat hard to track the throbber state (changed on content) on the parent process.  and also that this feature simply should be driven by the current bandwidth utilization and not by the page load progress indicator.
Attachment #8917372 - Attachment is obsolete: true
Attachment #8918995 - Attachment is obsolete: true
Attachment #8921147 - Flags: review?(mcmanus)
wrt shield - how can we rework this patch (if at all?) so that there is very little risk unless some set of prefs is changed? Is it enough to just not tweak the timer values unless opted in by the experiment?
Attachment #8921147 - Flags: review?(mcmanus)
Attached patch v3 (obsolete) — Splinter Review
as requested, this has both versions, otherwise the same as before.  default is the new behavior, which is I think correct think to do (I want Nightly test coverage)
Attachment #8921147 - Attachment is obsolete: true
Attachment #8929094 - Flags: review?(mcmanus)
Comment on attachment 8929094 [details] [diff] [review]
v3

Review of attachment 8929094 [details] [diff] [review]:
-----------------------------------------------------------------

Few notes...

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +3166,5 @@
>      Maybe<bool> reversed;
>      reversed.emplace(!aTrans->EligibleForThrottling());
>      RemoveActiveTransaction(aTrans, reversed);
>  
> +    AddActiveTransaction(aTrans);

this swap is a general fix.  we could also land it separately.

@@ +3257,5 @@
> +        // scheduled to be woken after a delay, hence leave them throttled.
> +        inWindow = inWindow || mDelayedResumeReadTimer;
> +    }
> +
> +    return stop && inWindow;

these changes are here only to allow logging of few important decision makers.

@@ +3566,5 @@
> +
> +    if (!mActiveTabUnthrottledTransactionsExist) {
> +        transactions = mActiveTransactions[true].Get(mCurrentTopLevelOuterContentWindowId);
> +    }
> +    mActiveTabTransactionsExist = !!transactions;

code cleanup only, no func change (activeTabIdChanged was always true)

@@ +3591,5 @@
>      }
>  
>      if (!mActiveTransactions[true].IsEmpty()) {
> +        LOG(("  resuming throttled background transactions"));
> +        ResumeReadOf(mActiveTransactions[true]);

this is also a general fix, there is no need to delay resume here, since this is a tab change, not finishing of foreground-tab transactions (for which we want to wait a bit before resume)
Just merged to land after bug 1411632 and I switched the preference to v1, so that landing this patch will have negligible/none effect on performance.  After the study is done, we can switch or adjust.

Patrick, ping on review.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bccc4198ad8e53260a195bfcff69606e5ee0344
Attachment #8929094 - Attachment is obsolete: true
Attachment #8929094 - Flags: review?(mcmanus)
Attachment #8932449 - Flags: review?(mcmanus)
Comment on attachment 8932449 [details] [diff] [review]
v3 [merged after bug 1411632]

Review of attachment 8932449 [details] [diff] [review]:
-----------------------------------------------------------------

is there a green try run? The v3 one seems to be all cancels (comment 22)
Attachment #8932449 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #25)
> Comment on attachment 8932449 [details] [diff] [review]
> v3 [merged after bug 1411632]
> 
> Review of attachment 8932449 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> is there a green try run? The v3 one seems to be all cancels (comment 22)

no, I'll do it.  thanks.
(In reply to Honza Bambas (:mayhemer) from comment #26)
> no, I'll do it.  thanks.

Or to be exact, comment 24 pushes were on a bad base cset.
green
Keywords: checkin-needed
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ba571ea1636
Throttle HTTP response by allowing only small amount of data to read periodically, r=mcmanus
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0ba571ea1636
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
See Also: → 1434388
You need to log in before you can comment on or make changes to this bug.