Implement HTTP transaction on-demand throttling

RESOLVED FIXED in Firefox 54

Status

()

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: mayhemer, Assigned: u408661)

Tracking

(Blocks: 1 bug)

Trunk
mozilla54
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 wontfix, firefox54 fixed)

Details

(Whiteboard: [necko-active])

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

2 years ago
We want on certain signal throttle certain channels (http transactions).  Throttling means to Suspend() those transactions for short amounts of times to slow reading (let the TCP receive window fill up, stop the server sending data and get that way more bandwidth for priority loads.)

"Certain signal" is time between navigation start (page load start) and some later time we either finish the page load or find the page being stall for some time (so we can restore bandwidth for lower priority transactions).

"Certain channels" will be all channels marked as throttle-able (mainly downloads) but in the future we can pick them based on more complex conditions ("C" slot non-hot origins).
(Assignee)

Updated

2 years ago
Whiteboard: [necko-next]
(Reporter)

Comment 1

2 years ago
Nick, this sounds like something for you, would you be OK to take this bug?
Assignee: nobody → hurley
Whiteboard: [necko-next] → [necko-active]
(Assignee)

Comment 2

2 years ago
Sure, let me make sure I have a proper understanding here. Someone, somewhere will mark channels as throttleable (we'll assume for the purposes of this comment that this happens by magic). Then, every time *any* page load starts (not just a page load using the channels marked throttleable), we will Suspend() those channels.

After that, either when the page load is finished, or when some magic signal comes in that the page is stalled (is my code responsible for determining when to send this signal?), we Resume() the channel. (Should we Resume() multiple times when receiving the stalled signal, to get rid of any stacked Suspend()s from multiple concurrent page loads and allow data to flow immediately?)

I know this is probably (hopefully!) just re-stating what you said in my own words, but better to make sure I understand now than write the wrong patch :)
Flags: needinfo?(honzab.moz)
(Reporter)

Comment 3

2 years ago
(In reply to Nicholas Hurley [:nwgh][:hurley] from comment #2)
> Sure, let me make sure I have a proper understanding here. 

The description is rather abstract, so it's mostly implementation free.  However, I have some first thoughts on how it may look.

> Someone,
> somewhere will mark channels as throttleable (we'll assume for the purposes
> of this comment that this happens by magic). 

Something like that :)  For you the only important info is to find the flag in the service class of the channel.  Bug 1312745 comment 1 will tell you more on the API.  there is currently no definition on "when" the flag is going to be (must be) set, but for most cases I can imagine it will be set way before asyncopen call.

> Then, every time *any* page
> load starts (not just a page load using the channels marked throttleable),
> we will Suspend() those channels.

> 
> After that, either when the page load is finished, or when some magic signal
> comes in that the page is stalled (is my code responsible for determining
> when to send this signal?), we Resume() the channel. 

Ideally, it would be nice if this bug introduces some kind of a service that will handle it.  Collect all channels marked as throttable, and watch for pressure and release.

> (Should we Resume()
> multiple times when receiving the stalled signal, to get rid of any stacked
> Suspend()s from multiple concurrent page loads and allow data to flow
> immediately?)

The idea (but I leave it up to you eventually) is more to let the above mentioned service do the pressure/release counting.  When the count goes from 0 to 1, start throttling (precise meaning of it below), when last pressure is released (the counter is back to 0) stop the throttling.

The throttling is not to suspend for more than I believe 2 seconds.  It's more something like a heartbeat thing suspend/resume/suspend/resume/suspend/resume in some discreet intervals (read in smaller amounts).  Here Patrick may have better numbers and suggestions than me.  The reason is we don't want intermediate routers to drop connections when nothing happens on them.  Their timeouts may be insanely short...

For the start, let's define "release" as finishing the page load.  It can be optimized in future when there is just a simple API to use.  Also, the service may do some internal decisions when throttling is on (followups, of course).

Ideally to make sure we don't break the counter in the throttling service is to use some guarding object that will raise the counter in ctor and decrease it in dtor, but that's a detail.  Docshell or tabparent or something (we have to be on the parent, btw - yay..) will grab a new such an object on navigation start and drop it on page load finish or something.

> 
> I know this is probably (hopefully!) just re-stating what you said in my own
> words, but better to make sure I understand now than write the wrong patch :)

Yes, it is :)  In any case, good you asked :)
Flags: needinfo?(honzab.moz)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Comment 7

2 years ago
Created attachment 8826299 [details] [diff] [review]
Copy of: Bug 1312754 - Add a service to throttle certain HTTP channels. (for splinter review)
Attachment #8826299 - Flags: review?(honzab.moz)
(Reporter)

Comment 8

2 years ago
mozreview-review
Comment on attachment 8824111 [details]
Bug 1312754 (prereq) - Fix compile issues caused by shuffling around unified files.

https://reviewboard.mozilla.org/r/100310/#review105156
Attachment #8824111 - Flags: review?(honzab.moz) → review+
(Reporter)

Comment 9

2 years ago
Comment on attachment 8826299 [details] [diff] [review]
Copy of: Bug 1312754 - Add a service to throttle certain HTTP channels. (for splinter review)

Review of attachment 8826299 [details] [diff] [review]:
-----------------------------------------------------------------

there are few details to work out with internal state flags of the service while prefs are changed at runtime + we need to find a better place to monitor active page load (pressure)

I tried to apply this trivially to downloads (patch will be included later) and it's hard to say there is some effect with the default settings.  I was using edition.cnn.com shift-reloads on and on while 3 downloads where running, with a stopwatch in hands.  When I set the suspend-for pref to 20s then there is some effect (~4 secs win: page mostly rendered in 11s vs 14s and throbber stopped 20s vs 26s.  (throttle vs no-throttle).

shift-reload of that page while no downloads are running is the biggest win: 8s resp 16s (mostly rendered resp. throbber done.)

So, I'm not sure that suspending a channel is enough to have the desired effect here.  Also, to find out what's really happening we need better tools (tasktracer/backtrack namely) I'm afraid.

::: docshell/base/nsDocShell.cpp
@@ +7525,5 @@
>                          nsIChannel* aChannel, nsresult aStatus)
>  {
> +  // We can release any pressure we may have had on the throttling service and
> +  // let background channels continue.
> +  mThrottler.reset();

The throttler needs to be reset also when an error page is displayed.  just trying to connect localhost:12345 (leading to netreset) or whatever error will leave the throttle timer on.

@@ +10652,5 @@
>                          nsINetworkPredictor::PREDICT_LOAD, this, nullptr);
>  
> +  // Increase pressure on the throttling service so background channels will be
> +  // appropriately de-prioritized.
> +  mThrottler.reset(new mozilla::net::Throttler());

we'll probably need to find some better place than this.  this will engage the timer also when opening about:newtab, about:blank, the downloads window etc (i.e. this will throttle also for chrome pages)

::: modules/libpref/init/all.js
@@ +1978,5 @@
>  
> +// Control how the throttling service works - number of seconds that each
> +// suspend and resume period (prefs named appropriately)
> +pref("network.throttle.suspend-for", 2);
> +pref("network.throttle.resume-for", 2);

the enabled pref is missing
and I'd like to have ms resolution here

::: netwerk/base/ThrottlingService.cpp
@@ +53,5 @@
> +  MOZ_ASSERT(!mInitCalled);
> +
> +  mInitCalled = true;
> +
> +  mTimer = do_CreateInstance(NS_TIMER_CONTRACTID);

maybe create the timer as last (after all ENSURE_SUCCESS)?

@@ +175,5 @@
> +  }
> +
> +  if (!mChannelHash.Get(key, nullptr)) {
> +    // TODO - warn?
> +    return NS_ERROR_ILLEGAL_VALUE;

what if throttling was disabled (by the pref) while a channel was being added and now the throttling is enabled?  Maybe we should add/remove channels regardless the service is enabled or not?  if it's not that complicated it might then behave more consistently when users play with the enable pref and having active throttleable channels.

@@ +221,5 @@
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  if (!mEnabled) {
> +    return NS_OK;
> +  }

so, when the users plays with the enable pref while we are under pressure, this will break the counter, right?

@@ +225,5 @@
> +  }
> +
> +  if (IsNeckoChild()) {
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }

probably check before mEnabled?

@@ +269,5 @@
> +    // Always try to resume if we were suspended, but only time-limit the
> +    // resumption if we're under pressure and we're enabled. If either of those
> +    // conditions is false, it doesn't make any sense to set a timer to suspend
> +    // things when we don't want to be suspended anyway.
> +    if (mPressureCount && mEnabled) {

aren't we already resumed when not under pressure?  (i.e. when mPressureCount == 0)  Assuming that user never changes the enable pref at runtime.

@@ +318,5 @@
> +
> +  for (auto iter = mChannelHash.ConstIter(); !iter.Done(); iter.Next()) {
> +    const nsCOMPtr<nsIHttpChannel> channel = iter.UserData();
> +    nsCOMPtr<nsIChannel> baseChannel = do_QueryInterface(channel);
> +    baseChannel->Resume();

nit: is the iterator protected or resilient against modifications while iteration is in progress?  I'm thinking of cases when resume may bubble up (like Resume->Cancel->OnStopRequest->DoSomethingCrazy->HTTP::AsyncOpen) to reenter this service with creation of a new channel.  We would then Resume() that new channel here what might either break the suspend counter on the channel or just print warnings.

maybe there is nothing to be concerned about, just checking.

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +6079,5 @@
>  
> +    if (mClassOfService & nsIClassOfService::Throttleable) {
> +        nsIThrottlingService *throttler = gHttpHandler->GetThrottlingService();
> +        if (throttler) {
> +            throttler->AddChannel(this);

maybe add a comment this may immediately call Suspend() on this channel.
Attachment #8826299 - Flags: review?(honzab.moz) → feedback+
(Reporter)

Comment 10

2 years ago
mozreview-review
Comment on attachment 8824112 [details]
Bug 1312754 - Add a service to throttle certain HTTP channels.

https://reviewboard.mozilla.org/r/100312/#review105448

see comment 9
Attachment #8824112 - Flags: review?(honzab.moz)
(Reporter)

Comment 11

2 years ago
Created attachment 8826861 [details] [diff] [review]
downl-throttle.patch (for reference)

This hooks to the SetChannelIsForDownload attribute.  I think there are bugs on how this attribute is used (not used), e.g. it seems it doesn't work for resumed download from the previous session etc... 

Also shows up that we may not always know a channel is for download before async open :(  I had to realize, we know only after onstartrequest.
(Assignee)

Comment 12

2 years ago
This is all good stuff, working my way through it now. I do want to discuss the mEnabled bits, though. You're right, as currently written, things could become inconsistent, and that would be bad. So there are two options - I can either make the changes to ensure things are consistent (including manually watching the enabled pref so I can suspend/resume as appropriate when the pref changes at runtime), or I can just read the pref once at startup and be done with it. I'm leaning towards the "read once at startup" route, as this doesn't seem like something that most people will change (and we already have plenty of prefs that need a restart to be picked up), and it's a much less invasive change, but I wanted to get your view on it, Honza.
Flags: needinfo?(honzab.moz)
(Reporter)

Comment 13

2 years ago
(In reply to Nicholas Hurley [:nwgh][:hurley] from comment #12)
> This is all good stuff, working my way through it now. I do want to discuss
> the mEnabled bits, though. You're right, as currently written, things could
> become inconsistent, and that would be bad. So there are two options - I can
> either make the changes to ensure things are consistent (including manually
> watching the enabled pref so I can suspend/resume as appropriate when the
> pref changes at runtime), or I can just read the pref once at startup and be
> done with it. I'm leaning towards the "read once at startup" route, as this
> doesn't seem like something that most people will change (and we already
> have plenty of prefs that need a restart to be picked up), and it's a much
> less invasive change, but I wanted to get your view on it, Honza.

Hmm.. I'd rather have the possibility to switch this at runtime.  I don't see anything complicated on it or any big obstacle here to not have it.
Flags: needinfo?(honzab.moz)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 16

2 years ago
Created attachment 8831130 [details] [diff] [review]
Patch for splinter review

Here's a splinter version for you, because I Care, Honza :)

(If you do r+ this, even with comments, could you at least r+ it in mozreview so I can use autoland? Thanks!)
Attachment #8826299 - Attachment is obsolete: true
(Assignee)

Comment 17

2 years ago
Forgot to put this in last time, here's a try run for this new version: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97cbcaeba99c509b74345bf6e989cff73c33d343

Honza, any ETA on a review happening here?
Flags: needinfo?(honzab.moz)
(Reporter)

Comment 18

2 years ago
(In reply to Nicholas Hurley [:nwgh][:hurley] from comment #17)
> Forgot to put this in last time, here's a try run for this new version:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=97cbcaeba99c509b74345bf6e989cff73c33d343
> 
> Honza, any ETA on a review happening here?

This week or start of the next week.
Flags: needinfo?(honzab.moz)
(Reporter)

Comment 19

2 years ago
Comment on attachment 8831130 [details] [diff] [review]
Patch for splinter review

Review of attachment 8831130 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the splinter patch!

actual testing when download channels are added (the reference patch of mine) shows that the effect of throttling is negligible or none.  as before, tested with edition.cnn.com, trump-face and overall-page load against two large downloads.

::: docshell/base/nsDocShell.cpp
@@ +10719,5 @@
> +  aURI->SchemeIs("http", &isHTTP);
> +  aURI->SchemeIs("https", &isHTTPS);
> +  if (isHTTP || isHTTPS) {
> +    mThrottler.reset(new mozilla::net::Throttler());
> +  }

this also seems to fix the error page problem, good.

::: netwerk/base/ThrottlingService.cpp
@@ +20,5 @@
> +
> +namespace mozilla {
> +namespace net{
> +
> +static const char kEnabledPref[] = "network.throttle.enabled";

the name doesn't correspondent to what has been added to all.js

@@ +155,5 @@
> +      nsCOMPtr<nsIChannel> baseChannel = do_QueryInterface(channel);
> +      baseChannel->Suspend();
> +    }
> +  } else {
> +    // This gets tricky - we've somehow re-entrantly gotten here through the

can you just for the record provide callstacks?  (if you have them by hand or it's easy to get one)

@@ +228,5 @@
> +  if (IsNeckoChild()) {
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }
> +
> +  if (mPressureCount == 0) {

please do if (mPressureCount++ == 0) { .. } since calls to Suspend/Resume on channels could cause a reentry of this method too.

@@ +258,5 @@
> +
> +  MOZ_ASSERT(mPressureCount > 0, "Unbalanced throttle pressure");
> +
> +  --mPressureCount;
> +  if (mPressureCount == 0) {

if (--mPresureCount == 0) { .. }?  :)

@@ +292,5 @@
> +      // suspending channels and that we don't have any timer that wants to
> +      // change things unexpectedly.
> +      mTimer->Cancel();
> +      MaybeResumeAll();
> +    }

you are forgetting to change the mEnabled flag..

also, I don't see a purpose of the newEnable local var

@@ +351,5 @@
> +  mSuspended = true;
> +
> +  IterateHash([](ChannelHash::Iterator &iter) -> void {
> +    const nsCOMPtr<nsIHttpChannel> channel = iter.UserData();
> +    nsCOMPtr<nsIChannel> baseChannel = do_QueryInterface(channel);

nsIHttpChannel is derived from nsIChannel, why do you need to QI down?
if possible, please save the addref/release via comptr

@@ +370,5 @@
> +  mSuspended = false;
> +
> +  IterateHash([](ChannelHash::Iterator &iter) -> void {
> +    const nsCOMPtr<nsIHttpChannel> channel = iter.UserData();
> +    nsCOMPtr<nsIChannel> baseChannel = do_QueryInterface(channel);

same here

@@ +378,5 @@
> +
> +void
> +ThrottlingService::IterateHash(void (* callback)(ChannelHash::Iterator &iter))
> +{
> +  mIteratingHash = true;

add an assert this is at false initially

@@ +401,5 @@
> +    }
> +  }
> +
> +  mChannelsToAddRemove.Clear();
> +  mChannelIsAdd.Clear();

please swap these two member arrays to local arrays before you work with them (SwapElements).  as another reentry protection.

@@ +437,5 @@
> +    }
> +  } else {
> +    MOZ_RELEASE_ASSERT(mThrottlingService);
> +    mThrottlingService->DecreasePressure();
> +    mThrottlingService = nullptr;

probably no need to do this?  or you expect the dtor of this class be called more than once?

::: netwerk/base/ThrottlingService.h
@@ +57,5 @@
> +  // channels. See comments in AddChannel and RemoveChannel for details.
> +  void IterateHash(void (* callback)(ChannelHash::Iterator &iter));
> +  bool mIteratingHash;
> +  nsCOMArray<nsIHttpChannel> mChannelsToAddRemove;
> +  nsTArray<bool> mChannelIsAdd;

maybe have two arrays, one for adding and second for removing? :)  just to avoid even a slight potential of de-sync of these two arrays here..

::: netwerk/base/nsIThrottlingService.idl
@@ +7,5 @@
> +
> +interface nsIHttpChannel;
> +
> +[builtinclass, uuid(c755ef98-b749-4f30-a658-1e6110013a66)]
> +interface nsIThrottlingService : nsISupports

some comments over this class would be good to add.

::: netwerk/ipc/NeckoParent.cpp
@@ +920,5 @@
> +{
> +  // XXX - this needs fixing for multi-e10s. Won't work when we have more than
> +  // one child. We'll need to keep track of which child this came from so we can
> +  // properly balance throttlers on a per-child basis, as well as be able to
> +  // clear out a child's throttlers when that child goes away.

isn't there a different instance of NeckoParent then?  one for each child process?
(Reporter)

Comment 20

2 years ago
mozreview-review
Comment on attachment 8824112 [details]
Bug 1312754 - Add a service to throttle certain HTTP channels.

https://reviewboard.mozilla.org/r/100312/#review112488

comment 19.
r+ with comments addressed or answered.
Attachment #8824112 - Flags: review?(honzab.moz) → review+
(Assignee)

Comment 21

2 years ago
Any comments not explicitly replied to here have been addressed as requested.

(In reply to Honza Bambas (:mayhemer) from comment #19)
> ::: netwerk/base/ThrottlingService.cpp
> @@ +20,5 @@
> > +
> > +namespace mozilla {
> > +namespace net{
> > +
> > +static const char kEnabledPref[] = "network.throttle.enabled";
> 
> the name doesn't correspondent to what has been added to all.js

*facepalm* fixed that :) (for the record, all.js is the canonical version).

> @@ +155,5 @@
> > +      nsCOMPtr<nsIChannel> baseChannel = do_QueryInterface(channel);
> > +      baseChannel->Suspend();
> > +    }
> > +  } else {
> > +    // This gets tricky - we've somehow re-entrantly gotten here through the
> 
> can you just for the record provide callstacks?  (if you have them by hand
> or it's easy to get one)

I don't, sorry.

> @@ +292,5 @@
> > +      // suspending channels and that we don't have any timer that wants to
> > +      // change things unexpectedly.
> > +      mTimer->Cancel();
> > +      MaybeResumeAll();
> > +    }
> 
> you are forgetting to change the mEnabled flag..
> 
> also, I don't see a purpose of the newEnable local var

Hrm, indeed. I don't get the point there, either. Just made it all mEnabled :)

> @@ +437,5 @@
> > +    }
> > +  } else {
> > +    MOZ_RELEASE_ASSERT(mThrottlingService);
> > +    mThrottlingService->DecreasePressure();
> > +    mThrottlingService = nullptr;
> 
> probably no need to do this?  or you expect the dtor of this class be called
> more than once?

The nulling out? I'm going to leave it in, just so a quick skim can see that we're done with the throttling service there.

> ::: netwerk/base/ThrottlingService.h
> @@ +57,5 @@
> > +  // channels. See comments in AddChannel and RemoveChannel for details.
> > +  void IterateHash(void (* callback)(ChannelHash::Iterator &iter));
> > +  bool mIteratingHash;
> > +  nsCOMArray<nsIHttpChannel> mChannelsToAddRemove;
> > +  nsTArray<bool> mChannelIsAdd;
> 
> maybe have two arrays, one for adding and second for removing? :)  just to
> avoid even a slight potential of de-sync of these two arrays here..

That's what I had originally, but then thought of the case (unlikely, but still possible) where, while we're iterating the hash, a channel gets added and then removed. If we process the remove array first, then the channel will (incorrectly) be added as the last operation on it, and thus will be under control of the throttler when it shouldn't be. The inverse issue happens if a channel that is already under control of the throttler is removed then added while we're iterating the hash, and we process the add array first (the channel will not be under control of the throttler even though it should be). So, we need to have the proper ordering of add/removes, and the easiest way to manage this is to keep one single array.

> ::: netwerk/ipc/NeckoParent.cpp
> @@ +920,5 @@
> > +{
> > +  // XXX - this needs fixing for multi-e10s. Won't work when we have more than
> > +  // one child. We'll need to keep track of which child this came from so we can
> > +  // properly balance throttlers on a per-child basis, as well as be able to
> > +  // clear out a child's throttlers when that child goes away.
> 
> isn't there a different instance of NeckoParent then?  one for each child
> process?

Indeed. This was left over from before I understood that :)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 25

2 years ago
Pushed by hurley@todesschaf.org:
https://hg.mozilla.org/integration/autoland/rev/469857ed52b9
(prereq) - Fix compile issues caused by shuffling around unified files. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/145410f30206
Add a service to throttle certain HTTP channels. r=mayhemer

Comment 26

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/469857ed52b9
https://hg.mozilla.org/mozilla-central/rev/145410f30206
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox54: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
status-firefox52: affected → wontfix
(Reporter)

Updated

2 years ago
Blocks: 1348061
(Reporter)

Updated

2 years ago
Blocks: 1348062
(Reporter)

Updated

2 years ago
No longer blocks: 1348061
Depends on: 1348061
(Reporter)

Updated

2 years ago
Attachment #8831130 - Attachment is obsolete: true
(Reporter)

Updated

2 years ago
Attachment #8826861 - Attachment is obsolete: true
(Reporter)

Updated

2 years ago
Depends on: 1355782
(Reporter)

Updated

2 years ago
Blocks: 1360603
You need to log in before you can comment on or make changes to this bug.