594121 - Investigate nice'ing the content process above the chrome process

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Description

•

15 years ago

In general, for responsiveness's sake, we want the chrome process to have scheduling priority over the content process. For example, chrome repainting during a pan animation trumps content-process content script spinning in JS. It's not clear how much of a difference this would make with the newer Linux schedulers, but it's worth a look.

To investigate, we can have something in the content process, say ContentChild::Init(), do

  kContentNiceness = 1
  prio = getpriority(PRIO_PROCESS, 0)
  setpriority(PRIO_PROCESS, 0, prio + kContentNiceness)

If this proves fruitful, we may want to create a ChildProcess interface for it and port to Windows.
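As a rough illustration of the pseudocode in this description, here is a minimal C++ sketch of a relative renice of the calling process. The helper name NiceSelfBy is hypothetical and does not come from any patch on this bug; the only real APIs used are the POSIX getpriority() and setpriority() calls.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/resource.h>

// Hypothetical helper (not from any patch on this bug).
// Raises the calling process's nice value by `delta`, i.e. lowers its
// scheduling priority relative to where it is now. Returns true on success.
bool NiceSelfBy(int delta) {
  // This sketch only makes the process nicer; lowering the nice value
  // usually needs extra privileges.
  assert(delta >= 0);

  // getpriority() can legitimately return -1, so errno has to be cleared
  // first and checked afterwards to tell an error from a real value.
  errno = 0;
  int prio = getpriority(PRIO_PROCESS, 0);  // 0 means "the calling process"
  if (prio == -1 && errno != 0) {
    return false;
  }
  return setpriority(PRIO_PROCESS, 0, prio + delta) == 0;
}
```

Because the new value is computed from the current one, a process that was already started at a non-default nice value keeps its offset. That is the main reason to prefer a delta over an absolute value.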

Doug Turner (:dougt)

Updated

•

15 years ago

tracking-fennec: --- → ?

Oleg Romashin (:romaxa)

Comment 1

•

15 years ago

Some platforms can automatically renice the process that is currently running in the foreground.

Doug Turner (:dougt)

Comment 2

•

15 years ago

I have a patch that does this, but it is hard right now to see a perf improvement. Let's see if we can get better data when things settle down (cedar lands, and hw accel is enabled).

tracking-fennec: ? → 2.0+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 3

•

15 years ago

I tried renice manually on the N900, and I think it does help. The UI stays significantly more responsive when the chrome process has better priority. The test case is to load some CPU-heavy web site and try to pan or drag the side UI into view.

Doug Turner (:dougt)

Comment 4

•

15 years ago

On a similar device, I couldn't tell. What was your renice value?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 5

•

15 years ago

20

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

15 years ago

And for testing, use a web page which takes lots of CPU time. For example:

http://mozilla.pettay.fi/moztests/events/event_speed_2.1.html

Press "Click to start", wait for some results, then zoom in and try to pan.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

15 years ago

20 is possibly too much, but it does help.

Doug Turner (:dougt)

Comment 8

•

15 years ago

Attached patch: patch v.1 (obsolete)

We can set this from the mobile-browser launching code, I suspect.

Assignee: nobody → doug.turner

Attachment #482991 - Flags: review?(jones.chris.g)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 9

•

15 years ago

Comment on attachment 482991: patch v.1

- Windows doesn't have getpriority/setpriority. We need to test for those APIs (testing for just one of them would suffice) in configure.in.
- Using MOZ_CHILD_PROCESS_PRIORITY to set an absolute priority is harder to maintain than a nice-delta. Let's instead use a delta: call getpriority() here, apply the delta, then setpriority().
- I don't like relying on an environment variable for a core feature like this. Let's instead use |static const int kNiceDelta| or something, which can be overridden by an environment variable |ifdef DEBUG|.

(This will want to be a GeckoChildProcessHost interface, but we can do that in a followup.)
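The third review point could look roughly like the sketch below: a compiled-in delta, with an environment override that only exists in DEBUG builds. The function names, the value 1, the sanity bounds, and the variable name MOZ_CHILD_PROCESS_NICE_DELTA are all assumptions made for illustration; the review only asks for "kNiceDelta or something".

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Assumed default; the review does not name a value.
static const int kNiceDelta = 1;

// Hypothetical parser: returns the integer in `text`, or `fallback` if the
// string is missing, empty, not a plain number, or outside [-20, 19]
// (a sanity bound chosen for this sketch).
int ParseNiceDelta(const char* text, int fallback) {
  assert(fallback >= -20 && fallback <= 19);
  if (!text || !*text) {
    return fallback;
  }
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(text, &end, 10);
  if (errno != 0 || *end != '\0' || value < -20 || value > 19) {
    return fallback;
  }
  return static_cast<int>(value);
}

// The delta actually applied. Release builds ignore the environment.
int EffectiveNiceDelta() {
#ifdef DEBUG
  // Hypothetical variable name, used only in this sketch.
  return ParseNiceDelta(std::getenv("MOZ_CHILD_PROCESS_NICE_DELTA"), kNiceDelta);
#else
  return kNiceDelta;
#endif
}
```

Keeping the parsing separate from the getenv() call makes the fallback behavior easy to check without touching the process environment.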

Attachment #482991 - Flags: review?(jones.chris.g) → review-

Doug Turner (:dougt)

Comment 10

•

15 years ago

I compared the default nice value to a nice value of 10 (lower priority) for the child process on a Linux ARM device and found that it hurt page load perf by about 3.3%. I couldn't notice any page panning performance difference during these loads. There might be other benefits to having a lower child process priority. The numbers might be different on other hardware.

tracking-fennec: 2.0+ → ?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

15 years ago

On N900 I nowadays pretty much always renice the child process manually (yes yes, I should just make some script) because the change to the responsiveness is rather big. And yes, pages may load a bit slower when the child process has lower priority.

Doug Turner (:dougt)

Comment 12

•

15 years ago

Olli, do you have a test page that shows the importance of nicing?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 13

•

15 years ago

Any pages. Having one or two pages loading in the background while panning the foreground tab.

Doug Turner (:dougt)

Comment 14

•

15 years ago

Okay. Changing the nice value obviously makes a difference in that case.

Doug Turner (:dougt)

Comment 15

•

15 years ago

We might want to dynamically change the priority of the child based on what Fennec is currently doing. For example, if we are loading multiple webpages, we could nice the child. However, if we only have one page loading, maybe there is no need. The heuristic needs to be figured out, but building it would be trivial.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

15 years ago

Can we ever give child process higher priority than what it already has? I mean, are there any OS level methods for that?

Alon Zakai (:azakai)

Assignee

Comment 17

•

15 years ago

How about doing this:

* The parent measures its current 'lag', defined as how much CPU it is getting compared to what it needs. Various ways to measure that. It should basically correspond to 'responsiveness'.
* If the parent is lagging, nice the child some more. If the parent is not lagging, nice the child less.

Benjamin Smedberg

Comment 18

•

15 years ago

I really don't think we need to dynamically renice the child. nice only matters where there is CPU contention: if the only process which is trying to do something is the content process, then it will get all the CPU anyway. We should just make it a little nicer than the chrome process, to preserve responsiveness, and leave it alone.

Doug Turner (:dougt)

Comment 19

•

15 years ago

> We should just make it a little nicer than the chrome process, to preserve responsiveness, and leave it alone.

Just to be clear, you are saying we should nice the content process so that the chrome process is more responsive?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 20

•

15 years ago

That's the whole point of this. See comment 0.

Alon Zakai (:azakai)

Assignee

Comment 21

•

15 years ago

(In reply to comment #18)
> I really don't think we need to dynamically renice the child. nice only matters
> where there is CPU contention: if the only process which is trying to do
> something is the content process, then it will get all the CPU anyway.

The problem is with other processes. If we make the child nicer than the parent (and the parent has the default nice value) then the child will get less CPU than other processes in the system. It'll be nice to everyone, when we want it to be nice just to the parent.

Assignee: doug.turner → azakai

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 22

•

15 years ago

That's exactly what we want. The content process crunching away shouldn't make any part of the device's UI less responsive.

Doug Turner (:dougt)

Comment 23

•

15 years ago

Regarding comment 18:

It is pretty easy to see that when the child is loading two tabs, panning suffers with the default nice values. However, when just loading a single page, if we nice the child process, we lose 3%. I do not think that this is an either/or situation: we don't need to lose 3% to get responsiveness. Instead, we should only nice the child process when we have to and revert this when possible.

Regarding comment 22:

Exactly.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 24

•

15 years ago

3% measured how? Doing what?

It doesn't seem worth fiddling around with the nice value, yet. If we get it right, we might win a few % on page load in some situations. If we get it wrong, responsiveness suffers. Risk/reward doesn't look good to me, especially since I don't know how we would gather data to compute heuristics.

Let's take the responsiveness win and file away adding heuristics for when we have/start caring about quantitative benchmarks on which this repeatably, demonstrably loses us points.

Doug Turner (:dougt)

Comment 25

•

15 years ago

> 3% measured how? Doing what?

I loaded news.google.com 15 times over a USB network and measured the time between onstart and onstop for the document using dump() and Date.now(). I tossed the highest and lowest values and averaged the remaining 13. Then I compared these averages. Happy to try a different test.

> Risk/reward doesn't look good to me

Assuming that the above measurement is sound, how can a 3% page load perf win not be a good reward?

> I don't know how we would gather data to compute heuristics.

This is a fair point, and I don't know what the best place is either. The front end can probably handle this -- it can do something as simple as "when multiple tabs are being loaded, renice the child". Maybe there is a better place for this logic.

I guess we can renice the child but, as comment 21 suggests, this means that the content process will always run less than every other user UI process. Doing so will lose us valuable page load perf. Doing so will always make sure that benchmarks will run slower than they can.
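The averaging step described in this comment (drop the highest and lowest sample, average the rest, then compare the two averages) can be sketched as follows. This is only an illustration of the arithmetic; the function names are hypothetical, and the original measurement was done in JavaScript with dump() and Date.now(), not with this code.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Drops the single lowest and single highest sample, then averages the
// remainder. Returns 0.0 if there are fewer than three samples, since
// nothing would be left to average.
double TrimmedMean(std::vector<double> samples) {
  if (samples.size() < 3) {
    return 0.0;
  }
  std::sort(samples.begin(), samples.end());
  double sum = std::accumulate(samples.begin() + 1, samples.end() - 1, 0.0);
  return sum / static_cast<double>(samples.size() - 2);
}

// Relative change of the niced run against the baseline, in percent.
// A positive result means the niced run was slower.
double PercentChange(double baselineMs, double nicedMs) {
  assert(baselineMs > 0.0);
  return (nicedMs - baselineMs) / baselineMs * 100.0;
}
```

With 15 loads per configuration, TrimmedMean averages 13 values each, and PercentChange of the two results gives a figure comparable to the 3% quoted here. It says nothing about variance, which is one of the open questions raised in the following reply.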

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 26

•

15 years ago

(In reply to comment #25)
> > 3% measured how? Doing what?
>
> i loaded news.google.com 15 times over a usb network and measured the time
> between onstart for the document and onstop for the document using dump() and
> Date.now(). I toss the highest and lowest values, and averaged the remaining
> 13 values. Then compared these averages. Happy to try a different test.
>

Are you loading the site from cache or over network on the host? What else is going on both on device and on the machine that's servicing network requests? What's the variance in the measurements? What's the effect on other top 500 sites? What's the effect on responsiveness by changing niceness on sites that chew CPU after page load?

> > Risk/reward doesn't look good to me
>
> Assuming that the above measurement is sound, then how can a 3% page load perf
> be not a good reward.
>

Because we don't have a good framework for measuring small changes like this. We also don't have a good framework for measuring responsiveness. We're trading off a clear qualitative win in responsiveness for a small potential win on pageload on one site. See above re: measurements.

> > I don't know how we would gather data to compute heuristics.
>
> This is a fair point and I don't know what the best place is either. The front
> end can probably handle this -- it can do something as simple as "when multiple
> tabs are being loaded, renice the child". Maybe there is a better place for
> this logic.
>

This requires yet more work of unknown scope, and additionally this heuristic isn't right. Sites chewing CPU after page load, e.g.

> I guess we can renice the child but, as comment 21 suggests, this means that
> the content process will always run less than every other user ui process.
> Doing so will lose us valuable page load perf. Doing so will always make sure
> that benchmarks will run slower than they can.

Responsiveness is more valuable than absolute page load time. We should worry about benchmarks when we run out of huge wins from big changes like GL compositing, e.g.

Alon Zakai (:azakai)

Assignee

Comment 27

•

15 years ago

How about this approach:

* Use getrusage to find out how many involuntary context switches the parent process has. Those are caused when the OS forcibly interrupts us, if we took more than our allocated slice of time. 0 involuntary context switches means that we don't need any more CPU time than we are getting. 1 or more means that we want more CPU than we are receiving.
* If the parent wants more CPU, it nices the child a little, and vice versa.
* If the parent wanted more CPU, it tries to guess if nicing the child helped in that regard or not (by seeing if it is now closer to getting all the CPU it wants). If not, it gives up and de-nices the child.

The downside, as mentioned before, is that the child will be nice to other processes as well. If those are OS UI things, maybe that is good (but probably those were given higher priority anyhow by the OS?). If those are background things, it is bad. But if the parent only nices the child when that actually helps the parent's responsiveness, maybe the risk is low?
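To make the proposal above concrete, here is a minimal sketch of its first two steps. Reading ru_nivcsw through getrusage() is the real POSIX mechanism the comment refers to; the helper names and the one-step-up, one-step-down policy with clamping are assumptions for illustration only, and nothing like this was part of a patch on this bug.

```cpp
#include <cassert>
#include <sys/resource.h>

// Involuntary context switches charged to the calling process so far,
// or -1 if getrusage() fails. RUSAGE_SELF aggregates over all threads
// of the process, not just the main thread.
long InvoluntarySwitches() {
  struct rusage usage;
  if (getrusage(RUSAGE_SELF, &usage) != 0) {
    return -1;
  }
  return usage.ru_nivcsw;
}

// Hypothetical policy: if the parent was preempted since the previous
// sample, make the child one step nicer; otherwise one step less nice.
// The result is clamped to [0, maxDelta].
int NextChildNiceDelta(int current, long switchesBefore, long switchesNow,
                       int maxDelta) {
  assert(maxDelta >= 0);
  int next = (switchesNow > switchesBefore) ? current + 1 : current - 1;
  if (next < 0) {
    next = 0;
  }
  if (next > maxDelta) {
    next = maxDelta;
  }
  return next;
}
```

Two caveats apply to a sketch like this. Because RUSAGE_SELF covers every thread, the count also includes preemptions of background threads. And on Linux an unprivileged process generally cannot lower a nice value again once it has been raised, unless RLIMIT_NICE permits it, so the "de-nice" step may not be available everywhere.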

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 28

•

15 years ago

Can we please fix Doug's patch and land it? Please? Pretty please?

(In reply to comment #27)
> * Use getrusage to find out how many involuntary context switches the parent
> process has. Those are caused when the OS forcibly interrupts us, if we took
> more than our allocated slice of time. 0 involuntary context switches means
> that we don't need any more CPU time than we are getting. 1 or more means that
> we want more CPU than we are receiving.

That's only true wrt chrome main thread and content process. We don't care about switches between, e.g., the sync thread and anything else, which getrusage() will report.

> * If the parent wants more CPU, it nices the child a little, and vice versa.
> * If the parent wanted more CPU, it tries to guess if nicing the child helped
> in that regard or not (by seeing if it is now closer to getting all the CPU it
> wants). If not, it gives up and de-nices the child.

It's not at all clear where/when we would perform this check or how to decide whether we succeeded. These heuristics would require an unknown amount of time to tune. Lucky we have all those benchmarks to tune on. This also suffers from the problem of requiring several unfavorable scheduling decisions to adjust niceness (and we also have to reach the code that detects/adjusts), during which we might lose animation frames etc. Unclear how much of a difference it would make in practice.

> If those are OS UI things, maybe that is good (but probably
> those were given higher priority anyhow by the OS?).

Wouldn't count on it.

> If those are background things, it is bad.

Not necessarily.

> But if the parent only nices the child when that actually
> helps the parent's responsiveness, maybe the risk is low?

The risk is unchanged because we don't know if this scheme works. If we knew it worked, there would be no risk. At this point we're only speculating because we don't have code or data.

I think what you propose sounds very interesting and worth looking into, though there are details to be worked out. But why are we bending over backwards to try and shoot ourselves in the foot responsiveness-wise? Can we pretty pretty please with sugar on top land the responsiveness win and look into this in a followup?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 29

•

15 years ago

(In reply to comment #28)
> Can we please fix Doug's patch and land it? Please? Pretty please?

Yeah, I don't see a reason to make this all overly complicated. If there is something to tweak, we can do that in a followup. Atm this is pretty much a blocker for me to use Fennec, since I open background tabs very often, and that makes the UI very slow.

Alon Zakai (:azakai)

Assignee

Comment 30

•

15 years ago

This was brought up in the weekly meeting, and nicing was not seen as a good idea until we have clear benchmarks for what we gain (presumably responsiveness, but how much and when?) and what we lose (presumably page load time and benchmarks, but again how much and when?). I'll focus on figuring out ways to do measurements. The suggestions from the meeting included benchmarking stuff like Tpan, while running other stuff in the background or not (stuff like sync in the parent, or outside processes), etc. Also measuring low-level stuff might help as well.

Alon Zakai (:azakai)

Assignee

Updated

•

15 years ago

Depends on: 606574

Alon Zakai (:azakai)

Assignee

Comment 31

•

15 years ago

Some measurements on the N900, without nicing and with nice +10:

Page load for cnn.com doesn't change in any noticeable way. Maybe a few %, but I'm not sure that isn't noise; it's 33-36 seconds in both. Dromaeo-v8 results stay basically the same as well, close enough that no difference is visible. So there doesn't seem to be a clear downside.

On the upside, measured responsiveness (see bug 606574 for definitions) is better:

* On loading cnn.com, tracer_delay_ms reached 200 in one case with nice +10, but otherwise was ~0, while without nicing it passed 500 several times. There are also lots of small delays there, in the 10-100 range. main_loop_max_lag_ms passes the 500 mark several times without nicing, and only once with nice +10.
* When running v8 in dromaeo, there is little difference. Maybe some slight benefit to nicing, but I guess responsiveness is good anyhow in that test.

So, these measurements support nicing the child process. If anyone wants to run their own tests, builds are here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/azakai@mozilla.com-25e687c8b25b/

Alon Zakai (:azakai)

Assignee

Comment 32

•

15 years ago

Attached patch: patch v2 (obsolete)

Tested on desktop Linux. Still need to confirm this builds and runs properly on Android and Maemo.

Attachment #482991 - Attachment is obsolete: true

Attachment #485911 - Flags: review?(doug.turner)

Doug Turner (:dougt)

Comment 33

•

15 years ago

Comment on attachment 485911: patch v2

>diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp
>--- a/dom/ipc/ContentChild.cpp
>+++ b/dom/ipc/ContentChild.cpp
>@@ -77,16 +77,22 @@
>
> #include "nsIGeolocationProvider.h"
>
> #ifdef MOZ_PERMISSIONS
> #include "nsPermission.h"
> #include "nsPermissionManager.h"
> #endif
>
>+#if defined(MOZ_PLATFORM_MAEMO) || defined(ANDROID) || defined(LINUX)
>+#include <sys/time.h>
>+#include <sys/resource.h>
>+static const int kNiceness = 10;
>+#endif

I think we should do the #includes on all platforms, and have the default kNiceness be zero. So, something like:

  #include <sys/time.h>
  #include <sys/resource.h>

  #if defined(MOZ_PLATFORM_MAEMO) || defined(ANDROID) || defined(LINUX)
  static const int kNiceness = 10;
  #else
  static const int kNiceness = 0;
  #endif

>+ // XXX We change the behavior of Linux child processes here. That

s/Linux/

>+ char* nicenessStr = getenv("MOZ_CHILD_PROCESS_NICENESS");

Probably something like "MOZ_CHILD_PROCESS_RELATIVE_NICENESS" since, well, it isn't an absolute OS niceness.

Bikeshedding done.

Attachment #485911 - Flags: review?(doug.turner) → review+

Doug Turner (:dougt)

Comment 34

•

15 years ago

Comment on attachment 485911: patch v2

Let's see one more patch.

Attachment #485911 - Flags: review+ → review-

Alon Zakai (:azakai)

Assignee

Comment 35

•

15 years ago

> So, something like:
>
> #include <sys/time.h>
> #include <sys/resource.h>
>
> #if defined(MOZ_PLATFORM_MAEMO) || defined(ANDROID) || defined(LINUX)
> static const int kNiceness = 10;
> #else
> static const int kNiceness = 0;
> #endif

Hmm, <sys/resource.h> and getpriority/setpriority don't exist on all platforms, and in particular on Windows, AFAIK? (Also, they work differently on the BSDs, including OS X, but maybe that difference can be ignored?)

> >+ // XXX We change the behavior of Linux child processes here. That
>
> s/Linux/

Well, I was trying to say that on (non-mobile) Firefox, Linux will be different than all the other platforms: Windows, OS X, BSD, etc. There is something special about Linux, after this patch, once Firefox gets a child process. I think this is risky to do, unlike everyone else it seems, but I was hoping to at least be able to write a comment...

Doug Turner (:dougt)

Comment 36

•

15 years ago

can you add a configure test for [s|g]etpriority?

Alon Zakai (:azakai)

Assignee

Comment 37

•

15 years ago

Attached patch: patch v3

Updated patch following comments and IRC discussion.

Attachment #485911 - Attachment is obsolete: true

Attachment #486094 - Flags: review?(doug.turner)

Doug Turner (:dougt)

Comment 38

•

15 years ago

Comment on attachment 486094: patch v3

>diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp
>--- a/dom/ipc/ContentChild.cpp
>+++ b/dom/ipc/ContentChild.cpp
>@@ -77,16 +77,24 @@
>
> #include "nsIGeolocationProvider.h"
>
> #ifdef MOZ_PERMISSIONS
> #include "nsPermission.h"
> #include "nsPermissionManager.h"
> #endif
>
>+#if defined(MOZ_PLATFORM_MAEMO) || defined(ANDROID) || defined(LINUX)

MOZ_PLATFORM_MAEMO is Linux, so just:

  #if defined(ANDROID) || defined(LINUX)

Otherwise, looks good!

Also, let's file a follow-up bug. In it, we should build an API so that an application can control the nice value of child processes. This would allow Fennec to change the priority of its child processes, and it could also allow dynamic changing of process priority.

Attachment #486094 - Flags: review?(doug.turner) → review+

Doug Turner (:dougt)

Updated

•

15 years ago

tracking-fennec: ? → 2.0+

Alon Zakai (:azakai)

Assignee

Comment 39

•

15 years ago

Did some measurements on a Nexus One, Android 2.2.1. Basically the same as with the N900, loading websites has better responsiveness - time spent in the main loop is lower (<150ms versus <200 without nicing), and tracer event times are also lower (<20ms versus <40). No difference when running benchmarks, responsiveness is high anyhow. No noticeable downside in benchmarks or page load times.

Doug Turner (:dougt)

Updated

•

15 years ago

tracking-fennec: 2.0+ → 2.0b3+

Brad Lassey [:blassey] (use needinfo?)

Updated

•

15 years ago

Whiteboard: [fennec-checkin-postb2][has-patch]

Alon Zakai (:azakai)

Assignee

Comment 40

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/6edccbb39c1b

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Alon Zakai (:azakai)

Assignee

Updated

•

15 years ago

Whiteboard: [fennec-checkin-postb2][has-patch]

(no longer active)

Comment 41

•

15 years ago

On Windows, we need to do something like this:

  SetPriorityClass(GetCurrentProcess(), PROCESS_MODE_BACKGROUND_BEGIN);

Should I create a patch which does that (perhaps in another bug)?

Alon Zakai (:azakai)

Assignee

Comment 42

•

15 years ago

We can do that in another bug, sure. Just for perspective though, this stuff is really only noticeable on mobile devices with weak (single) CPUs. On a Windows desktop or laptop today, typically with a powerful dual-core+ CPU, we'd probably never see the difference. Regardless, this would be good to have, nice idea!

(no longer active)

Comment 43

•

15 years ago

Filed bug 610170.

Matt Brubeck (:mbrubeck)

Updated

•

14 years ago

Blocks: 633986

patch v.1, 15 years ago, Doug Turner (:dougt), 1.06 KB, patch, cjones: review-
patch v2, 15 years ago, Alon Zakai (:azakai), 1.64 KB, patch, dougt: review-
patch v3, 15 years ago, Alon Zakai (:azakai), 1.81 KB, patch, dougt: review+