Last Comment Bug 663970 - Disable nice on systems that have multiple CPUs.
: Disable nice on systems that have multiple CPUs.
Status: RESOLVED FIXED
QA?
:
Product: Core
Classification: Components
Component: Widget: Android (show other bugs)
: Trunk
: All Linux
: -- normal (vote)
: mozilla7
Assigned To: Gian-Carlo Pascutto [:gcp]
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-06-13 14:26 PDT by Doug Turner (:dougt)
Modified: 2011-10-10 01:17 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 (5.84 KB, patch)
2011-06-15 11:22 PDT, Gian-Carlo Pascutto [:gcp]
doug.turner: review-
Details | Diff | Splinter Review
Patch v2 (1/2) (1.84 KB, patch)
2011-06-15 18:11 PDT, Gian-Carlo Pascutto [:gcp]
doug.turner: review-
Details | Diff | Splinter Review
Patch v2 (2/2) (2.00 KB, patch)
2011-06-15 18:12 PDT, Gian-Carlo Pascutto [:gcp]
doug.turner: review+
Details | Diff | Splinter Review
Patch v3 (1/2) (1.80 KB, patch)
2011-06-16 11:50 PDT, Gian-Carlo Pascutto [:gcp]
doug.turner: review+
Details | Diff | Splinter Review
Patch v3 (2/2) (1.97 KB, patch)
2011-06-16 11:51 PDT, Gian-Carlo Pascutto [:gcp]
no flags Details | Diff | Splinter Review
Patch v4 (1/2) (1.77 KB, patch)
2011-06-16 19:59 PDT, Gian-Carlo Pascutto [:gcp]
doug.turner: review+
Details | Diff | Splinter Review
Patch v4 (2/2) (2.08 KB, patch)
2011-06-16 20:01 PDT, Gian-Carlo Pascutto [:gcp]
ted: review+
Details | Diff | Splinter Review
Patch 2/2 v5 Detect cpucount with hotplugging/powersaving (2.54 KB, patch)
2011-06-27 10:33 PDT, Gian-Carlo Pascutto [:gcp]
ted: review+
Details | Diff | Splinter Review

Description Doug Turner (:dougt) 2011-06-13 14:26:48 PDT
On Mobile, we can and do tweak the nice value of the child process:

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#217

However, this doesn't make much sense and can be a perf hit on devices that have multiple cores.

Lets change this so that on devices with multiple cores we stop doing this nicing.  We may also want to experiment with cpu scheduling.

Note:

I do not think that

`cat /proc/cpuinfo | grep processor | wc -l`

will work on all devices.  I have a tablet that has multiple cores which does not show more than one processor line.  Maybe that is a bug on the device, not sure.

Maybe we can try something like calling sched_getaffinity and seeing what is set.
Comment 1 Brad Lassey [:blassey] (use needinfo?) 2011-06-13 23:17:26 PDT
`adb shell cat /proc/cpuinfo | grep  -i processor | wc -l` gives correct values on my atrix and nexus one, can you test on your device?
Comment 2 Gian-Carlo Pascutto [:gcp] 2011-06-14 17:22:27 PDT
On my Galaxy S2 /proc/cpuinfo seems to be correct as well, i.e. has 2 processor lines.

Using mfinkel's test runner, with nice:
Fennec 7.0a1,GT-I9100,20110615011310,3486,AVERAGE_TP
Fennec 7.0a1,GT-I9100,20110615011310,3119,AVERAGE_TP
Fennec 7.0a1,GT-I9100,20110615011310,4274,AVERAGE_TP

without nice:
Fennec 7.0a1,GT-I9100,20110615003204,3948,AVERAGE_TP
Fennec 7.0a1,GT-I9100,20110615003204,3505,AVERAGE_TP
Fennec 7.0a1,GT-I9100,20110615020648,3823,AVERAGE_TP

Variance looks a bit big to say a lot, but it doesn't look significantly different. I'll check also with nice=20 as asked by dougt and code up a patch.
Comment 3 Gian-Carlo Pascutto [:gcp] 2011-06-15 11:22:45 PDT
Created attachment 539597 [details] [diff] [review]
Patch v1
Comment 4 Doug Turner (:dougt) 2011-06-15 11:53:27 PDT
Comment on attachment 539597 [details] [diff] [review]
Patch v1

I think the best way to expose the CPU count is through system info.  This would allow anyone (including chrome script) to figure out how many CPUs there are.  Could you look at doing that:

http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsSystemInfo.cpp
Comment 5 Gian-Carlo Pascutto [:gcp] 2011-06-15 13:11:52 PDT
Looks like it's already there:
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsSystemInfo.cpp#92

but it's checking the amount online instead of present, so it will go wrong with powersaving:
http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prsystem.c#259

I'll change the behavior of the latter and rework the patch to use it instead.
Comment 6 Gian-Carlo Pascutto [:gcp] 2011-06-15 14:20:58 PDT
Darn, Android's libc gets it wrong:

android/bionic/libc/unistd/sysconf.c:

static int
__get_nproc_conf(void)
...
    if (line_parser_init(parser, "/proc/cpuinfo") < 0)
...

Will have to put the correct code into prsystem.c directly.
Comment 7 Gian-Carlo Pascutto [:gcp] 2011-06-15 18:11:08 PDT
Created attachment 539697 [details] [diff] [review]
Patch v2 (1/2)
Comment 8 Gian-Carlo Pascutto [:gcp] 2011-06-15 18:12:01 PDT
Created attachment 539698 [details] [diff] [review]
Patch v2 (2/2)
Comment 9 Doug Turner (:dougt) 2011-06-16 09:22:10 PDT
Comment on attachment 539697 [details] [diff] [review]
Patch v2 (1/2)

Review of attachment 539697 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +223,5 @@
> +        /* make the GUI thread have higher priority on single-cpu devices */
> +        nsresult rv;
> +        nsCOMPtr<nsIPropertyBag2> infoService =
> +            do_GetService(NS_SYSTEMINFO_CONTRACTID, &rv);
> +        if (NS_SUCCEEDED(rv)) {

nsCOMPtr<nsIPropertyBag2> infoService = do_GetService(NS_SYSTEMINFO_CONTRACTID)
if (infoService) {
  ...


There is no real need to deal with the nsresult

@@ +226,5 @@
> +            do_GetService(NS_SYSTEMINFO_CONTRACTID, &rv);
> +        if (NS_SUCCEEDED(rv)) {
> +            PRInt32 cpus;
> +            NS_NAMED_LITERAL_STRING(cpucount, "cpucount");
> +            rv = infoService->GetPropertyAsInt32(cpucount, &cpus);

infoService->GetPropertyAsInt32(cpucount, &cpus);


infoService->GetPropertyAsInt32(NS_LITERAL_STRING("cpucount"), &cpus);

no need for the named literal.

@@ +228,5 @@
> +            PRInt32 cpus;
> +            NS_NAMED_LITERAL_STRING(cpucount, "cpucount");
> +            rv = infoService->GetPropertyAsInt32(cpucount, &cpus);
> +            if (NS_SUCCEEDED(rv)) {
> +                if (nice != 0 && cpus == 1) {

What happens if GetPropertyAsInt32 fails -- does that mean we don't ever setpriority?  If GetPropertyAsInt32 fails, maybe we just set cpus to 1, and just do the existing logic.
Comment 10 Doug Turner (:dougt) 2011-06-16 09:28:11 PDT
Comment on attachment 539698 [details] [diff] [review]
Patch v2 (2/2)

Review of attachment 539698 [details] [diff] [review]:
-----------------------------------------------------------------

::: nsprpub/pr/src/misc/prsystem.c
@@ +259,5 @@
>  #elif defined(XP_UNIX)
> +   /* for the benefit of devices with advanced power-saving, that
> +      actually hotplug their cpus in heavy load, try to figure out
> +      the real number of CPUs */
> +   char buf[512];

Maybe a #define for 512, then use that in the call to fgets() below.
Comment 11 Doug Turner (:dougt) 2011-06-16 09:28:40 PDT
ted needs to review the nspr bits  -- he is a module peer of nspr.
Comment 12 Gian-Carlo Pascutto [:gcp] 2011-06-16 11:50:14 PDT
Created attachment 539852 [details] [diff] [review]
Patch v3 (1/2)
Comment 13 Gian-Carlo Pascutto [:gcp] 2011-06-16 11:51:35 PDT
Created attachment 539853 [details] [diff] [review]
Patch v3 (2/2)

Some small fixes/cleanups in addition to the comments from dougt.
Comment 14 Doug Turner (:dougt) 2011-06-16 12:08:33 PDT
Comment on attachment 539852 [details] [diff] [review]
Patch v3 (1/2)

i think you want to drop the rv from the call to do_GetService().  otherwise, looks good.
Comment 15 Doug Turner (:dougt) 2011-06-16 12:15:05 PDT
first part w/ the rv correction:

http://hg.mozilla.org/mozilla-central/rev/35a679df430b
Comment 16 Doug Turner (:dougt) 2011-06-16 12:48:38 PDT
backed out due to bustage.
Comment 17 Gian-Carlo Pascutto [:gcp] 2011-06-16 19:59:45 PDT
Created attachment 539977 [details] [diff] [review]
Patch v4 (1/2)
Comment 18 Gian-Carlo Pascutto [:gcp] 2011-06-16 20:01:42 PDT
Created attachment 539978 [details] [diff] [review]
Patch v4 (2/2)
Comment 19 Doug Turner (:dougt) 2011-06-16 21:21:08 PDT
Comment on attachment 539977 [details] [diff] [review]
Patch v4 (1/2)

less buildfail now.
Comment 21 Philipp von Weitershausen [:philikon] 2011-06-17 07:33:42 PDT
http://hg.mozilla.org/mozilla-central/rev/0be95155a3ef
Comment 22 Gian-Carlo Pascutto [:gcp] 2011-06-17 08:49:53 PDT
Reopening for part 2: it will not work correctly on some Android devices without more advanced CPU-count detection in NSPR.
Comment 23 Ted Mielczarek [:ted.mielczarek] 2011-06-24 12:37:09 PDT
Comment on attachment 539978 [details] [diff] [review]
Patch v4 (2/2)

Review of attachment 539978 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with a few changes. If you attach an updated patch I can land it in NSPR CVS for you.

::: nsprpub/pr/src/misc/prsystem.c
@@ +262,5 @@
> +      actually hotplug their cpus in heavy load, try to figure out
> +      the real number of CPUs */
> +   char buf[MAX_LINE];
> +   FILE *fin;
> +   const char *cpu_present = "/sys/devices/system/cpu/present";

This is Linux-only, so I think this ought to be inside an #if defined(LINUX).

@@ +268,5 @@
> +   fin = fopen(cpu_present, "r");
> +   if (fin != NULL) {
> +        if (fgets(buf, MAX_LINE, fin) != NULL) {
> +	    size_t strsize = strlen(buf);
> +	    if (strsize == 1 && isdigit(buf[0])) {

I can't find clear documentation about what values will be in this file. ( http://www.kernel.org/doc/Documentation/cputopology.txt only seems to give examples) Does this always hold true regardless of the actual value of buf[0]?

@@ +272,5 @@
> +	    if (strsize == 1 && isdigit(buf[0])) {
> +	        /* single core */
> +		numCpus = 1;
> +	    } else if (strsize >= 3 && strsize <= 5) {
> +	        /* should be of the form 0-999 */

Should you sanity-check the first digit here as well?
Comment 24 Gian-Carlo Pascutto [:gcp] 2011-06-24 13:06:20 PDT
>I can't find clear documentation about what values will be in this file. ( 
>http://www.kernel.org/doc/Documentation/cputopology.txt only seems to give 
>examples) Does this always hold true regardless of the actual value of buf[0]?

The docs say the format is that of cpulist_parse
http://lxr.free-electrons.com/source/include/linux/cpumask.h#L573
which really calls
http://lxr.free-electrons.com/source/lib/bitmap.c#L590

I wish the cputopology.txt doc stated it more explicitly, but I believe the present CPUs are actually those that are numbered, which would limit it to be 0-indexed without gaps.

Maybe I can add a sanity check that the first number is actually 0, the second one a dash, and pull out of this routine if it isn't (because that means the above assumption is wrong).
Comment 25 Gian-Carlo Pascutto [:gcp] 2011-06-27 10:33:01 PDT
Created attachment 542200 [details] [diff] [review]
Patch 2/2 v5 Detect cpucount with hotplugging/powersaving
Comment 26 Ted Mielczarek [:ted.mielczarek] 2011-07-01 10:06:08 PDT
Comment on attachment 542200 [details] [diff] [review]
Patch 2/2 v5 Detect cpucount with hotplugging/powersaving

Review of attachment 542200 [details] [diff] [review]:
-----------------------------------------------------------------

::: nsprpub/pr/src/misc/prsystem.c
@@ +297,2 @@
>  #elif defined(XP_UNIX)
> +    numCpus = sysconf( _SC_NPROCESSORS_CONF );

Did you mean to change the sysconf parameter here for the unix-but-not-linux case?
Comment 27 Gian-Carlo Pascutto [:gcp] 2011-07-05 11:15:37 PDT
> Did you mean to change the sysconf parameter here for the unix-but-not-linux case?

Yes, "configured" instead of "currently online" sounds closer to what we really want, because the function is only called once at startup and the result cached.

These constants aren't standardized (man says "These values also exist, but may not be standard") but systems that know the one will know the other. There's no reason for us to do it one way on Linux and differently on other Unixy systems.

The risk here is if there are use cases for _SC_NPROCESSORS_CONF != _SC_NPROCESSORS_ONLN that are very differently from what we'd expect (Power saving or replacing faulty CPUs in a running system) and would cause us to behave wrongly.

Given that we have a relevant use case where we'd prefer the new behavior, and no known cases where we'd want the old one, this change seemed sensible to me.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2011-07-07 09:47:18 PDT
Okay, that seems fine.
Comment 29 Ted Mielczarek [:ted.mielczarek] 2011-07-07 10:19:50 PDT
Checked in the NSPR patch to NSPR CVS:
Checking in pr/src/misc/prsystem.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prsystem.c,v  <--  prsystem.c
new revision: 3.37; previous revision: 3.36
done
Comment 30 Doug Turner (:dougt) 2011-07-08 13:29:59 PDT
ted, when can we merge nspr into mc?  Otherwise this bug is ready to be marked fixed.
Comment 31 Ted Mielczarek [:ted.mielczarek] 2011-07-18 06:39:22 PDT
You can check with wtc, I think we can probably produce another beta tag and take a snapshot of that in m-c.
Comment 32 Wan-Teh Chang 2011-07-22 09:26:08 PDT
(In reply to comment #27)
>
> The risk here is if there are use cases for _SC_NPROCESSORS_CONF !=
> _SC_NPROCESSORS_ONLN that are very differently from what we'd expect (Power
> saving or replacing faulty CPUs in a running system) and would cause us to
> behave wrongly.
> 
> Given that we have a relevant use case where we'd prefer the new behavior,
> and no known cases where we'd want the old one, this change seemed sensible
> to me.

Gian-Carlo, thank you for your patch.

NSPR is not used just by Mozilla.  I assume you have considered
only the use cases of this function in Mozilla products.  So this
behavior change may break other NSPR-based applications.

I don't know what you meant by "the real number of CPUs".  Do
you mean the static count or the dynamic count?  Can you make it
unambiguous?  I guess you want the static count, to be consistent
with _SC_NPROCESSORS_CONF.

Re: the MAX_LINE macro: you can just pass sizeof(buf) to fgets().
Comment 33 Gian-Carlo Pascutto [:gcp] 2011-07-22 12:45:15 PDT
>NSPR is not used just by Mozilla.  I assume you have considered
>only the use cases of this function in Mozilla products.

No, we did consider other users. I presume a typical use of this function is to know the amount of threads to launch to get a task done as quickly as possible. (And note that the FF usage that lead to this bug is something entirely different) In that case, they need this fix too, because they'll get it wrong with current NSPR.

There's a risk some odd use case exists that goes well with the current behavior and wrong with the fixed one. Given that we know none of those but at least 2 opposite ones, I think fixing this is the most sensible course of action.

If anyone knows a use case that breaks with this new behavior, *now* would certainly be a good time to speak up.

>I don't know what you meant by "the real number of CPUs".  Do
>you mean the static count or the dynamic count?

The number of physical CPUs present that may be brought up into the OS at any time. That should fit with _SC_NPROCESSORS_CONF.

>Re: the MAX_LINE macro: you can just pass sizeof(buf) to fgets().

I'd prefer to get rid of the arbitrary buffer completely, but I'm not sure if there's any simple method that's portable enough for NSPR. I know getline(), but that's glibc only...
Comment 34 Doug Turner (:dougt) 2011-07-27 21:37:23 PDT
wah-teh, what are the next steps here?
Comment 35 Gian-Carlo Pascutto [:gcp] 2011-07-28 07:25:09 PDT
This just landed: 
http://hg.mozilla.org/mozilla-central/rev/ec7ad3a1de60

Note You need to log in before you can comment on or make changes to this bug.