Closed Bug 861434 Opened 11 years ago Closed 11 years ago

Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux

Categories

(NSPR :: NSPR, defect, P1)

defect

Tracking

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

RESOLVED FIXED
blocking-b2g tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: justin.lebar+bug, Assigned: gsvelto)

References

Details

(Whiteboard: [madrid][NPOTB][actually part of the build; ignore NPOTB])

Attachments

(5 files, 6 obsolete files)

10.21 KB, patch
wtc
: review+
Details | Diff | Splinter Review
1.01 KB, patch
Details | Diff | Splinter Review
9.65 KB, patch
wtc
: review+
Details | Diff | Splinter Review
4.26 KB, patch
Details | Diff | Splinter Review
866 bytes, patch
gsvelto
: review+
Details | Diff | Splinter Review
I think we should back out bug 841651.  The old behavior of doing nothing on PR_SetThreadPriority is more correct than the new behavior.

We discovered the essential problem here while looking into bug 860526: The kernel treats threads exactly the same as it treats processes for the purposes of the setpriority call and its resultant effect on the scheduler.

As a result, if you setpriority(0) on a tid, you've reniced that thread to 0, even if "the process" (which really means "the main thread") is running with some other nice.

So the mapping of priorities LOW, NORMAL, HIGH, and CRITICAL to niceness 1, 0, -1, and -2 only makes sense if the process in question is in fact running with niceness 0.  Otherwise, if the process is running at say priority 10, priority LOW is actually much higher than the main thread's priority.

Additionally, this change causes us to renice the process to 0 when its main thread is initialized (assuming we're root, which we are at this point in the process's lifecycle in B2G).  That defeats our attempts to renice processes early in their lifetime.

See http://jlebar.com/2013/4/11/Beware_of_renice.html for more on this general issue.

If it's OK with the NSPR folks, I'd like to use this bug to track backing out bug 841651.  We can use another bug for tracking its replacement.
tef+ via its dupe, bug 860526.
blocking-b2g: --- → tef+
The changeset being backed out regresses running processes as root with
non-zero nice (it forces them to zero nice).  It also does not set thread
priorities sanely when the process runs as root and has its nice set to a
non-zero value.  It also breaks running processes as non-root with negative
nice (it forces them to zero nice).
The changeset being backed out regresses running processes as root with
non-zero nice (it forces them to zero nice).  It also does not set thread
priorities sanely when the process runs as root and has its nice set to a
non-zero value.  It also breaks running processes as non-root with negative
nice (it forces them to zero nice).
Attachment #737041 - Attachment is obsolete: true
The changeset being backed out regresses running processes as root with
non-zero nice (it forces them to zero nice).  It also does not set thread
priorities sanely when the process runs as root and has its nice set to a
non-zero value.  It also breaks running processes as non-root with negative
nice (it forces them to zero nice).
Attachment #737042 - Attachment is obsolete: true
Attachment #737044 - Flags: review?(wtc)
Sorry for the bugspam; I don't use hg bzexport very often, and I thought it was failing when it was in fact succeeding.  :)
I don't think backing that patch out is a good idea, without it worker threads can effectively starve the main thread of the b2g process and a few issues we had previously would regress (e.g. installing an update making the phone unusable).

Instead I can change the behavior to make the adjustments relative to the main thread's niceness and prevent the main thread from being reniced upon initialization.

That being said I really think we should move away from renicing a process using setpriority() as there's no sane way to prevent race conditions. I'd rather improve on my cgroups patch by tweaking the existing cgroups (or adding new ones with even higher CPU allocation than foreground to prevent the problem in bug 861441 under process-buster stress).
This patch makes the nice adjustments relative; this should fix the problems with PR_SetThreadPriority() though it does not address the overall issue in bug 861441.
Attachment #737119 - Flags: feedback?(justin.lebar+bug)
Can a thread decrease its nice value if the process is not root?  That is, will priorities HIGH and URGENT work at all with non-root processes?  And will non-root threads be able to decrease their nice values after they lower them?  I suspect the answer is no to both of these, since the answer is no for processes.

We may not be able to make these things work, but it's not clear to me that supporting the NSPR interface halfway is better than not supporting it at all.  We could write an alternative function (say, in GonkHal) that adds 1 to a thread's nice, which I understand was the main thing we were trying to accomplish here.  In fact, from GonkHal we could do whatever we want with the priorities, because commands there are forwarded up to the main process, which runs as root.

>+static int pt_RelativePriority(int nice, PRThreadPriority pri)
>+{
>+    return nice + (1 - pri);
>+}

Is this right?  If the process's nice is 0 and pri is URGENT, this returns 1 - (-2) == 3.  We then use this directly as a nice value.
> And will non-root threads be able to decrease their nice values after they lower them?

s/lower/raise.  That is, will a non-root thread be able to increase its CPU priority after lowering it?
Attachment #737119 - Flags: feedback?(justin.lebar+bug)
(In reply to Justin Lebar [:jlebar] from comment #9)
> Can a thread decrease its nice value if the process is not root?  That is,
> will priorities HIGH and URGENT work at all with non-root processes?  And
> will non-root threads be able to decrease their nice values after they lower
> them?  I suspect the answer is no to both of these, since the answer is no
> for processes.

You're right, non-root threads can't lower their nice value, only increase it.

> We may not be able to make these things work, but it's not clear to me that
> supporting the NSPR interface halfway is better than not supporting it at
> all.

Well, there's good reasons for supporting it like this: first of all increasing the nice value works well and fixes all problems we had with worker threads monopolizing the CPU. Then for the b2g main process higher priorities also work and I think that's pretty much the only process where we would actually want to allow threads to get higher priorities anyway.

> We could write an alternative function (say, in GonkHal) that adds 1
> to a thread's nice, which I understand was the main thing we were trying to
> accomplish here.  In fact, from GonkHal we could do whatever we want with
> the priorities, because commands there are forwarded up to the main process,
> which runs as root.

That could have potential to be a good solution because it would also allow us to avoid races by having the main process taking care of any of these adjustments; it wouldn't work on Linux in general though were we wouldn't have GonkHal around.

It would also add complexity and if the Android model of using cgroups can be made to work well then I'd be more keen on using it than adjusting priorities one by one. Did you try receiving a call under process-buster stress with my WIP cgroups patch? I'll try writing a more elaborate one today with an additional cgroup to make sure that PROCESS_PRIORITY_FOREGROUND_HIGH gets a larger share of CPU when running alongside PROCESS_PRIORITY_FOREGROUND processes.

> >+static int pt_RelativePriority(int nice, PRThreadPriority pri)
> >+{
> >+    return nice + (1 - pri);
> >+}
> 
> Is this right?  If the process's nice is 0 and pri is URGENT, this returns 1
> - (-2) == 3.  We then use this directly as a nice value.

Wouldn't that be 0 + (1 - 3) = -2?
> -     * PR_PRIORITY_URGENT -2 */

Oh, this is a comment; URGENT isn't defined to -2.  Got it.

> it wouldn't work on Linux in 
> general though were we wouldn't have GonkHal around.

We have hal on Linux, so we can implement the same or similar code there.

I don't think we should have code in NSPR that only half works, and it sounds like there is no way to implement the full NSPR interface in general.  I'm also wary of having code in NSPR that behaves differently depending on whether or not the process is root, particularly given that B2G has both root and non-root processes.  At the very least if we implement root-only code in hal, we can indicate to the caller whether it succeeded.  (And anyway that won't be a problem if we forward the calls up to the master process, which is root.)

Backing this out of NSPR and re-implementing the required set of features in hal seems far safer, saner, and simpler than trying to fix NSPR.
To be clear, I'm not proposing that we permanently or even temporarily break anything.  We can time the uplift of this new NSPR version with landing the changes to hal.
> It would also add complexity and if the Android model of using cgroups can be made to work well then 
> I'd be more keen on using it than adjusting priorities one by one.

cgroups doesn't replace nice, right?  No matter what we need a way to change the priority of threads, and the only way we know of to do that is with nice.

> Did you try receiving a call under process-buster stress with my WIP cgroups patch?

No, but anyway it's not a fair comparison if the fg process gets as much priority as the dialer process, like you say.  Let's discuss that in bug 861441, if you don't mind, so we don't conflate the issue at hand here, which is that NSPR is broken.
(In reply to Justin Lebar [:jlebar] from comment #12)
> We have hal on Linux, so we can implement the same or similar code there.

That sounds like the best road to follow but the only practical issue is that I don't think NSPR can be made to depend on HAL. Somebody working on it might clear it up for us though.

> I don't think we should have code in NSPR that only half works, and it
> sounds like there is no way to implement the full NSPR interface in general.
> I'm also wary of having code in NSPR that behaves differently depending on
> whether or not the process is root, particularly given that B2G has both
> root and non-root processes.

Well that part already depends heavily on permissions, there's quite a few places were we explicitly check for EPERM errors and we're accepting that in those cases the operations affecting scheduling will effectively be no-ops. Under this respect my changes behave like the rest of the existing code.

> At the very least if we implement root-only
> code in hal, we can indicate to the caller whether it succeeded.  (And
> anyway that won't be a problem if we forward the calls up to the master
> process, which is root.)

Yes, having a return code would already help. One thing that bugged me at the time is that PR_SetThreadPriority() unconditionally sets the |priority| field even when failing. That's bad because otherwise one could call PR_GetThreadPriority() and check if the change actually happened or not. BTW fixing this would be trivial.

> Backing this out of NSPR and re-implementing the required set of features in
> hal seems far safer, saner, and simpler than trying to fix NSPR.
> [...]
> To be clear, I'm not proposing that we permanently or even temporarily break
> anything.  We can time the uplift of this new NSPR version with landing the
> changes to hal.

OK, that sounds reasonable; we should have a clear idea of how to replace this behavior before backing the change out though.

(In reply to Justin Lebar [:jlebar] from comment #14)
> cgroups doesn't replace nice, right?  No matter what we need a way to change
> the priority of threads, and the only way we know of to do that is with nice.

Yes, I meant that we should use cgroups for whole-process priority changes. We still need to sort out threads within a process with nice values.

> No, but anyway it's not a fair comparison if the fg process gets as much
> priority as the dialer process, like you say.  Let's discuss that in bug
> 861441, if you don't mind, so we don't conflate the issue at hand here,
> which is that NSPR is broken.

Right, I'll post the updated patch there directly.
> That sounds like the best road to follow but the only practical issue is that I don't think NSPR can 
> be made to depend on HAL.

Correct; NSPR can't depend on hal (which depends on plenty of other pieces in Gecko).  We'd be restoring NSPR back to its original behavior.
cgroups can handle threads separately.
(In reply to Mike Hommey [:glandium] from comment #17)
> cgroups can handle threads separately.

You mean you can put a thread PID into /dev/cpuctl/*/cgroup.procs? If that's the case then we could evaluate that to solve this problem too (at least for B2G where we can guarantee the existence of the appropriate cgroups).
(In reply to Gabriele Svelto [:gsvelto] from comment #18)
> (In reply to Mike Hommey [:glandium] from comment #17)
> > cgroups can handle threads separately.
> 
> You mean you can put a thread PID into /dev/cpuctl/*/cgroup.procs?

Yes.
well, cgroups.procs is for thread groups (aka processes) ; tasks is for threads.
This ultimately blocks bug 847592, so marking as tef+ (assuming the NSPR peers are OK with this change).
Whiteboard: [status: needs review and landing]
Comment on attachment 737119 [details] [diff] [review]
[PATCH] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux

Review of attachment 737119 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. This patch seems fine. I suggest some changes.

::: nsprpub/pr/include/private/primpl.h
@@ +1555,5 @@
>  
>  #if defined(_PR_PTHREADS)
>      pthread_t id;                   /* pthread identifier for the thread */
>  #ifdef _PR_NICE_PRIORITY_SCHEDULING
> +    pid_t pid;                      /* PID of this process             */

Is this just a performance optimization?  We can call getpid() easily
to get the PID of the process, right?

::: nsprpub/pr/src/pthreads/ptthread.c
@@ +79,5 @@
>  }
> +#elif defined(_PR_NICE_PRIORITY_SCHEDULING)
> +/*
> + * This functions maps higher priorities to lower nice values relative to the
> + * priority specified in the |pri| parameter. The corresponding relative

I think this should say "relative to the nice value specified in the
|nice| parameter", right?

@@ +149,2 @@
>  
> +    if (errno != 0) {

I think we only need to check errno if rv == -1, right?

Same for line 715.

@@ +704,5 @@
>          PR_WaitCondVar(pt_book.cv, PR_INTERVAL_NO_TIMEOUT);
>      PR_Unlock(pt_book.ml);
>  
> +    if (thred->tid == thred->pid) {
> +      /* This is the main thread, never change its priority */

This comment should explain why we don't change the priority of the
main thread.

Refusing to change the priority of the main thread could break a program
that sets the priorities of the main thread and some other thread and
expects certain scheduling priority of the main thread relative to that
other thread. So it seems that we should also assert newPri == PR_PRIORITY_NORMAL if thred->tid == thred->pid.
Attachment #737119 - Flags: review+
Comment on attachment 737044 [details] [diff] [review]
Back out 39cb7968bf9b (bug 841651).

r=wtc.

Justin, Gabriele: Do you want me to back out the old setpriority patch,
or work with Gabriele to make his setpriority approach work?
Attachment #737044 - Flags: review?(wtc) → review+
(In reply to Wan-Teh Chang from comment #22)
> Is this just a performance optimization?  We can call getpid() easily
> to get the PID of the process, right?

Yeah, I wanted to shave a syscall from each call to PR_SetThreadPriority() but you're right, it's probably not worth it.

> I think this should say "relative to the nice value specified in the
> |nice| parameter", right?

Yes, good catch.

> I think we only need to check errno if rv == -1, right?
> [...]
> Same for line 715.

Yes, I'll change that.

> Refusing to change the priority of the main thread could break a program
> that sets the priorities of the main thread and some other thread and
> expects certain scheduling priority of the main thread relative to that
> other thread. So it seems that we should also assert newPri ==
> PR_PRIORITY_NORMAL if thred->tid == thred->pid.

You mean we should skip setting the priority only when (thred->tid == thred->pid) && (newPri == PR_PRIORITY_NORMAL)? I think it might be a reasonable choice; that would make changing the priority of the main thread a relative affair too.

Regarding how to proceed I'd go with applying my patch to "fix" this behavior; coupling it with the cgroups-based approach I'm taking for bug 861441 it should give us exactly the behavior we need in B2G while also helping out for Firefox on Linux (there DOM worker threads weren't given a lower priority too before my patch). However I don't think Justin is completely fine with this so I want to hear his opinion on this.
As I've explained in this bug, I do not think that a setpriority approach /can/ work in a sane way.

This patch will allow us to decrease thread priority only when we're root.  If we're non-root, we can't ever decrease a thread's priority.

This is bad for three reasons.

 1. It's bad because NSPR will behave differently depending on whether the 
    process is running as root. 

 2. It's bad because the difference in behavior in (1) is completely silent; 
    PR_SetThreadPriority returns void, and therefore embedders have /no 
    idea/ whether the setpriority call worked.

 3. It's bad because, when the process is not running as root, the NSPR interface is 
    half-implemented.  I think it's better not to implement this interface at all than to 
    implement it halfway.

This code is also problematic because it introduces race conditions around the set/get-priority calls if a NSPR embedder uses set/get-priority itself.  There's no way for such a client to avoid racing with NSPR, short of modifying NSPR.

There is absolutely no reason this logic must live in NSPR.  We can put it in hal, where we can provide a smaller interface (e.g. "LowerThreadPriority") that always works.  Putting it in hal also gives us more flexibility if we need to change behavior in the future.

I think this patch is the wrong approach for NSPR.  It silently behaves differently depending on whether we're root.  It is not guaranteed that it does what we want for B2G (we may not want to renice worker threads on B2G if we use cgroups).  It adds race conditions that clients -- perhaps including Firefox, depending on the tack we take -- cannot avoid without modifying NSPR.
OK, but I'd just like to clear up one thing: the issues you describe for this patch apply to all other implementations of PR_SetThreadPriority() too. Using pthread_setschedprio() on other POSIX platforms or SetThreadPriority() on Windows is subject to the same races and permission issues as setpriority() on Linux. Since it is my understanding that the whole idea of an NSPR embedder is that it will not use the native functionality but use NSPR instead this shouldn't be a problem. It has become a problem in B2G because NSPR doesn't offer functionality to schedule processes and so we essentially started bypassing it.

That being said we already have users of PR_SetThreadPriority(), PR_Init() and PR_CreateThread() that specify a value different from PR_PRIORITY_NORMAL. We also document in NSPR that whichever thread calls PR_Initialize() will be set to run at the PR_PRIORITY_NORMAL level. So my question is: what shall we do with those?

If the general plan is to move away entirely from NSPR and use hal instead then I'm fine with it but I'd like to know what we should do with the existing users because they all suffer already from the problems you describe (raciness with native adjustments, different behavior depending on permissions, inability to check if the change happened though the latter is fixable).
> Using pthread_setschedprio() on other POSIX platforms or SetThreadPriority() on Windows 
> is subject to the same races and permission issues as setpriority() on Linux. 

Does SetThreadPriority on Windows actually have the same permission issues on Windows?  You can lower a thread's priority using SetThreadPriority iff you're root?  Similarly with pthread_setschedprio() on, say, Mac?  That would surprise me, but I've been really surprised here already.

On Windows we don't appear to have the same get/set race condition; we call SetThreadPriority without ever reading the priority first, because we're just setting an absolute priority.  I suppose if someone else did a get/set then they'd have a race, but at least NSPR itself doesn't have races.  But this is pretty orthogonal to the question at hand, I think.

> So my question is: what shall we do with those?

If things don't work properly today with NSPR, we should move them into hal, where we have flexibility to Do What We Want.  I think the only reason not to move functionality that we want to change into hal is when we can't change all consumers (e.g. NSS, or when there are tons of consumers).
(In reply to Justin Lebar [:jlebar] from comment #27)
> Does SetThreadPriority on Windows actually have the same permission issues
> on Windows?  You can lower a thread's priority using SetThreadPriority iff
> you're root?  Similarly with pthread_setschedprio() on, say, Mac?  That
> would surprise me, but I've been really surprised here already.

Both are documented to be fallible and in the case of pthread_setschedprio() that's with an EPERM error code too so my guess is that both will be subject to the current user permissions. We actually check for those errors in the code explicitly; what we fail to do is reflect their failure in the thread's priority field and I think that part should be addressed irrespective of what we do here.

> If things don't work properly today with NSPR, we should move them into hal,
> where we have flexibility to Do What We Want.  I think the only reason not
> to move functionality that we want to change into hal is when we can't
> change all consumers (e.g. NSS, or when there are tons of consumers).

OK so, let's back out my change but then we need to at least file a follow up to prevent regressing bug 841041 which was the motivation for the change.
> what we fail to do is reflect their failure in the thread's priority field and I think 
> that part should be addressed irrespective of what we do here.

Yeah, that sounds like a bug.

> we need to at least file a follow up to prevent regressing bug 841041 which was the 
> motivation for the change.

Yes, absolutely.
Whiteboard: [status: needs review and landing] → [status: needs review and landing][madrid]
It is fine for PR_SetThreadPriority to fail silently when the caller
does not have sufficient privilege to change thread priorities.

The race condition that Justin described is acceptable to NSPR-based
programs. It is one of the limitations of any platform abstraction layer.
In general a feature will work correctly only if the program does it
through NSPR exclusively. We avoid this limitation whenever possible,
but there are cases where we can't avoid it.

Gabriele: I'll let you decide if you want to fix your get/setpriority
based implementation of PR_SetThreadPriority. It is fine by me if
Firefox won't use PR_SetThreadPriority, but it seems useful to make
PR_SetThreadPriority work on Linux.
If we take the fix instead of the backout, can we please add documentation to PR_SetThreadPriority indicating what consumers should expect?
(In reply to Wan-Teh Chang from comment #30)
> Gabriele: I'll let you decide if you want to fix your get/setpriority
> based implementation of PR_SetThreadPriority. It is fine by me if
> Firefox won't use PR_SetThreadPriority, but it seems useful to make
> PR_SetThreadPriority work on Linux.

Thanks, I'll think about it a little bit more then because I don't think we're in a hurry to fix this right now but see below.

(In reply to Justin Lebar [:jlebar] from comment #31)
> If we take the fix instead of the backout, can we please add documentation
> to PR_SetThreadPriority indicating what consumers should expect?

I'd rather first finish fixing bug 861441 which IMHO is more important for the problems we're seeing in B2G. Once we have that one working we can come back and re-evaluate this depending on how we fixed that one. Obviously if we end up using setpriority() there we'll have to change how PR_SetThreadPriority() works but I'm fairly confident my cgroups-based solution will work well there.

BTW I noticed we have two pieces of code using setpriority() directly and at least one looks wrong:

https://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#848

From its looks it was probably made with the assumption that it would alter the priority of the entire process. Should we change it to using hal::SetProcessPriority()? The other one is in the updater but that one is not even linking to NSPR so I think it's alright.
> I don't think we're in a hurry to fix this right now.  [...] I'd rather first finish 
> fixing bug 861441 which IMHO is more important for the problems we're seeing in B2G.

If the determination is that we're going to use cgroups in bug 861441, I agree that bug is more important.  But all things being equal, I'd prefer to use nice, since it's a simpler, more stable interface, so I think we're still waiting on more measurements before we commit to cgroups.  In particular we need to ensure that cgroups plays well with nice, or otherwise we need to change how we re-prioritize threads in b2g.

If we have a patch that you and wtc are happy with, I'd rather we land it.  We can worry about uplifting it to b2g pending bug 861441.
(Please consider "superreview" as "second review".)

We could change PR_SetThreadPriority to set
    thred->priority = newPri;
only if it changed the thread's priority successfully.
But that requires more changes, so for now I just documented
the current behavior.
Attachment #739233 - Flags: superreview?(justin.lebar+bug)
Attachment #739233 - Flags: review?(gsvelto)
Comment on attachment 739233 [details] [diff] [review]
Document the limitations and current behavior of PR_SetThreadPriority

It's good to have it documented especially in the light of our recent troubles with it. As for not setting |priority| when we fail, we can fix that in a follow-up and then change the comment at the same time; for now it's good to have it spelled out so that one knows it cannot rely on that field reflecting the actual status of the thread.
Attachment #739233 - Flags: review?(gsvelto) → review+
Comment on attachment 739233 [details] [diff] [review]
Document the limitations and current behavior of PR_SetThreadPriority

r=me, except I'd mention somewhere that root access is often required to /raise/ thread priorities.  The main gotcha on Linux is that you don't need permission to /lower/ a thread priority, but you do need a permission to /raise/ the priority.
Attachment #739233 - Flags: superreview?(justin.lebar+bug) → superreview+
Whiteboard: [status: needs review and landing][madrid] → [status: needs more investigation][madrid]
I called out the need for a special permission to raise thread priorities
as Justin suggested.
https://hg.mozilla.org/projects/nspr/rev/14d8789ebb23
Attachment #739233 - Attachment is obsolete: true
Attachment #739714 - Flags: checked-in+
Wan-Teh, it's fine with me if you land Gabriele's patch here, but can we please do that?  The situation as-is is pretty broken for b2g.
I'm not convinced that we are going to figure out what to do in bug 861441 within the next day or two.  Something complex is going on in the scheduler.

We have two bugs blocking certification that we'd like closed by Wednesday: bug 847592 and bug 860799.  The former requires bug 861441, but the latter does not.  But I cannot fix the latter without either this bug or bug 861441.

To keep our options open, I'd like to land a fix here as soon as possible, please.  If you can't land this tomorrow, we may have to patch NSPR in b2g18.
I've refreshed the patch addressing your comments and the other things that emerged during the discussion. The following changes were made:

- The |priority| field is not changed if the call fails; the documentation was changed to reflect this. The only codepath I haven't changed in this regard is the one using pthread_setprio() which is obsolete and I couldn't find some conclusive documentation regarding how to detect failures.

- I've removed the |pid| field and the getpid() calls, to obtain the main process' priority we only need to call getpriority(PRIO_PROCESS, 0).

- This patch allows the main thread priority to be changed relatively like other the other threads do. This means that using PR_PRIORITY_NORMAL on the main thread is effectively a no-op; other values will raise and lower priority as appropriate though.
Assignee: wtc → gsvelto
Attachment #737119 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #740728 - Flags: review?(wtc)
Actually I've changed my mind, this patch only has the changes to do the relative adjustments but I've stripped everything regarding updating the thred->priority field; that belongs to a follow-up and I don't want to add more complexity here.
Attachment #740728 - Attachment is obsolete: true
Attachment #740728 - Flags: review?(wtc)
Attachment #740747 - Flags: review?(wtc)
Here's the back-ported patch for the mozilla-b2g18 tree. Besides applying the changes I've added the patch to nsprpub/patches and documented it in nsprpub/patches/README.
Summary: Back out bug 841651 (make PR_SetThreadPriority work on Bionic), because it doesn't do what it intends to do → Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
I fixed two comments and folded long lines in Gabriele's patch v3
and checked it in to the NSPR hg repository (NSPR 4.10).
Attachment #740747 - Attachment is obsolete: true
Attachment #740747 - Flags: review?(wtc)
Attachment #741106 - Flags: checked-in+
Comment on attachment 741106 [details] [diff] [review]
[PATCH v4] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux, by Gabriele Svelto

Pushed to the NSPR hg repository:
https://hg.mozilla.org/projects/nspr/rev/8927f44164de


Gabriele: this patch interdiff shows the changes I made to your patch v3:

https://bugzilla.mozilla.org/attachment.cgi?oldid=740747&action=interdiff&newid=741106&headers=1
Comment on attachment 741106 [details] [diff] [review]
[PATCH v4] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux, by Gabriele Svelto

Review of attachment 741106 [details] [diff] [review]:
-----------------------------------------------------------------

Gabriele:

I have three simple questions, and one complicated question
on your patch.

::: pr/src/pthreads/ptthread.c
@@ +143,4 @@
>       */
>      tid = gettid();
> +    errno = 0;
> +    rv = getpriority(PRIO_PROCESS, 0);

Rather than getting the main thread's nice value every time, I
wonder if we should just get it during NSPR initialization
(for example, in _PR_InitThreads()) and cache it inside the
'pt_book' global struct. We can do this near where we initialize
pt_book.minPrio and pt_book.maxPrio.

This will also allow us to use pt_PriorityMap for PR_NICE_PRIORITY_SCHEDULING
because we will get the main thread's nice value from a global
variable (a new member of pt_book).

More info:

1. This MXR query shows we only call _PR_InitThreads() with
priority=PR_PRIORITY_NORMAL:
http://mxr.mozilla.org/nspr/ident?i=_PR_InitThreads

and this shows the 'priority' argument of PR_Init() is ignored:
http://mxr.mozilla.org/nspr/source/pr/src/misc/prinit.c#263

So we can assume the thread that initializes NSPR has the
priority PR_PRIORITY_NORMAL.

2. Usually the main thread initializes NSPR, but this is not
required. So, in _PR_InitThreads(), we should check if the
main thread and the current thread have the same nice value.
If they have different nice values, our assumption in (1) will
be invalid, and we should print a warning message to stderr.

@@ +147,4 @@
>  
> +    /* If we cannot read the main thread's nice value don't try to change the
> +     * new thread's nice value. */
> +    if (errno == 0) {

Should this test be rv != -1 || errno == 0?

@@ +706,3 @@
>  
> +    /* Do not proceed unless we know the main thread's nice value. */
> +    if (errno == 0) {

Should this test be rv != -1 || errno == 0?

@@ +710,5 @@
> +                         pt_RelativePriority(rv, newPri));
> +
> +        if (rv == -1)
> +        {
> +            /* We don't set pt_schedpriv to EPERM in case errno == EPERM

Could you update this comment?  The test used to be rv == -1 && errno == EPERM.
Now the test is simply rv == -1, so it is confusing for the comment to mention
errno == EPERM.

Also, if errno is not EPERM, then the log message "no thread scheduling privilege"
isn't correct.
Comment on attachment 740750 [details] [diff] [review]
[PATCH v3 mozilla-b2g18] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux

Review of attachment 740750 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.

You can use this patch as is in mozilla-b2g18. The changes I made were all
in comments or formatting.
Attachment #740750 - Flags: review+
Thank you for reviewing & checking in, and also thank you for your changes!

> I have three simple questions, and one complicated question
> on your patch.
> [...]
> This will also allow us to use pt_PriorityMap for PR_NICE_PRIORITY_SCHEDULING
> because we will get the main thread's nice value from a global
> variable (a new member of pt_book).

The issue with this is that if the main thread's priority changes at runtime we won't catch it. What Justin has been doing in his B2G-specific patches is to re-nice all the threads in a process by adjusting them up and down appropriately; with my patch always using relative values we should get consistent scheduling of all threads (minus race conditions but those are unavoidable unfortunately).

> @@ +147,4 @@
> >  
> > +    /* If we cannot read the main thread's nice value don't try to change the
> > +     * new thread's nice value. */
> > +    if (errno == 0) {
> 
> Should this test be rv != -1 || errno == 0?
> 
> @@ +706,3 @@
> >  
> > +    /* Do not proceed unless we know the main thread's nice value. */
> > +    if (errno == 0) {
> 
> Should this test be rv != -1 || errno == 0?

Yes to both, but since errno == 0 is true when getpriority() succeeds I thought adding the other check was unnecessary.

> Could you update this comment?  The test used to be rv == -1 && errno ==
> EPERM.
> Now the test is simply rv == -1, so it is confusing for the comment to
> mention
> errno == EPERM.
> 
> Also, if errno is not EPERM, then the log message "no thread scheduling
> privilege"
> isn't correct.

Right, I changed it because I wanted the test to be more robust even though I knew we could only ever hit EPERM as an error but as you point out the comment is misleading and the LOG message can be wrong.

I was thinking of opening a follow-up to fix |thred->priority| being set even when the underlying thread priority wasn't changed. This would mean wading through some old code and ensuring all error conditions (such as the one above) are correct and properly documented; what do you think?
(In reply to Wan-Teh Chang from comment #46)
> Rather than getting the main thread's nice value every time, I
> wonder if we should just get it during NSPR initialization
> (for example, in _PR_InitThreads()) and cache it inside the
> 'pt_book' global struct. We can do this near where we initialize
> pt_book.minPrio and pt_book.maxPrio.

That would assume that nothing external is going to alter the thread's nice value. While that might work in the context of b2g, that's not something that can work everywhere.
Attachment 740750 [details] [diff] should land on top of mozilla-b2g18 to resolve the tef+ness of this bug. I'm a bit puzzled by the try run in comment 43. On Fedora the build shows up as orange but the logs don't show any error; is it because the xpcshell tests did not run even though I had requested them? Anyway the other oranges look like known intermittent issues.
Keywords: checkin-needed
> On Fedora the build shows up as orange but the logs don't show any error; is it because 
> the xpcshell tests did not run even though I had requested them?

The xpcshell tests did run -- on AWS, which we call "ubuntu".

It might be worth asking a sheriff why those builds are orange; I also don't see anything in the logs.
(In reply to Gabriele Svelto [:gsvelto] from comment #50)
> Attachment 740750 [details] [diff] should land on top of mozilla-b2g18 to
> resolve the tef+ness of this bug.

https://hg.mozilla.org/releases/mozilla-b2g18/rev/ae6cac4083ae
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/89796cc8d74c

I assume that this bug will be fixed when NSPR is actually updated on m-c, so leaving open until then.
Whiteboard: [status: needs more investigation][madrid] → [status: fixed for v1.0.1, open for m-c landing][madrid]
Marking as npotb to get this out of our queries (since this hasn't been resolved on m-c yet).
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid] → [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB]
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB] → [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB][actually part of the build; ignore NPOTB]
(In reply to Gabriele Svelto [:gsvelto] from comment #48)
>
> I was thinking of opening a follow-up to fix |thred->priority| being set
> even when the underlying thread priority wasn't changed. This would mean
> wading through some old code and ensuring all error conditions (such as the
> one above) are correct and properly documented; what do you think?

Fixing |thred->priority| could be worthwhile, but only if you are motivated
to work on it. You will also need to look at not only PR_SetThreadPriority
but also PR_CreateThread (which sets |thred->priority| of the new thread)
and _PR_InitThreads and PR_AttachThread (which set |thred->priority| of the
current thread.)

The current code has the nice property that PR_GetThreadPriority and
PR_SetThreadPriority are consistent with each other. We should think about
whether anything may break if we lost this consistency.

In summary, it may be worth doing, but I am somewhat in favor of just
documenting the current "best-effort" behavior of setting/specifying
thread priority in NSPR.
(In reply to Gabriele Svelto [:gsvelto] from comment #48)
>
> ...What Justin has been doing in his B2G-specific patches is
> to re-nice all the threads in a process by adjusting them up and down
> appropriately; ...

Is this the patch you're referring to?

https://bugzilla.mozilla.org/show_bug.cgi?id=861441#c48

That is a brute-force way to re-nice all the threads in a process!
It's too bad that there isn't an easier way.
Comment on attachment 741953 [details] [diff] [review]
Update the log message for setpriority failure

Review of attachment 741953 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this for me. As for changing the way |thred->priority| is updated then maybe it's better to leave it as-is now that you've documented it. I think that the two biggest reasons not to touch it is that some code somewhere out there might depend on the current behavior, and that this would need to be changed also for obscure/obsolete platforms and I don't think that's actually worth the effort.
Attachment #741953 - Flags: review?(gsvelto) → review+
(In reply to Wan-Teh Chang from comment #56)
> Is this the patch you're referring to?
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=861441#c48
> 
> That is a brute-force way to re-nice all the threads in a process!
> It's too bad that there isn't an easier way.

Yes, that one. We would have liked to use control groups instead but we quickly discovered that the Android kernel we're using has a buggy implementation and working around the bug is no less race-prone than setting all the process' threads nice values (and setting the nice values is far more likely not to make us trip over some obscure kernel bug).
Comment on attachment 741953 [details] [diff] [review]
Update the log message for setpriority failure

https://hg.mozilla.org/projects/nspr/rev/35ea28b6a84b
Attachment #741953 - Flags: checked-in+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #52)
> 
> I assume that this bug will be fixed when NSPR is actually updated on m-c,
> so leaving open until then.

I'm nervous about leaving a bug open that's for m-c purpose only when we don't have appropriate blocking flag attached. Technically, this would be a koi blocker for the m-c patch since this is eventually needed on m-c as well.

Could we file a followup bug updating m-c here and close this for b2g18 & 1.01 patches?
Flags: needinfo?(ryanvm)
Depends on: 873767
Actually, I'm going to just do that. I filed bug 873767 for the central landing. I'm closing this bug specifically for the b2g18 and 1.01 landing.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: needinfo?(ryanvm)
Resolution: --- → FIXED
Summary: Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux → [b2g18 only] Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
The patch has been pushed to mozilla-central.
Priority: -- → P1
Summary: [b2g18 only] Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux → Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB][actually part of the build; ignore NPOTB] → [madrid][NPOTB][actually part of the build; ignore NPOTB]
Target Milestone: --- → 4.10
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: