Closed
Bug 861434
Opened 11 years ago
Closed 11 years ago
Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
Categories
(NSPR :: NSPR, defect, P1)
NSPR
NSPR
Tracking
(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)
People
(Reporter: justin.lebar+bug, Assigned: gsvelto)
References
Details
(Whiteboard: [madrid][NPOTB][actually part of the build; ignore NPOTB])
Attachments
(5 files, 6 obsolete files)
10.21 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
1.01 KB,
patch
|
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
9.65 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
4.26 KB,
patch
|
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
866 bytes,
patch
|
gsvelto
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
I think we should back out bug 841651. The old behavior of doing nothing on PR_SetThreadPriority is more correct than the new behavior. We discovered the essential problem here while looking into bug 860526: The kernel treats threads exactly the same as it treats processes for the purposes of the setpriority call and its resultant effect on the scheduler. As a result, if you setpriority(0) on a tid, you've reniced that thread to 0, even if "the process" (which really means "the main thread") is running with some other nice. So the mapping of priorities LOW, NORMAL, HIGH, and CRITICAL to niceness 1, 0, -1, and -2 only makes sense if the process in question is in fact running with niceness 0. Otherwise, if the process is running at say priority 10, priority LOW is actually much higher than the main thread's priority. Additionally, this change causes us to renice the process to 0 when its main thread is initialized (assuming we're root, which we are at this point in the process's lifecycle in B2G). That defeats our attempts to renice processes early in their lifetime. See http://jlebar.com/2013/4/11/Beware_of_renice.html for more on this general issue. If it's OK with the NSPR folks, I'd like to use this bug to track backing out bug 841651. We can use another bug for tracking its replacement.
Reporter | ||
Comment 3•11 years ago
|
||
The changeset being backed out regresses running processes as root with non-zero nice (it forces them to zero nice). It also does not set thread priorities sanely when the process runs as root and has its nice set to a non-zero value. It also breaks running processes as non-root with negative nice (it forces them to zero nice).
Reporter | ||
Comment 4•11 years ago
|
||
The changeset being backed out regresses running processes as root with non-zero nice (it forces them to zero nice). It also does not set thread priorities sanely when the process runs as root and has its nice set to a non-zero value. It also breaks running processes as non-root with negative nice (it forces them to zero nice).
Reporter | ||
Updated•11 years ago
|
Attachment #737041 -
Attachment is obsolete: true
Reporter | ||
Comment 5•11 years ago
|
||
The changeset being backed out regresses running processes as root with non-zero nice (it forces them to zero nice). It also does not set thread priorities sanely when the process runs as root and has its nice set to a non-zero value. It also breaks running processes as non-root with negative nice (it forces them to zero nice).
Reporter | ||
Updated•11 years ago
|
Attachment #737042 -
Attachment is obsolete: true
Reporter | ||
Updated•11 years ago
|
Attachment #737044 -
Flags: review?(wtc)
Reporter | ||
Comment 6•11 years ago
|
||
Sorry for the bugspam; I don't use hg bzexport very often, and I thought it was failing when it was in fact succeeding. :)
Assignee | ||
Comment 7•11 years ago
|
||
I don't think backing that patch out is a good idea, without it worker threads can effectively starve the main thread of the b2g process and a few issues we had previously would regress (e.g. installing an update making the phone unusable). Instead I can change the behavior to make the adjustments relative to the main thread's niceness and prevent the main thread from being reniced upon initialization. That being said I really think we should move away from renicing a process using setpriority() as there's no sane way to prevent race conditions. I'd rather improve on my cgroups patch by tweaking the existing cgroups (or adding new ones with even higher CPU allocation than foreground to prevent the problem in bug 861441 under process-buster stress).
Assignee | ||
Comment 8•11 years ago
|
||
This patch makes the nice adjustments relative; this should fix the problems with PR_SetThreadPriority() though it does not address the overall issue in bug 861441.
Attachment #737119 -
Flags: feedback?(justin.lebar+bug)
Reporter | ||
Comment 9•11 years ago
|
||
Can a thread decrease its nice value if the process is not root? That is, will priorities HIGH and URGENT work at all with non-root processes? And will non-root threads be able to decrease their nice values after they lower them? I suspect the answer is no to both of these, since the answer is no for processes.
We may not be able to make these things work, but it's not clear to me that supporting the NSPR interface halfway is better than not supporting it at all. We could write an alternative function (say, in GonkHal) that adds 1 to a thread's nice, which I understand was the main thing we were trying to accomplish here. In fact, from GonkHal we could do whatever we want with the priorities, because commands there are forwarded up to the main process, which runs as root.
>+static int pt_RelativePriority(int nice, PRThreadPriority pri)
>+{
>+ return nice + (1 - pri);
>+}
Is this right? If the process's nice is 0 and pri is URGENT, this returns 1 - (-2) == 3. We then use this directly as a nice value.
Reporter | ||
Comment 10•11 years ago
|
||
> And will non-root threads be able to decrease their nice values after they lower them?
s/lower/raise. That is, will a non-root thread be able to increase its CPU priority after lowering it?
Reporter | ||
Updated•11 years ago
|
Attachment #737119 -
Flags: feedback?(justin.lebar+bug)
Assignee | ||
Comment 11•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #9) > Can a thread decrease its nice value if the process is not root? That is, > will priorities HIGH and URGENT work at all with non-root processes? And > will non-root threads be able to decrease their nice values after they lower > them? I suspect the answer is no to both of these, since the answer is no > for processes. You're right, non-root threads can't lower their nice value, only increase it. > We may not be able to make these things work, but it's not clear to me that > supporting the NSPR interface halfway is better than not supporting it at > all. Well, there's good reasons for supporting it like this: first of all increasing the nice value works well and fixes all problems we had with worker threads monopolizing the CPU. Then for the b2g main process higher priorities also work and I think that's pretty much the only process where we would actually want to allow threads to get higher priorities anyway. > We could write an alternative function (say, in GonkHal) that adds 1 > to a thread's nice, which I understand was the main thing we were trying to > accomplish here. In fact, from GonkHal we could do whatever we want with > the priorities, because commands there are forwarded up to the main process, > which runs as root. That could have potential to be a good solution because it would also allow us to avoid races by having the main process taking care of any of these adjustments; it wouldn't work on Linux in general though were we wouldn't have GonkHal around. It would also add complexity and if the Android model of using cgroups can be made to work well then I'd be more keen on using it than adjusting priorities one by one. Did you try receiving a call under process-buster stress with my WIP cgroups patch? I'll try writing a more elaborate one today with an additional cgroup to make sure that PROCESS_PRIORITY_FOREGROUND_HIGH gets a larger share of CPU when running alongside PROCESS_PRIORITY_FOREGROUND processes. > >+static int pt_RelativePriority(int nice, PRThreadPriority pri) > >+{ > >+ return nice + (1 - pri); > >+} > > Is this right? If the process's nice is 0 and pri is URGENT, this returns 1 > - (-2) == 3. We then use this directly as a nice value. Wouldn't that be 0 + (1 - 3) = -2?
Reporter | ||
Comment 12•11 years ago
|
||
> - * PR_PRIORITY_URGENT -2 */ Oh, this is a comment; URGENT isn't defined to -2. Got it. > it wouldn't work on Linux in > general though were we wouldn't have GonkHal around. We have hal on Linux, so we can implement the same or similar code there. I don't think we should have code in NSPR that only half works, and it sounds like there is no way to implement the full NSPR interface in general. I'm also wary of having code in NSPR that behaves differently depending on whether or not the process is root, particularly given that B2G has both root and non-root processes. At the very least if we implement root-only code in hal, we can indicate to the caller whether it succeeded. (And anyway that won't be a problem if we forward the calls up to the master process, which is root.) Backing this out of NSPR and re-implementing the required set of features in hal seems far safer, saner, and simpler than trying to fix NSPR.
Reporter | ||
Comment 13•11 years ago
|
||
To be clear, I'm not proposing that we permanently or even temporarily break anything. We can time the uplift of this new NSPR version with landing the changes to hal.
Reporter | ||
Comment 14•11 years ago
|
||
> It would also add complexity and if the Android model of using cgroups can be made to work well then > I'd be more keen on using it than adjusting priorities one by one. cgroups doesn't replace nice, right? No matter what we need a way to change the priority of threads, and the only way we know of to do that is with nice. > Did you try receiving a call under process-buster stress with my WIP cgroups patch? No, but anyway it's not a fair comparison if the fg process gets as much priority as the dialer process, like you say. Let's discuss that in bug 861441, if you don't mind, so we don't conflate the issue at hand here, which is that NSPR is broken.
Assignee | ||
Comment 15•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #12) > We have hal on Linux, so we can implement the same or similar code there. That sounds like the best road to follow but the only practical issue is that I don't think NSPR can be made to depend on HAL. Somebody working on it might clear it up for us though. > I don't think we should have code in NSPR that only half works, and it > sounds like there is no way to implement the full NSPR interface in general. > I'm also wary of having code in NSPR that behaves differently depending on > whether or not the process is root, particularly given that B2G has both > root and non-root processes. Well that part already depends heavily on permissions, there's quite a few places were we explicitly check for EPERM errors and we're accepting that in those cases the operations affecting scheduling will effectively be no-ops. Under this respect my changes behave like the rest of the existing code. > At the very least if we implement root-only > code in hal, we can indicate to the caller whether it succeeded. (And > anyway that won't be a problem if we forward the calls up to the master > process, which is root.) Yes, having a return code would already help. One thing that bugged me at the time is that PR_SetThreadPriority() unconditionally sets the |priority| field even when failing. That's bad because otherwise one could call PR_GetThreadPriority() and check if the change actually happened or not. BTW fixing this would be trivial. > Backing this out of NSPR and re-implementing the required set of features in > hal seems far safer, saner, and simpler than trying to fix NSPR. > [...] > To be clear, I'm not proposing that we permanently or even temporarily break > anything. We can time the uplift of this new NSPR version with landing the > changes to hal. OK, that sounds reasonable; we should have a clear idea of how to replace this behavior before backing the change out though. (In reply to Justin Lebar [:jlebar] from comment #14) > cgroups doesn't replace nice, right? No matter what we need a way to change > the priority of threads, and the only way we know of to do that is with nice. Yes, I meant that we should use cgroups for whole-process priority changes. We still need to sort out threads within a process with nice values. > No, but anyway it's not a fair comparison if the fg process gets as much > priority as the dialer process, like you say. Let's discuss that in bug > 861441, if you don't mind, so we don't conflate the issue at hand here, > which is that NSPR is broken. Right, I'll post the updated patch there directly.
Reporter | ||
Comment 16•11 years ago
|
||
> That sounds like the best road to follow but the only practical issue is that I don't think NSPR can
> be made to depend on HAL.
Correct; NSPR can't depend on hal (which depends on plenty of other pieces in Gecko). We'd be restoring NSPR back to its original behavior.
Comment 17•11 years ago
|
||
cgroups can handle threads separately.
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #17) > cgroups can handle threads separately. You mean you can put a thread PID into /dev/cpuctl/*/cgroup.procs? If that's the case then we could evaluate that to solve this problem too (at least for B2G where we can guarantee the existence of the appropriate cgroups).
Comment 19•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #18) > (In reply to Mike Hommey [:glandium] from comment #17) > > cgroups can handle threads separately. > > You mean you can put a thread PID into /dev/cpuctl/*/cgroup.procs? Yes.
Comment 20•11 years ago
|
||
well, cgroups.procs is for thread groups (aka processes) ; tasks is for threads.
Comment 21•11 years ago
|
||
This ultimately blocks bug 847592, so marking as tef+ (assuming the NSPR peers are OK with this change).
Updated•11 years ago
|
Whiteboard: [status: needs review and landing]
Comment 22•11 years ago
|
||
Comment on attachment 737119 [details] [diff] [review] [PATCH] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux Review of attachment 737119 [details] [diff] [review]: ----------------------------------------------------------------- r=wtc. This patch seems fine. I suggest some changes. ::: nsprpub/pr/include/private/primpl.h @@ +1555,5 @@ > > #if defined(_PR_PTHREADS) > pthread_t id; /* pthread identifier for the thread */ > #ifdef _PR_NICE_PRIORITY_SCHEDULING > + pid_t pid; /* PID of this process */ Is this just a performance optimization? We can call getpid() easily to get the PID of the process, right? ::: nsprpub/pr/src/pthreads/ptthread.c @@ +79,5 @@ > } > +#elif defined(_PR_NICE_PRIORITY_SCHEDULING) > +/* > + * This functions maps higher priorities to lower nice values relative to the > + * priority specified in the |pri| parameter. The corresponding relative I think this should say "relative to the nice value specified in the |nice| parameter", right? @@ +149,2 @@ > > + if (errno != 0) { I think we only need to check errno if rv == -1, right? Same for line 715. @@ +704,5 @@ > PR_WaitCondVar(pt_book.cv, PR_INTERVAL_NO_TIMEOUT); > PR_Unlock(pt_book.ml); > > + if (thred->tid == thred->pid) { > + /* This is the main thread, never change its priority */ This comment should explain why we don't change the priority of the main thread. Refusing to change the priority of the main thread could break a program that sets the priorities of the main thread and some other thread and expects certain scheduling priority of the main thread relative to that other thread. So it seems that we should also assert newPri == PR_PRIORITY_NORMAL if thred->tid == thred->pid.
Attachment #737119 -
Flags: review+
Comment 23•11 years ago
|
||
Comment on attachment 737044 [details] [diff] [review] Back out 39cb7968bf9b (bug 841651). r=wtc. Justin, Gabriele: Do you want me to back out the old setpriority patch, or work with Gabriele to make his setpriority approach work?
Attachment #737044 -
Flags: review?(wtc) → review+
Assignee | ||
Comment 24•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #22) > Is this just a performance optimization? We can call getpid() easily > to get the PID of the process, right? Yeah, I wanted to shave a syscall from each call to PR_SetThreadPriority() but you're right, it's probably not worth it. > I think this should say "relative to the nice value specified in the > |nice| parameter", right? Yes, good catch. > I think we only need to check errno if rv == -1, right? > [...] > Same for line 715. Yes, I'll change that. > Refusing to change the priority of the main thread could break a program > that sets the priorities of the main thread and some other thread and > expects certain scheduling priority of the main thread relative to that > other thread. So it seems that we should also assert newPri == > PR_PRIORITY_NORMAL if thred->tid == thred->pid. You mean we should skip setting the priority only when (thred->tid == thred->pid) && (newPri == PR_PRIORITY_NORMAL)? I think it might be a reasonable choice; that would make changing the priority of the main thread a relative affair too. Regarding how to proceed I'd go with applying my patch to "fix" this behavior; coupling it with the cgroups-based approach I'm taking for bug 861441 it should give us exactly the behavior we need in B2G while also helping out for Firefox on Linux (there DOM worker threads weren't given a lower priority too before my patch). However I don't think Justin is completely fine with this so I want to hear his opinion on this.
Reporter | ||
Comment 25•11 years ago
|
||
As I've explained in this bug, I do not think that a setpriority approach /can/ work in a sane way. This patch will allow us to decrease thread priority only when we're root. If we're non-root, we can't ever decrease a thread's priority. This is bad for three reasons. 1. It's bad because NSPR will behave differently depending on whether the process is running as root. 2. It's bad because the difference in behavior in (1) is completely silent; PR_SetThreadPriority returns void, and therefore embedders have /no idea/ whether the setpriority call worked. 3. It's bad because, when the process is not running as root, the NSPR interface is half-implemented. I think it's better not to implement this interface at all than to implement it halfway. This code is also problematic because it introduces race conditions around the set/get-priority calls if a NSPR embedder uses set/get-priority itself. There's no way for such a client to avoid racing with NSPR, short of modifying NSPR. There is absolutely no reason this logic must live in NSPR. We can put it in hal, where we can provide a smaller interface (e.g. "LowerThreadPriority") that always works. Putting it in hal also gives us more flexibility if we need to change behavior in the future. I think this patch is the wrong approach for NSPR. It silently behaves differently depending on whether we're root. It is not guaranteed that it does what we want for B2G (we may not want to renice worker threads on B2G if we use cgroups). It adds race conditions that clients -- perhaps including Firefox, depending on the tack we take -- cannot avoid without modifying NSPR.
Assignee | ||
Comment 26•11 years ago
|
||
OK, but I'd just like to clear up one thing: the issues you describe for this patch apply to all other implementations of PR_SetThreadPriority() too. Using pthread_setschedprio() on other POSIX platforms or SetThreadPriority() on Windows is subject to the same races and permission issues as setpriority() on Linux. Since it is my understanding that the whole idea of an NSPR embedder is that it will not use the native functionality but use NSPR instead this shouldn't be a problem. It has become a problem in B2G because NSPR doesn't offer functionality to schedule processes and so we essentially started bypassing it. That being said we already have users of PR_SetThreadPriority(), PR_Init() and PR_CreateThread() that specify a value different from PR_PRIORITY_NORMAL. We also document in NSPR that whichever thread calls PR_Initialize() will be set to run at the PR_PRIORITY_NORMAL level. So my question is: what shall we do with those? If the general plan is to move away entirely from NSPR and use hal instead then I'm fine with it but I'd like to know what we should do with the existing users because they all suffer already from the problems you describe (raciness with native adjustments, different behavior depending on permissions, inability to check if the change happened though the latter is fixable).
Reporter | ||
Comment 27•11 years ago
|
||
> Using pthread_setschedprio() on other POSIX platforms or SetThreadPriority() on Windows > is subject to the same races and permission issues as setpriority() on Linux. Does SetThreadPriority on Windows actually have the same permission issues on Windows? You can lower a thread's priority using SetThreadPriority iff you're root? Similarly with pthread_setschedprio() on, say, Mac? That would surprise me, but I've been really surprised here already. On Windows we don't appear to have the same get/set race condition; we call SetThreadPriority without ever reading the priority first, because we're just setting an absolute priority. I suppose if someone else did a get/set then they'd have a race, but at least NSPR itself doesn't have races. But this is pretty orthogonal to the question at hand, I think. > So my question is: what shall we do with those? If things don't work properly today with NSPR, we should move them into hal, where we have flexibility to Do What We Want. I think the only reason not to move functionality that we want to change into hal is when we can't change all consumers (e.g. NSS, or when there are tons of consumers).
Assignee | ||
Comment 28•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #27) > Does SetThreadPriority on Windows actually have the same permission issues > on Windows? You can lower a thread's priority using SetThreadPriority iff > you're root? Similarly with pthread_setschedprio() on, say, Mac? That > would surprise me, but I've been really surprised here already. Both are documented to be fallible and in the case of pthread_setschedprio() that's with an EPERM error code too so my guess is that both will be subject to the current user permissions. We actually check for those errors in the code explicitly; what we fail to do is reflect their failure in the thread's priority field and I think that part should be addressed irrespective of what we do here. > If things don't work properly today with NSPR, we should move them into hal, > where we have flexibility to Do What We Want. I think the only reason not > to move functionality that we want to change into hal is when we can't > change all consumers (e.g. NSS, or when there are tons of consumers). OK so, let's back out my change but then we need to at least file a follow up to prevent regressing bug 841041 which was the motivation for the change.
Reporter | ||
Comment 29•11 years ago
|
||
> what we fail to do is reflect their failure in the thread's priority field and I think > that part should be addressed irrespective of what we do here. Yeah, that sounds like a bug. > we need to at least file a follow up to prevent regressing bug 841041 which was the > motivation for the change. Yes, absolutely.
Updated•11 years ago
|
Whiteboard: [status: needs review and landing] → [status: needs review and landing][madrid]
Comment 30•11 years ago
|
||
It is fine for PR_SetThreadPriority to fail silently when the caller does not have sufficient privilege to change thread priorities. The race condition that Justin described is acceptable to NSPR-based programs. It is one of the limitations of any platform abstraction layer. In general a feature will work correctly only if the program does it through NSPR exclusively. We avoid this limitation whenever possible, but there are cases where we can't avoid it. Gabriele: I'll let you decide if you want to fix your get/setpriority based implementation of PR_SetThreadPriority. It is fine by me if Firefox won't use PR_SetThreadPriority, but it seems useful to make PR_SetThreadPriority work on Linux.
Reporter | ||
Comment 31•11 years ago
|
||
If we take the fix instead of the backout, can we please add documentation to PR_SetThreadPriority indicating what consumers should expect?
Assignee | ||
Comment 32•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #30) > Gabriele: I'll let you decide if you want to fix your get/setpriority > based implementation of PR_SetThreadPriority. It is fine by me if > Firefox won't use PR_SetThreadPriority, but it seems useful to make > PR_SetThreadPriority work on Linux. Thanks, I'll think about it a little bit more then because I don't think we're in a hurry to fix this right now but see below. (In reply to Justin Lebar [:jlebar] from comment #31) > If we take the fix instead of the backout, can we please add documentation > to PR_SetThreadPriority indicating what consumers should expect? I'd rather first finish fixing bug 861441 which IMHO is more important for the problems we're seeing in B2G. Once we have that one working we can come back and re-evaluate this depending on how we fixed that one. Obviously if we end up using setpriority() there we'll have to change how PR_SetThreadPriority() works but I'm fairly confident my cgroups-based solution will work well there. BTW I noticed we have two pieces of code using setpriority() directly and at least one looks wrong: https://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#848 From its looks it was probably made with the assumption that it would alter the priority of the entire process. Should we change it to using hal::SetProcessPriority()? The other one is in the updater but that one is not even linking to NSPR so I think it's alright.
Reporter | ||
Comment 33•11 years ago
|
||
> I don't think we're in a hurry to fix this right now. [...] I'd rather first finish > fixing bug 861441 which IMHO is more important for the problems we're seeing in B2G. If the determination is that we're going to use cgroups in bug 861441, I agree that bug is more important. But all things being equal, I'd prefer to use nice, since it's a simpler, more stable interface, so I think we're still waiting on more measurements before we commit to cgroups. In particular we need to ensure that cgroups plays well with nice, or otherwise we need to change how we re-prioritize threads in b2g. If we have a patch that you and wtc are happy with, I'd rather we land it. We can worry about uplifting it to b2g pending bug 861441.
Comment 34•11 years ago
|
||
(Please consider "superreview" as "second review".) We could change PR_SetThreadPriority to set thred->priority = newPri; only if it changed the thread's priority successfully. But that requires more changes, so for now I just documented the current behavior.
Attachment #739233 -
Flags: superreview?(justin.lebar+bug)
Attachment #739233 -
Flags: review?(gsvelto)
Assignee | ||
Comment 35•11 years ago
|
||
Comment on attachment 739233 [details] [diff] [review] Document the limitations and current behavior of PR_SetThreadPriority It's good to have it documented especially in the light of our recent troubles with it. As for not setting |priority| when we fail, we can fix that in a follow-up and then change the comment at the same time; for now it's good to have it spelled out so that one knows it cannot rely on that field reflecting the actual status of the thread.
Attachment #739233 -
Flags: review?(gsvelto) → review+
Reporter | ||
Comment 36•11 years ago
|
||
Comment on attachment 739233 [details] [diff] [review] Document the limitations and current behavior of PR_SetThreadPriority r=me, except I'd mention somewhere that root access is often required to /raise/ thread priorities. The main gotcha on Linux is that you don't need permission to /lower/ a thread priority, but you do need a permission to /raise/ the priority.
Attachment #739233 -
Flags: superreview?(justin.lebar+bug) → superreview+
Updated•11 years ago
|
Whiteboard: [status: needs review and landing][madrid] → [status: needs more investigation][madrid]
Comment 37•11 years ago
|
||
I called out the need for a special permission to raise thread priorities as Justin suggested. https://hg.mozilla.org/projects/nspr/rev/14d8789ebb23
Attachment #739233 -
Attachment is obsolete: true
Attachment #739714 -
Flags: checked-in+
Reporter | ||
Comment 38•11 years ago
|
||
Wan-Teh, it's fine with me if you land Gabriele's patch here, but can we please do that? The situation as-is is pretty broken for b2g.
Reporter | ||
Comment 39•11 years ago
|
||
I'm not convinced that we are going to figure out what to do in bug 861441 within the next day or two. Something complex is going on in the scheduler. We have two bugs blocking certification that we'd like closed by Wednesday: bug 847592 and bug 860799. The former requires bug 861441, but the latter does not. But I cannot fix the latter without either this bug or bug 861441. To keep our options open, I'd like to land a fix here as soon as possible, please. If you can't land this tomorrow, we may have to patch NSPR in b2g18.
Assignee | ||
Comment 40•11 years ago
|
||
I've refreshed the patch addressing your comments and the other things that emerged during the discussion. The following changes were made: - The |priority| field is not changed if the call fails; the documentation was changed to reflect this. The only codepath I haven't changed in this regard is the one using pthread_setprio() which is obsolete and I couldn't find some conclusive documentation regarding how to detect failures. - I've removed the |pid| field and the getpid() calls, to obtain the main process' priority we only need to call getpriority(PRIO_PROCESS, 0). - This patch allows the main thread priority to be changed relatively like other the other threads do. This means that using PR_PRIORITY_NORMAL on the main thread is effectively a no-op; other values will raise and lower priority as appropriate though.
Assignee: wtc → gsvelto
Attachment #737119 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #740728 -
Flags: review?(wtc)
Assignee | ||
Comment 41•11 years ago
|
||
Actually I've changed my mind, this patch only has the changes to do the relative adjustments but I've stripped everything regarding updating the thred->priority field; that belongs to a follow-up and I don't want to add more complexity here.
Attachment #740728 -
Attachment is obsolete: true
Attachment #740728 -
Flags: review?(wtc)
Attachment #740747 -
Flags: review?(wtc)
Assignee | ||
Comment 42•11 years ago
|
||
Here's the back-ported patch for the mozilla-b2g18 tree. Besides applying the changes I've added the patch to nsprpub/patches and documented it in nsprpub/patches/README.
Assignee | ||
Updated•11 years ago
|
Summary: Back out bug 841651 (make PR_SetThreadPriority work on Bionic), because it doesn't do what it intends to do → Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
Assignee | ||
Comment 43•11 years ago
|
||
The try runs are here: - mozilla-central https://tbpl.mozilla.org/?tree=Try&rev=328e32213458 - mozilla-b2g18 https://tbpl.mozilla.org/?tree=Try&rev=5ea67101c4d7
Comment 44•11 years ago
|
||
I fixed two comments and folded long lines in Gabriele's patch v3 and checked it in to the NSPR hg repository (NSPR 4.10).
Attachment #740747 -
Attachment is obsolete: true
Attachment #740747 -
Flags: review?(wtc)
Attachment #741106 -
Flags: checked-in+
Comment 45•11 years ago
|
||
Comment on attachment 741106 [details] [diff] [review] [PATCH v4] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux, by Gabriele Svelto Pushed to the NSPR hg repository: https://hg.mozilla.org/projects/nspr/rev/8927f44164de Gabriele: this patch interdiff shows the changes I made to your patch v3: https://bugzilla.mozilla.org/attachment.cgi?oldid=740747&action=interdiff&newid=741106&headers=1
Comment 46•11 years ago
|
||
Comment on attachment 741106 [details] [diff] [review] [PATCH v4] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux, by Gabriele Svelto Review of attachment 741106 [details] [diff] [review]: ----------------------------------------------------------------- Gabriele: I have three simple questions, and one complicated question on your patch. ::: pr/src/pthreads/ptthread.c @@ +143,4 @@ > */ > tid = gettid(); > + errno = 0; > + rv = getpriority(PRIO_PROCESS, 0); Rather than getting the main thread's nice value every time, I wonder if we should just get it during NSPR initialization (for example, in _PR_InitThreads()) and cache it inside the 'pt_book' global struct. We can do this near where we initialize pt_book.minPrio and pt_book.maxPrio. This will also allow us to use pt_PriorityMap for PR_NICE_PRIORITY_SCHEDULING because we will get the main thread's nice value from a global variable (a new member of pt_book). More info: 1. This MXR query shows we only call _PR_InitThreads() with priority=PR_PRIORITY_NORMAL: http://mxr.mozilla.org/nspr/ident?i=_PR_InitThreads and this shows the 'priority' argument of PR_Init() is ignored: http://mxr.mozilla.org/nspr/source/pr/src/misc/prinit.c#263 So we can assume the thread that initializes NSPR has the priority PR_PRIORITY_NORMAL. 2. Usually the main thread initializes NSPR, but this is not required. So, in _PR_InitThreads(), we should check if the main thread and the current thread have the same nice value. If they have different nice values, our assumption in (1) will be invalid, and we should print a warning message to stderr. @@ +147,4 @@ > > + /* If we cannot read the main thread's nice value don't try to change the > + * new thread's nice value. */ > + if (errno == 0) { Should this test be rv != -1 || errno == 0? @@ +706,3 @@ > > + /* Do not proceed unless we know the main thread's nice value. */ > + if (errno == 0) { Should this test be rv != -1 || errno == 0? @@ +710,5 @@ > + pt_RelativePriority(rv, newPri)); > + > + if (rv == -1) > + { > + /* We don't set pt_schedpriv to EPERM in case errno == EPERM Could you update this comment? The test used to be rv == -1 && errno == EPERM. Now the test is simply rv == -1, so it is confusing for the comment to mention errno == EPERM. Also, if errno is not EPERM, then the log message "no thread scheduling privilege" isn't correct.
Comment 47•11 years ago
|
||
Comment on attachment 740750 [details] [diff] [review] [PATCH v3 mozilla-b2g18] Adjust thread nice values relatively to the main thread's nice value when using PR_SetThreadPriority() on Linux Review of attachment 740750 [details] [diff] [review]: ----------------------------------------------------------------- r=wtc. You can use this patch as is in mozilla-b2g18. The changes I made were all in comments or formatting.
Attachment #740750 -
Flags: review+
Assignee | ||
Comment 48•11 years ago
|
||
Thank you for reviewing & checking in, and also thank you for your changes! > I have three simple questions, and one complicated question > on your patch. > [...] > This will also allow us to use pt_PriorityMap for PR_NICE_PRIORITY_SCHEDULING > because we will get the main thread's nice value from a global > variable (a new member of pt_book). The issue with this is that if the main thread's priority changes at runtime we won't catch it. What Justin has been doing in his B2G-specific patches is to re-nice all the threads in a process by adjusting them up and down appropriately; with my patch always using relative values we should get consistent scheduling of all threads (minus race conditions but those are unavoidable unfortunately). > @@ +147,4 @@ > > > > + /* If we cannot read the main thread's nice value don't try to change the > > + * new thread's nice value. */ > > + if (errno == 0) { > > Should this test be rv != -1 || errno == 0? > > @@ +706,3 @@ > > > > + /* Do not proceed unless we know the main thread's nice value. */ > > + if (errno == 0) { > > Should this test be rv != -1 || errno == 0? Yes to both, but since errno == 0 is true when getpriority() succeeds I thought adding the other check was unnecessary. > Could you update this comment? The test used to be rv == -1 && errno == > EPERM. > Now the test is simply rv == -1, so it is confusing for the comment to > mention > errno == EPERM. > > Also, if errno is not EPERM, then the log message "no thread scheduling > privilege" > isn't correct. Right, I changed it because I wanted the test to be more robust even though I knew we could only ever hit EPERM as an error but as you point out the comment is misleading and the LOG message can be wrong. I was thinking of opening a follow-up to fix |thred->priority| being set even when the underlying thread priority wasn't changed. This would mean wading through some old code and ensuring all error conditions (such as the one above) are correct and properly documented; what do you think?
Comment 49•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #46) > Rather than getting the main thread's nice value every time, I > wonder if we should just get it during NSPR initialization > (for example, in _PR_InitThreads()) and cache it inside the > 'pt_book' global struct. We can do this near where we initialize > pt_book.minPrio and pt_book.maxPrio. That would assume that nothing external is going to alter the thread's nice value. While that might work in the context of b2g, that's not something that can work everywhere.
Assignee | ||
Comment 50•11 years ago
|
||
Attachment 740750 [details] [diff] should land on top of mozilla-b2g18 to resolve the tef+ness of this bug. I'm a bit puzzled by the try run in comment 43. On Fedora the build shows up as orange but the logs don't show any error; is it because the xpcshell tests did not run even though I had requested them? Anyway the other oranges look like known intermittent issues.
Keywords: checkin-needed
Reporter | ||
Comment 51•11 years ago
|
||
> On Fedora the build shows up as orange but the logs don't show any error; is it because
> the xpcshell tests did not run even though I had requested them?
The xpcshell tests did run -- on AWS, which we call "ubuntu".
It might be worth asking a sheriff why those builds are orange; I also don't see anything in the logs.
Comment 52•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #50) > Attachment 740750 [details] [diff] should land on top of mozilla-b2g18 to > resolve the tef+ness of this bug. https://hg.mozilla.org/releases/mozilla-b2g18/rev/ae6cac4083ae https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/89796cc8d74c I assume that this bug will be fixed when NSPR is actually updated on m-c, so leaving open until then.
status-b2g18:
--- → fixed
status-b2g18-v1.0.0:
--- → wontfix
status-b2g18-v1.0.1:
--- → fixed
status-firefox21:
--- → wontfix
status-firefox22:
--- → wontfix
status-firefox23:
--- → affected
Keywords: checkin-needed
Updated•11 years ago
|
Whiteboard: [status: needs more investigation][madrid] → [status: fixed for v1.0.1, open for m-c landing][madrid]
Comment 53•11 years ago
|
||
Marking as npotb to get this out of our queries (since this hasn't been resolved on m-c yet).
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid] → [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB]
Reporter | ||
Updated•11 years ago
|
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB] → [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB][actually part of the build; ignore NPOTB]
Comment 54•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #48) > > I was thinking of opening a follow-up to fix |thred->priority| being set > even when the underlying thread priority wasn't changed. This would mean > wading through some old code and ensuring all error conditions (such as the > one above) are correct and properly documented; what do you think? Fixing |thred->priority| could be worthwhile, but only if you are motivated to work on it. You will also need to look at not only PR_SetThreadPriority but also PR_CreateThread (which sets |thred->priority| of the new thread) and _PR_InitThreads and PR_AttachThread (which set |thred->priority| of the current thread.) The current code has the nice property that PR_GetThreadPriority and PR_SetThreadPriority are consistent with each other. We should think about whether anything may break if we lost this consistency. In summary, it may be worth doing, but I am somewhat in favor of just documenting the current "best-effort" behavior of setting/specifying thread priority in NSPR.
Comment 55•11 years ago
|
||
Attachment #741953 -
Flags: review?(gsvelto)
Comment 56•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #48) > > ...What Justin has been doing in his B2G-specific patches is > to re-nice all the threads in a process by adjusting them up and down > appropriately; ... Is this the patch you're referring to? https://bugzilla.mozilla.org/show_bug.cgi?id=861441#c48 That is a brute-force way to re-nice all the threads in a process! It's too bad that there isn't an easier way.
Assignee | ||
Comment 57•11 years ago
|
||
Comment on attachment 741953 [details] [diff] [review] Update the log message for setpriority failure Review of attachment 741953 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for fixing this for me. As for changing the way |thred->priority| is updated then maybe it's better to leave it as-is now that you've documented it. I think that the two biggest reasons not to touch it is that some code somewhere out there might depend on the current behavior, and that this would need to be changed also for obscure/obsolete platforms and I don't think that's actually worth the effort.
Attachment #741953 -
Flags: review?(gsvelto) → review+
Assignee | ||
Comment 58•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #56) > Is this the patch you're referring to? > > https://bugzilla.mozilla.org/show_bug.cgi?id=861441#c48 > > That is a brute-force way to re-nice all the threads in a process! > It's too bad that there isn't an easier way. Yes, that one. We would have liked to use control groups instead but we quickly discovered that the Android kernel we're using has a buggy implementation and working around the bug is no less race-prone than setting all the process' threads nice values (and setting the nice values is far more likely not to make us trip over some obscure kernel bug).
Comment 59•11 years ago
|
||
Comment on attachment 741953 [details] [diff] [review] Update the log message for setpriority failure https://hg.mozilla.org/projects/nspr/rev/35ea28b6a84b
Attachment #741953 -
Flags: checked-in+
Comment 60•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #52) > > I assume that this bug will be fixed when NSPR is actually updated on m-c, > so leaving open until then. I'm nervous about leaving a bug open that's for m-c purpose only when we don't have appropriate blocking flag attached. Technically, this would be a koi blocker for the m-c patch since this is eventually needed on m-c as well. Could we file a followup bug updating m-c here and close this for b2g18 & 1.01 patches?
Flags: needinfo?(ryanvm)
Comment 61•11 years ago
|
||
Actually, I'm going to just do that. I filed bug 873767 for the central landing. I'm closing this bug specifically for the b2g18 and 1.01 landing.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
status-firefox23:
affected → ---
Flags: needinfo?(ryanvm)
Resolution: --- → FIXED
Reporter | ||
Updated•11 years ago
|
Summary: Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux → [b2g18 only] Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
Comment 62•11 years ago
|
||
The patch has been pushed to mozilla-central.
Priority: -- → P1
Summary: [b2g18 only] Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux → Make PR_SetThreadPriority() change priorities relatively to the main process instead of using absolute values on Linux
Whiteboard: [status: fixed for v1.0.1, open for m-c landing][madrid][NPOTB][actually part of the build; ignore NPOTB] → [madrid][NPOTB][actually part of the build; ignore NPOTB]
Target Milestone: --- → 4.10
You need to log in
before you can comment on or make changes to this bug.
Description
•