Closed Bug 870203 Opened 11 years ago Closed 11 years ago

Weird immediate downgrading of priority for Communications process

Categories

(Firefox OS Graveyard :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(blocking-b2g:tef+, b2g18 fixed, b2g18-v1.0.1 fixed)

RESOLVED FIXED
1.0.1 Cert2 (21may)
blocking-b2g tef+
Tracking Status
b2g18 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: bzbarsky, Assigned: justin.lebar+bug)

References

Details

(Whiteboard: [target:resolved already?])

Attachments

(2 files, 2 obsolete files)

I was logging things as I tried to call while running the process buster thing, and some logs like so:

05-08 21:49:11.317 I/Gecko:ProcessPriorityManager(  104): [Communications, child-id=133, pid=2757] Creating ParticularProcessPriorityManager.
05-08 21:49:11.317 I/Gecko:ProcessPriorityManager(  104): [Communications, child-id=133, pid=2757] Done starting up.  mHoldsCPUWakeLock=0, mHoldsHighPriorityWakeLock=0
05-08 21:49:11.317 I/Gecko:ProcessPriorityManager(  104): [Communications, child-id=133, pid=2757] Changing priority from UNKNOWN to FOREGROUND_HIGH.

then setting the priorities to low on all the other processes, then:

05-08 21:49:11.397 I/Gecko:ProcessPriorityManager(  104): [Communications, child-id=133, pid=2757] Scheduling reset timer to fire in 1000ms.
05-08 21:49:11.447 I/Gecko:ProcessPriorityManager(  104): [Communications, child-id=133, pid=2757] Changing priority from FOREGROUND_HIGH to BACKGROUND.

Note the timestamps.  Why did we just go back from HIGH to BACKGROUND after about 150ms of being HIGH?

Note that right after this in the log we have:

05-08 21:49:11.918 I/Gecko   (  104): XXX ril_worker::_process_calls() got new call.

so this is in the middle of us starting up the Communications app to take the call!
To be clear, this is with the STR from bug 847592 comment 19.
Ah, I think I may understand this better now.

The process should hold a wake lock when it starts up, due to ContentParent::MaybeTakeCPUWakeLock.  It doesn't: "Done starting up.  mHoldsCPUWakeLock=0, mHoldsHighPriorityWakeLock=0".

We force the priority to FOREGROUND_HIGH after starting it up (that's the third log line), but then if anybody takes or releases a wake lock for any reason, this forces us to reconsider the priority we've given the process.
> The process should hold a wake lock when it starts up, due to 
> ContentParent::MaybeTakeCPUWakeLock.  It doesn't: "Done starting up.  mHoldsCPUWakeLock=0, 
> mHoldsHighPriorityWakeLock=0".

Oh, okay.  We don't take the wake lock immediately after starting the process, but we take it within one shot of the event loop after creating the ContentParent.  That should be soon enough.

When we're starting up the communications app (that is, when it's not already loaded), you should see

> I/Gecko:ProcessPriorityManager(  144): [Communications, child-id=32, pid=932] Got wake lock changed event. Now mHoldsCPUWakeLock=1, mHoldsHighPriorityWakeLock=0

immediately after we renice all of the other processes due to our getting FOREGROUND_HIGH priority.
I do NOT see that.  What I see in my log is lots of things like:

05-08 21:49:11.357 I/Gecko:ProcessPriorityManager(  104): [Usage, child-id=21, pid=841] Changing priority from FOREGROUND to FOREGROUND (low CPU).
05-08 21:49:11.357 I/Gecko   (  104): XXX Setting nice for pid 841 to 18
05-08 21:49:11.367 I/Gonk    (  104): Changed nice for pid 841 from 1 to 18.

as we renice all the processes and then immediately after that is the second log quote from comment 0.

Note that if we happen to set visibility in there we'll immediately and synchronously reset our priority.. I wonder if that's what happens here.

In any case, in the relevant log I never see mHoldsCPUWakeLock=1.  It's always 0 every time it's logged.
Attached patch Moar logging (obsolete) — Splinter Review
Comment on attachment 747269 [details] [diff] [review]
Moar logging

This may clear things up.
Attachment #747269 - Flags: feedback?(bzbarsky)
Comment on attachment 747269 [details] [diff] [review]
Moar logging

Add something to OnFrameloaderVisibleChanged too, for its ResetPriorityNow call?
Attachment #747269 - Flags: feedback?(bzbarsky) → feedback+
Attached patch More logging, v2Splinter Review
Attachment #747269 - Attachment is obsolete: true
Managed to get this while profiling...  still not sure whether the profiler bit matters.

The profile is at http://people.mozilla.com/~bgirard/cleopatra/#report=99acc84a9c8c3b4462f16b539bc62b417421d054

This was a phone that had tried to take a call once already, and I _think_ had a Communications running when I started the testcase.

Salient bits from this log:

1)  There is a Communications with pid=478 but then it looks like it dies and we create a new Communications with pid=928?

2)  When this second Communications finished startup, it has
    "mHoldsCPUWakeLock=0, mHoldsHighPriorityWakeLock=0"

3)  Right after going into the FOREGROUND_HIGH state, this Communications gets
    a ResetPriority via OnRemoteBrowserFrameShown that schedules a timer to
    reset the priority back to FOREGROUND.

4)  Right after that, we get a OnFrameloaderVisibleChanged and are now
    isVisible=0(?), and synchronously go into BACKGROUND state.

5)  Much later (12s later) we get mHoldsHighPriorityWakeLock=1 and switch back
    to FOREGROUND_HIGH mode.
Blocks: 847592
> 1)  There is a Communications with pid=478 but then it looks like it dies and we create a new 
> Communications with pid=928?

Looks like it, particularly since we destroy its particular process priority manager.

> I/Gecko:ProcessPriorityManager(  104): [child-id=4, pid=-1] Destroying ParticularProcessPriorityManager.

Then we create a new preallocated process (pid 697) and then it OOMs.  This should be fixed by bug 868521.  Hopefully, anyway.

But then this is bad.

> XXX ContentParent::MaybeTakeCPUWakeLock taking wake lock
> [Communications, child-id=34, pid=928] OnRemoteBrowserFrameShown; calling ResetPriority.
> [Communications, child-id=34, pid=928] ComputePriority[mHoldsCPUWakeLock=0, mHoldsHighPriorityWakeLock=0, HasAppType('critical')=1, isVisible=1, IsExpectingSystemMessage=1

It looks like taking the wake lock on behalf of the new process isn't working right; we should have mHoldsCPUWakeLock=1 there.
Ohhhh.

We don't notify observers of wake-lock changes if the state remains the same.

Which is to say, if the CPU wake lock is held, and then we acquire the CPU wake lock on behalf of some process, we don't hear about that.
> 4)  Right after that, we get a OnFrameloaderVisibleChanged and are now
>     isVisible=0(?), and synchronously go into BACKGROUND state.

This is separately problematic.  If a process is loading and doesn't happen to hold a wake lock, it still shouldn't get shoved into the bg.

If you have time to look into this bug, I'd want to understand why this is happening (e.g., what's the backtrace for this call).
This also fixes an important bug where, if a process crashed while holding a wake lock, we'd send a wake lock changed notification without including the list of locking processes.  This could cause us to incorrectly believe that no processes were holding the lock.

Unfortunately both T-Mobile and AT&T cell service is currently not functioning at my house (?!), so I haven't been able to test this.  But I also haven't been able to reproduce this problem in the first place.
Depends on: 870480
Comment on attachment 747538 [details] [diff] [review]
Patch v1: Notify wake lock listeners when a process adds or removes a wake lock, even if the wake lock itself doesn't change state.

Moving this to bug 870480.
Attachment #747538 - Attachment is obsolete: true
This is tef+ because we've determined it blocks bug 847592.
blocking-b2g: --- → tef+
> 4)  Right after that, we get a OnFrameloaderVisibleChanged and are now
>     isVisible=0(?), and synchronously go into BACKGROUND state.
>
> This is separately problematic.  If a process is loading and doesn't happen to hold a wake lock, it 
> still shouldn't get shoved into the bg.

Okay, I finally tracked down this OnFrameloaderVisibleChanged.  It's coming from the Gaia window manager calling setVisible on this particular frame because, as far as Gaia knows at the time, it's launching a background service.  This should have no effect on the process's priority, since it's holding a wake lock here.

In other words, this is totally fine, and we may only need the patch in bug 847592 here.

I do want to write a patch to avoid sending frameloader-visible-changed when the visibility didn't actually change.  That will help avoid races around the grace period timer.  But I'll explain this in more detail in a separate bug.
> But I'll explain this in more detail in a separate bug.

Bug 870598.
The only remaining work to be done here, I think, is to verify that bug 870480 fixes this issue.  I've never been able to reliably reproduce myself, so I'd appreciate some help here.
> The only remaining work to be done here, I think, is to verify that bug 870480 fixes this
> issue. 

I could reproduce it pretty reliably without that patch; I haven't managed to with the patch yet.  And the responsiveness to an incoming call when running the process buster thing seems to be a good bit better, too.  So I'd say that patch fixes this problem, yes.
I'll figure out tomorrow if I want to land any of this logging.  It's a fine line between being helpful and being too verbose.
Flags: needinfo?(justin.lebar+bug)
Target Milestone: --- → 1.0.1 Cert2 (28may)
jlebar - are you the right owner here?
Assignee: nobody → justin.lebar+bug
Yes, I'm the right owner.  I think we're good here; I'm waiting for QA in bug 847592, and if they don't find any problems, we can close this bug.

I'd still like to add some logging, but I can do that separately.
Flags: needinfo?(justin.lebar+bug)
Whiteboard: [target:resolved already?]
Okay, QA is doing work in bug 847592; we don't need to have two bugs open for that.  We can reopen this if we find that this is still happening somehow.
Closing per jlebar's earlier comment.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: