Closed Bug 836199 Opened 11 years ago Closed 11 years ago

LMK can kill the communications app when receiving a call

Categories

(Firefox OS Graveyard :: General, defect)

defect
Not set
normal

Tracking

(blocking-b2g:tef+, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

VERIFIED FIXED
blocking-b2g tef+
Tracking Status
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Whiteboard: [target 28/2])

This is essentially the same symptom as bug 834059, but a slightly different STR.  I've been working on this for a few days and I now think this is the right bug to file; I'll dupe and otherwise redirect my existing bugs here.

STR:

 * Reboot the phone
 * Load the browser app
 * Load http://people.mozilla.com/~cjones/membuster.html
 * Click "bust processes"
 * Quickly (within three seconds is fine) call the phone.  (This is what's different from bug 834059; you could trigger that bug by calling the phone after the page had fully loaded, but this STR requires you to call the phone while the tabs are loading.)

Result: Phone never rings, because the phone app is killed during startup.

I think the result above is blocking, unfortunately; if we can't receive calls, what we have isn't really a smart/phone/.
Blocks: 815348
My current plan requires a few separate changes.  (I don't know if this writeup is helpful to anyone else, but it's helpful for me to write out.)

1) When the phone app is in the foreground, it gets PROCESS_PRIORITY_FOREGROUND_COMMUNICATIONS, which places its priority above all other foreground apps' (but below the main process's).

With this in place, bugs where we don't send processes to the foreground quickly enough, or where we have too many fg processes, should not be able to cause us to drop calls.  This makes bug 835829 and bug 835563 non-critical.

I've already written the code for this.  It's relatively simple and low-risk.

2) Switch ProcessPriorityManager to using nsIDocShell::IsActive instead of document visibility to determine whether a process is in the foreground.  When we navigate, a process briefly loses its visible document.  If this navigation takes more than 1s (as it does during these STR), then we end up sending the process into the background.

I have this part written too; it's not a big deal.

3) Somehow keep the phone app in the foreground from when it starts until it has a chance to open the attention screen.

Right now, the sequence of events AIUI is as follows.

  a) Gecko receives a call.
  b) Gecko asks Gaia to start the phone app in the background.
  c) The phone app starts up in the background and receives a message indicating that there's a call pending.
  d) The phone app opens an attention window, which is in the foreground.

If the phone app gets bg priority before (d) happens, it becomes eligible to be killed.

We start apps with foreground priority and give them a grace period (currently 5s) before they drop down to bg priority.  But right now (a-d) takes longer than 5s, so the phone app drops down to bg priority and we drop the call.

I think (a-d) is taking so long because the phone app is competing with 10 (yes, 10) foreground browser tabs for CPU cycles. (*)  In my current patches, the FOREGROUND_COMMUNICATIONS priority doesn't give the process elevated CPU priority.  Perhaps if I increase the CPU priority of FOREGROUND_COMMUNICATIONS, the phone app will always start up within the grace period, and we'll be golden.

Otherwise, we'll have to start thinking about more complicated changes.  For example, perhaps we need to hold the phone app in the fg until it has a chance to process its incoming message.  Indeed, perhaps we should do this with all apps; I expect we have a similar bug with the SMS app missing SMS messages, for example (unless SMS'es are handled by the system app).

(*) The browser app keeps tabs in the foreground until it gets a screenshot, and it doesn't take a screenshot until the tab finishes loading.  I don't think this is a critical issue in the browser, because opening 10 tabs at once won't be a common occurrence.  One ought to be able to reproduce this bug without relying on this particular behavior in the browser app, so fixing the browser app would not fix this bug.
blocking-b2g: --- → tef?
Assignee: nobody → justin.lebar+bug
> I expect we have a similar bug with the SMS app missing SMS messages, for example (unless SMS'es are 
> handled by the system app).

Sorry, s/system app/communications app/.
blocking-b2g: tef? → tef+
> The browser app keeps tabs in the foreground until it gets a screenshot, and it doesn't 
> take a screenshot until the tab finishes loading.  I don't think this is a critical issue 
> in the browser, because opening 10 tabs at once won't be a common occurrence.  One ought 
> to be able to reproduce this bug without relying on this particular behavior in the 
> browser app, so fixing the browser app would not fix this bug.

Actually, I want to revisit this assumption.

Part of what's causing this problem is that we have at once ~10 foreground processes all competing for CPU with the communications app.  AFAIK, this situation can occur /only/ with the browser app.

I can raise the CPU priority of the communications app, but that requires also raising the priority of the main B2G process, because we want the main process to have the highest priority on the system.

If I increase the separation between the priority of the communications app and that of the other foreground apps enough, then I can indeed receive a call in a timely fashion.  But this increases the separation between the b2g app and regular foreground apps, which isn't something we want to do, I think.

But perhaps by merely limiting how many foreground tabs the browser app keeps, I can make the communications app compete against only one other foreground process.  Perhaps that would be sufficient to receive the call in a timely fashion.
Marking status-b2g18 and status-b2g18-v1.0.0 as affected, please update the status to fixed once this is verified landed on v1-train/mozilla-b2g18 and v1.0.0/mozilla-b2g18_v_1_0_0
Depends on: 836647
Depends on: 835795
Depends on: 835563
Blocks: 836638
For those following along at home, I'm pretty sure we need to fix bug 836647 in order to fix this bug.  I'm less certain that we need to fix bug 835563 (which itself depends on bug 835795), but I'd guess we probably need to fix those as well.

Bug 836638 is nearly the same bug as this one, but it's slightly different because we can't give the clock app high foreground priority just because it's the clock app, whereas I'm comfortable giving the communications app high priority.  Fixing the alarm case is more likely to require a fix to bug 835563.
No longer depends on: 835795
Depends on: 836654
Can the phone app safely be killed while we're on a call?
Flags: needinfo?(jones.chris.g)
No, that disconnects the call.
Flags: needinfo?(jones.chris.g)
Depends on: 839312
Justin, please, what is the status on this ?
Flags: needinfo?(justin.lebar+bug)
(In reply to David Scravaglieri [:scravag] from comment #8)
> Justin, please, what is the status on this ?

Please see the dependent bugs.
Flags: needinfo?(justin.lebar+bug)
We've retriaged, and plan to leave this as a blocker (since LMK of comms will likely block certification). Other LMK bugs will not block, but may be accepted with approval.
Depends on: 842679
Whiteboard: [target 28/2]
All of the dependent bugs are fixed here.  All that remains to be done is to verify that this bug is in fact fixed.
Keywords: qawanted
Assigning myself temporarily - I'm going to go hunt for owners on these bugs.
QA Contact: jsmith
QA Contact: jsmith
The phone still doesn't ring when called (right after the "bust processes" is selected). 
Reproduces using the nightly build: 20130225070200, as well as the master build: 20130225030609.
Thanks for testing this.

In order to understand what's going on, I need the output of adb logcat and adb shell dmesg.  (In general, any bug report to developers should include the output of adb logcat.)

Also, do you eventually get a missed call notification?
i didn't get any notification, but i did get a "ghost" incoming call screen eventually.
(In reply to ahubenya from comment #15)
> i didn't get any notification, but i did get a "ghost" incoming call screen
> eventually.

Aha!  That very likely indicates that the comms app is not being killed, which is what this bug was tracking.  There's still an issue, but that's a separate bug.
It sounds like comment 15 is all of the QA we're going to get here?  If so, I guess we should close this bug.

Ahubenya, I'd appreciate if you could file a new bug for the issue described in comment 15.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
JLebar, 
The new bug has been created for the issue stated in comment 15. 
https://bugzilla.mozilla.org/show_bug.cgi?id=845497
Keywords: qawantedverifyme
This issue has been verified as fixed on the v1.0.1 nightly 20130226031044, as well as the master 20130227030553. The calls go through now while "bust processes" is loading tabs.
Status: RESOLVED → VERIFIED
QA Contact: ahubenya
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.