Closed
Bug 834059
Opened 12 years ago
Closed 12 years ago
LMK incorrectly selecting apps that are in the process of starting
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)
People
(Reporter: m1, Assigned: justin.lebar+bug)
References
Details
(Whiteboard: [cr 442171])
Attachments
(1 file)
26.46 KB,
patch
|
cjones
:
review+
|
Details | Diff | Splinter Review |
Looks like apps that are being launched in a fully loaded system may be selected by the LMK before they make it into the foreground.
Some use cases we've seen this occur in:
1) Incoming call notification never arrives (communications app killed).
2) Random apps die unexpectedly as monkey try to start them.
One way to reproduce:
1) Put every app in background (other than communications)
2) Make a call to the device
Every so often should see the communication app getting killed by the LMK.
Logging from an independent monkey run attempted to launch the Calendar application showed that the preallocated process remained at OOM_ADJ=6 as it was transformed into the Calendar application, at which point the system ran out of memory and:
<4>[16239.978646] send sigkill to 28393 (Calendar), adj 6, size 4550
jlebar, do you recall where we flip prelaunch into foreground?
Flags: needinfo?(justin.lebar+bug)
Assignee | ||
Comment 2•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> jlebar, do you recall where we flip prelaunch into foreground?
Looks like it's only when a top-level window becomes active, which happens when the parent tells the BEC that it's active, which is quite late in the cycle.
I can take this if you'd like.
Flags: needinfo?(justin.lebar+bug)
Sure, sounds good.
We need to ensure that the service-y things like cost control and calendar don't regress and stick in the foreground again.
Assignee | ||
Comment 4•12 years ago
|
||
> We need to ensure that the service-y things like cost control and calendar don't regress
> and stick in the foreground again.
Can I just give it a 5s timeout and stick it in the background if it's not foreground by then?
I'm not 100% sure how foreground/background state is managed for those currently. For the purposes of this bug, I only meant we need to verify they're not relying on the current ~buggy TabChild behavior. (They shouldn't be.)
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → justin.lebar+bug
Updated•12 years ago
|
blocking-b2g: tef? → tef+
I found a reliable way to reproduce this. Writing a test.
This should most likely have a righteous impact on startup time too. Will see if I can get an estimate.
Alas, nothing significant by the stopwatch. We must be getting lucky and setting ourselves to foreground before doing significant work (or not competing for CPU commonly).
Comment 10•12 years ago
|
||
To ensure this bug is updated for those playing along at home, Justin told me that he's working on it but progress has been hampered by build errors. It's highly unlikely he'll have a fix today but he's still working hard on it.
Assignee | ||
Comment 11•12 years ago
|
||
Issues building aside, this has turned out to be much more complicated than I expected, largely because we're trying to do stuff with the ProcessPriorityManager before XPCOM has finished starting up. Which means, for example, we can't read prefs...
Assignee | ||
Comment 12•12 years ago
|
||
This passes the acceptance test for me. Sorry it took me so long to spin this
patch.
A few notes:
* If it's OK with you, I may go back through and clean up the logging. I
skipped on this because I wanted to get the patch up tonight.
* I basically rewrote ProcessPriorityManager; it might be easier to apply this
patch and just look at the resultant file.
* One downside to the way this is structured is that all processes get five
seconds of foreground time when they "start" (either when the preallocated
process becomes an app, or when a new process is started due to the
preallocated process not being there).
So for example, if you load a tab in the background, it has 5s of foreground
time, during which it might OOM the foreground tab. Or if you load an app
and then hit the home button, the app stays in the fg for 5s.
We could change this for browser tabs specifically. It's hard to see how to
change it overall, though.
I originally tried saying that if the app comes into the fg "on its own" and
then goes back into the background, then we give it bg priority. That seemed
to be causing problems with the phone app, because it seemed to be "going
back into the bg" before it showed any UI. As soon as it went back into the
bg, we killed it. But I may have made an error in my testing, and I don't
have time tonight to go back and try it.
* I think we should give the phone app a priority in-between regular fg and the
master process. That's the only way I'll be confident we won't be able to
run into problems here.
For example, the phone app loads merely with foreground priority. That means
it's competing with the existing fg app, which could OOM it. The current
setup does not protect against this.
* I have no idea if this fixes bug 834466, but I kind of doubt it.
Attachment #706702 -
Flags: review?(jones.chris.g)
Comment on attachment 706702 [details] [diff] [review]
Patch, v1
>diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp
> PBrowserChild*
> ContentChild::AllocPBrowser(const IPCTabContext& aContext,
> const uint32_t& aChromeFlags)
> {
>+ // If we are the preallocated process transforming into an app process,
>+ // we'll have background priority at this point. Give ourselves a
>+ // priority boost for a few seconds, so we don't get killed while we're
>+ // loading our first TabChild.
>+ dom::ipc::TemporarilySetProcessPriorityToForeground();
Nit: |using mozilla::dom::ipc| plz.
>diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp
>@@ -865,34 +866,44 @@ ContentParent::ContentParent(const nsAString& aAppManifestURL,
>+ // Set the subprocess's priority (bg if we're a preallocated process, fg
>+ // otherwise). We do this first because we're likely /lowering/ its CPU and
>+ // memory priority, which it has inherited from this process.
>+ if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) {
>+ ProcessPriority priority;
>+ if (aAppManifestURL == MAGIC_PREALLOCATED_APP_MANIFEST_URL) {
>+ priority = PROCESS_PRIORITY_BACKGROUND;
>+ } else {
>+ priority = PROCESS_PRIORITY_FOREGROUND;
>+ }
I prefer the ?: operator, but up to you.
>diff --git a/dom/ipc/ProcessPriorityManager.cpp b/dom/ipc/ProcessPriorityManager.cpp
> // Enable logging by setting
> //
> // NSPR_LOG_MODULES=ProcessPriorityManager:5
> //
> // in your environment.
>
>-#ifdef PR_LOGGING
>+#ifdef ANDROID
>+#include <android/log.h>
>+#define LOG(fmt, ...) \
>+ __android_log_print(ANDROID_LOG_INFO, \
>+ "Gecko:ProcessPriorityManager", \
>+ fmt, ## __VA_ARGS__)
Will want this off for land.
This sort of contradicts the comment above. Was nspr logging
insufficient for your uses or just inconvenient?
> void
> ProcessPriorityManager::Init()
Add a comment somewhere around here noting that we rely on our process
launcher to initialize the priority appropriately.
Overall looks OK and makes sense. But what happened to the audio
channel handling? Is that subsumed by something else or did it get
lost in the refactoring shuffle?
Attachment #706702 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 14•12 years ago
|
||
> Will want this off for land.
> This sort of contradicts the comment above. Was nspr logging
> insufficient for your uses or just inconvenient?
NSPR is quite inconvenient on the phone. I don't know if it even goes to logcat if you turn it on, but the easiest way I know of to turn it on is to push a new b2g.sh, and that's more inconvenient than I'd like.
Our logging story on B2G is very, very, bad. Most people use only __android_log_print, and everyone rolls their own macros (some better than others). Just be thankful I didn't do it like nsIdleService:
http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsIdleService.cpp#246 :)
Anyway, I'll turn it off if you don't want it on. (Personally, I think having some indication of when processes get set to which priorities in the logcat is quite useful for understanding various behaviors, so I intended to leave it on.)
> But what happened to the audio channel handling?
We now call ResetPriority() when the audio channels change. We do a bit of extra work, because we iterate over all top-level windows and then iterate over the audio channels, when we don't in fact need to iterate over the windows. But I expect there will be very few top-level windows in a process, and it's much simpler this way:
> + } else if (!strcmp(aTopic, "inner-window-destroyed") ||
> + !strcmp(aTopic, "audio-channel-agent-changed")) {
> + ResetPriority();
(In reply to Justin Lebar [:jlebar] from comment #14)
> > Will want this off for land.
>
> > This sort of contradicts the comment above. Was nspr logging
> > insufficient for your uses or just inconvenient?
>
> NSPR is quite inconvenient on the phone. I don't know if it even goes to
> logcat if you turn it on
It does.
$ adb shell
# stop b2g
# NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh
will get it going.
> Our logging story on B2G is very, very, bad.
s/B2G/Gecko/, my good man! ;)
> Anyway, I'll turn it off if you don't want it on. (Personally, I think
> having some indication of when processes get set to which priorities in the
> logcat is quite useful for understanding various behaviors, so I intended to
> leave it on.)
I'd love to have it on but always-on "info"-level logging bites in the butt pretty often. I guess it makes sense to leave this on for a while in case we see new issues with CAF testing.
> > But what happened to the audio channel handling?
>
> We now call ResetPriority() when the audio channels change.
Whups, I overlooked that! Carry on.
Updated•12 years ago
|
Attachment #706702 -
Flags: review+
Assignee | ||
Comment 16•12 years ago
|
||
Reporter | ||
Comment 17•12 years ago
|
||
Thanks guys! Can this be landed on b2g18 directly in case we don't get an automated uplift before Sunday noon PST
Assignee | ||
Comment 18•12 years ago
|
||
For sure, but I'd like this to cycle on inbound before I land on b2g18, so let's give it a few hours.
Assignee | ||
Comment 19•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
status-b2g18:
--- → fixed
Reporter | ||
Comment 20•12 years ago
|
||
Assignee | ||
Comment 21•12 years ago
|
||
Comment 22•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
status-firefox19:
--- → wontfix
status-firefox20:
--- → wontfix
status-firefox21:
--- → fixed
Target Milestone: --- → B2G C4 (2jan on)
Assignee | ||
Comment 23•12 years ago
|
||
> # NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh
Except we don't define PR_LOGGING in our B2G release builds, so we're right back where we started. If I have to modify the code in order to get logging, I'd rather not have to then go into adb shell and reboot gecko every time I reboot the phone.
I guess next time I'll leave the logcat stuff in and ifdef it out. :(
Assignee | ||
Comment 24•12 years ago
|
||
...actually, maybe that's wrong; I was building with a different srcdir!
Yes, that's general suckage with NSPR logging. I don't like it either.
Comment 26•12 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #23)
> > # NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh
>
> Except we don't define PR_LOGGING in our B2G release builds, so we're right
> back where we started. If I have to modify the code in order to get
> logging, I'd rather not have to then go into adb shell and reboot gecko
> every time I reboot the phone.
You could edit b2g.sh on the phone so then you don't need to restart gecko.
Alternatively, you can edit gonk-misc/b2g.sh so that it's right from the time you flash.
Here are the steps required for changing b2g.sh on the phone:
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Debugging/Customizing_the_b2g.sh_script
Updated•12 years ago
|
status-b2g18-v1.0.0:
--- → fixed
Updated•12 years ago
|
status-b2g18-v1.0.1:
--- → fixed
Comment 28•12 years ago
|
||
Michael Vines if you can Verify this bug out, it would be much appreciated.
Flags: needinfo?(mvines)
Keywords: verifyme
Reporter | ||
Updated•12 years ago
|
Status: RESOLVED → VERIFIED
Reporter | ||
Updated•12 years ago
|
Flags: needinfo?(mvines)
Comment 29•10 years ago
|
||
Per comment 29,I clear "Verifyme".
You need to log in
before you can comment on or make changes to this bug.
Description
•