Closed Bug 834059 Opened 12 years ago Closed 12 years ago

LMK incorrectly selecting apps that are in the process of starting

Categories

(Firefox OS Graveyard :: General, defect)

Product:

Component:

Platform:

ARM

Gonk (Firefox OS)

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

Status:

VERIFIED FIXED

Milestone:

B2G C4 (2jan on)

blocking-b2g

tef+

Tracking Flags:

Tracking

Status

firefox19

---

wontfix

firefox20

---

wontfix

firefox21

---

fixed

b2g18

---

fixed

b2g18-v1.0.0

---

fixed

b2g18-v1.0.1

---

fixed

People

(Reporter: m1, Assigned: justin.lebar+bug)

References

Details

(Whiteboard: [cr 442171])

Attachments

(1 file)

Patch, v1 12 years ago Justin Lebar (not reading bugmail) 26.46 KB, patch	cjones : review+	Details \| Diff \| Splinter Review

Michael Vines [:m1] [:evilmachines]

Reporter

Description

•

12 years ago

Looks like apps that are being launched in a fully loaded system may be selected by the LMK before they make it into the foreground. Some use cases we've seen this occur in: 1) Incoming call notification never arrives (communications app killed). 2) Random apps die unexpectedly as monkey try to start them. One way to reproduce: 1) Put every app in background (other than communications) 2) Make a call to the device Every so often should see the communication app getting killed by the LMK. Logging from an independent monkey run attempted to launch the Calendar application showed that the preallocated process remained at OOM_ADJ=6 as it was transformed into the Calendar application, at which point the system ran out of memory and: <4>[16239.978646] send sigkill to 28393 (Calendar), adj 6, size 4550

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 1

•

12 years ago

jlebar, do you recall where we flip prelaunch into foreground?

Flags: needinfo?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #1) > jlebar, do you recall where we flip prelaunch into foreground? Looks like it's only when a top-level window becomes active, which happens when the parent tells the BEC that it's active, which is quite late in the cycle. I can take this if you'd like.

Flags: needinfo?(justin.lebar+bug)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 3

•

12 years ago

Sure, sounds good. We need to ensure that the service-y things like cost control and calendar don't regress and stick in the foreground again.

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

12 years ago

> We need to ensure that the service-y things like cost control and calendar don't regress > and stick in the foreground again. Can I just give it a 5s timeout and stick it in the background if it's not foreground by then?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 5

•

12 years ago

I'm not 100% sure how foreground/background state is managed for those currently. For the purposes of this bug, I only meant we need to verify they're not relying on the current ~buggy TabChild behavior. (They shouldn't be.)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Assignee: nobody → justin.lebar+bug

Andrew Overholt [:overholt]

Updated

•

12 years ago

blocking-b2g: tef? → tef+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 6

•

12 years ago

I found a reliable way to reproduce this. Writing a test.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 7

•

12 years ago

https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW4:_Apps_are_successfully_launched_into_the_foreground_under_memory_pressure

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Depends on: 834466

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Blocks: 815348

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 8

•

12 years ago

This should most likely have a righteous impact on startup time too. Will see if I can get an estimate.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 9

•

12 years ago

Alas, nothing significant by the stopwatch. We must be getting lucky and setting ourselves to foreground before doing significant work (or not competing for CPU commonly).

Andrew Overholt [:overholt]

Comment 10

•

12 years ago

To ensure this bug is updated for those playing along at home, Justin told me that he's working on it but progress has been hampered by build errors. It's highly unlikely he'll have a fix today but he's still working hard on it.

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

12 years ago

Issues building aside, this has turned out to be much more complicated than I expected, largely because we're trying to do stuff with the ProcessPriorityManager before XPCOM has finished starting up. Which means, for example, we can't read prefs...

Justin Lebar (not reading bugmail)

Assignee

Comment 12

•

12 years ago

Attached patch Patch, v1 — Details — Splinter Review

This passes the acceptance test for me. Sorry it took me so long to spin this patch. A few notes: * If it's OK with you, I may go back through and clean up the logging. I skipped on this because I wanted to get the patch up tonight. * I basically rewrote ProcessPriorityManager; it might be easier to apply this patch and just look at the resultant file. * One downside to the way this is structured is that all processes get five seconds of foreground time when they "start" (either when the preallocated process becomes an app, or when a new process is started due to the preallocated process not being there). So for example, if you load a tab in the background, it has 5s of foreground time, during which it might OOM the foreground tab. Or if you load an app and then hit the home button, the app stays in the fg for 5s. We could change this for browser tabs specifically. It's hard to see how to change it overall, though. I originally tried saying that if the app comes into the fg "on its own" and then goes back into the background, then we give it bg priority. That seemed to be causing problems with the phone app, because it seemed to be "going back into the bg" before it showed any UI. As soon as it went back into the bg, we killed it. But I may have made an error in my testing, and I don't have time tonight to go back and try it. * I think we should give the phone app a priority in-between regular fg and the master process. That's the only way I'll be confident we won't be able to run into problems here. For example, the phone app loads merely with foreground priority. That means it's competing with the existing fg app, which could OOM it. The current setup does not protect against this. * I have no idea if this fixes bug 834466, but I kind of doubt it.

Attachment #706702 - Flags: review?(jones.chris.g)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 13

•

12 years ago

Comment on attachment 706702 [details] [diff] [review] Patch, v1 >diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp > PBrowserChild* > ContentChild::AllocPBrowser(const IPCTabContext& aContext, > const uint32_t& aChromeFlags) > { >+ // If we are the preallocated process transforming into an app process, >+ // we'll have background priority at this point. Give ourselves a >+ // priority boost for a few seconds, so we don't get killed while we're >+ // loading our first TabChild. >+ dom::ipc::TemporarilySetProcessPriorityToForeground(); Nit: |using mozilla::dom::ipc| plz. >diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp >@@ -865,34 +866,44 @@ ContentParent::ContentParent(const nsAString& aAppManifestURL, >+ // Set the subprocess's priority (bg if we're a preallocated process, fg >+ // otherwise). We do this first because we're likely /lowering/ its CPU and >+ // memory priority, which it has inherited from this process. >+ if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) { >+ ProcessPriority priority; >+ if (aAppManifestURL == MAGIC_PREALLOCATED_APP_MANIFEST_URL) { >+ priority = PROCESS_PRIORITY_BACKGROUND; >+ } else { >+ priority = PROCESS_PRIORITY_FOREGROUND; >+ } I prefer the ?: operator, but up to you. >diff --git a/dom/ipc/ProcessPriorityManager.cpp b/dom/ipc/ProcessPriorityManager.cpp > // Enable logging by setting > // > // NSPR_LOG_MODULES=ProcessPriorityManager:5 > // > // in your environment. > >-#ifdef PR_LOGGING >+#ifdef ANDROID >+#include <android/log.h> >+#define LOG(fmt, ...) \ >+ __android_log_print(ANDROID_LOG_INFO, \ >+ "Gecko:ProcessPriorityManager", \ >+ fmt, ## __VA_ARGS__) Will want this off for land. This sort of contradicts the comment above. Was nspr logging insufficient for your uses or just inconvenient? > void > ProcessPriorityManager::Init() Add a comment somewhere around here noting that we rely on our process launcher to initialize the priority appropriately. Overall looks OK and makes sense. But what happened to the audio channel handling? Is that subsumed by something else or did it get lost in the refactoring shuffle?

Attachment #706702 - Flags: review?(jones.chris.g)

Justin Lebar (not reading bugmail)

Assignee

Comment 14

•

12 years ago

> Will want this off for land. > This sort of contradicts the comment above. Was nspr logging > insufficient for your uses or just inconvenient? NSPR is quite inconvenient on the phone. I don't know if it even goes to logcat if you turn it on, but the easiest way I know of to turn it on is to push a new b2g.sh, and that's more inconvenient than I'd like. Our logging story on B2G is very, very, bad. Most people use only __android_log_print, and everyone rolls their own macros (some better than others). Just be thankful I didn't do it like nsIdleService: http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsIdleService.cpp#246 :) Anyway, I'll turn it off if you don't want it on. (Personally, I think having some indication of when processes get set to which priorities in the logcat is quite useful for understanding various behaviors, so I intended to leave it on.) > But what happened to the audio channel handling? We now call ResetPriority() when the audio channels change. We do a bit of extra work, because we iterate over all top-level windows and then iterate over the audio channels, when we don't in fact need to iterate over the windows. But I expect there will be very few top-level windows in a process, and it's much simpler this way: > + } else if (!strcmp(aTopic, "inner-window-destroyed") || > + !strcmp(aTopic, "audio-channel-agent-changed")) { > + ResetPriority();

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 15

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #14) > > Will want this off for land. > > > This sort of contradicts the comment above. Was nspr logging > > insufficient for your uses or just inconvenient? > > NSPR is quite inconvenient on the phone. I don't know if it even goes to > logcat if you turn it on It does. $ adb shell # stop b2g # NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh will get it going. > Our logging story on B2G is very, very, bad. s/B2G/Gecko/, my good man! ;) > Anyway, I'll turn it off if you don't want it on. (Personally, I think > having some indication of when processes get set to which priorities in the > logcat is quite useful for understanding various behaviors, so I intended to > leave it on.) I'd love to have it on but always-on "info"-level logging bites in the butt pretty often. I guess it makes sense to leave this on for a while in case we see new issues with CAF testing. > > But what happened to the audio channel handling? > > We now call ResetPriority() when the audio channels change. Whups, I overlooked that! Carry on.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Attachment #706702 - Flags: review+

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/42ba5b4cf32a

Michael Vines [:m1] [:evilmachines]

Reporter

Comment 17

•

12 years ago

Thanks guys! Can this be landed on b2g18 directly in case we don't get an automated uplift before Sunday noon PST

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

12 years ago

For sure, but I'd like this to cycle on inbound before I land on b2g18, so let's give it a few hours.

Justin Lebar (not reading bugmail)

Assignee

Comment 19

•

12 years ago

https://hg.mozilla.org/releases/mozilla-b2g18/rev/0a0efd587a4c

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

status-b2g18: --- → fixed

Michael Vines [:m1] [:evilmachines]

Reporter

Comment 20

•

12 years ago

Oh, http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0 please. Ref: https://wiki.mozilla.org/Release_Management/B2G_Landing#v1.0.0_2

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

12 years ago

> Oh, http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0 please. Thanks. https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/5c9f6113c3fa

Ryan VanderMeulen [:RyanVM]

Comment 22

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/42ba5b4cf32a

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-firefox19: --- → wontfix

status-firefox20: --- → wontfix

status-firefox21: --- → fixed

Target Milestone: --- → B2G C4 (2jan on)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Depends on: 835563

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

12 years ago

> # NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh Except we don't define PR_LOGGING in our B2G release builds, so we're right back where we started. If I have to modify the code in order to get logging, I'd rather not have to then go into adb shell and reboot gecko every time I reboot the phone. I guess next time I'll leave the logcat stuff in and ifdef it out. :(

Justin Lebar (not reading bugmail)

Assignee

Comment 24

•

12 years ago

...actually, maybe that's wrong; I was building with a different srcdir!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 25

•

12 years ago

Yes, that's general suckage with NSPR logging. I don't like it either.

Dave Hylands [:dhylands]

Comment 26

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #23) > > # NSPR_LOG_MODULES="Foo:5" /system/bin/b2g.sh > > Except we don't define PR_LOGGING in our B2G release builds, so we're right > back where we started. If I have to modify the code in order to get > logging, I'd rather not have to then go into adb shell and reboot gecko > every time I reboot the phone. You could edit b2g.sh on the phone so then you don't need to restart gecko. Alternatively, you can edit gonk-misc/b2g.sh so that it's right from the time you flash. Here are the steps required for changing b2g.sh on the phone: https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Debugging/Customizing_the_b2g.sh_script

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Depends on: 835825

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Depends on: 835795

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Depends on: 835829

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-b2g18-v1.0.0: --- → fixed

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

No longer depends on: 835795

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

No longer depends on: 835563

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

No longer depends on: 835829

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

12 years ago

Depends on: 836735

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-b2g18-v1.0.1: --- → fixed

Comment 27

•

12 years ago

No need for TC in Moztrap for this issue.

Flags: in-moztrap-

Comment 28

•

12 years ago

Michael Vines if you can Verify this bug out, it would be much appreciated.

Flags: needinfo?(mvines)

Keywords: verifyme

Michael Vines [:m1] [:evilmachines]

Reporter

Updated

•

12 years ago

Status: RESOLVED → VERIFIED

Michael Vines [:m1] [:evilmachines]

Reporter

Updated

•

12 years ago

Flags: needinfo?(mvines)

Shally Li（leave from Mozilla）

Updated

•

10 years ago

Keywords: verifyme

Shally Li（leave from Mozilla）

Comment 29

•

10 years ago

Per comment 29,I clear "Verifyme".

You need to log in before you can comment on or make changes to this bug.