Bug 899437 - [B2G][LEO] Browser zombie process appears after LMK starts taking action during stability tests (due to Marketplace app; maybe identity?)
Status: RESOLVED WORKSFORME
Whiteboard: [MemShrink:P1]
Product: Firefox OS
Classification: Client Software
Component: General
Version: unspecified
Platform: ARM Gonk (Firefox OS)
Importance: -- normal
Assigned To: Jed Parsons (use needinfo, please) [:jedp, :jparsons]
Blocks: 839500
Reported: 2013-07-29 22:59 PDT by leo.bugzilla.gecko
Modified: 2014-03-19 16:25 PDT


Attachments
Logs about process priorities on launching Marketplace (2.16 KB, text/plain)
2013-07-31 21:29 PDT, Andre Graziani (:graziani)

Description leo.bugzilla.gecko 2013-07-29 22:59:34 PDT
During the stability tests in LEO, a Browser zombie process appears and is never killed.
The tests we run are based on Orangutan and consist of opening an app and putting it in the background after 10 seconds. We do this repeatedly with Marketplace, E-mail and Contacts.
The zombie process appears after 2 to 10 hours of tests, with memeater consuming 200M.
The zombie Browser has the priority of a Foreground app, and its size is between 10M and 24M (RSS).
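
For anyone reproducing this, the state, parent PID and RSS of the suspect process can be read from procfs. The following is a minimal sketch, assuming a Linux /proc filesystem is readable on the device; `parse_proc_status` and `read_proc_status` are hypothetical helper names, not part of any B2G tooling.

```python
def parse_proc_status(text):
    """Extract name, state, parent PID and RSS from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    rss = fields.get("VmRSS")  # this line is absent for kernel threads and zombies
    return {
        "name": fields.get("Name"),
        "state": fields.get("State", "?")[:1],  # one letter, e.g. "S", "R" or "Z"
        "ppid": int(fields["PPid"]) if "PPid" in fields else None,
        "rss_kb": int(rss.split()[0]) if rss else None,
    }


def read_proc_status(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

A state of Z would indicate a defunct process in the strict sense, which reports no VmRSS at all. A process that still shows an RSS of 10M to 24M is alive and lingering, so the `state` and `ppid` values are worth recording alongside the size.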
Comment 1 leo.bugzilla.gecko 2013-07-29 23:17:20 PDT
This Browser process is launched from the Marketplace app.

Of course, bug 859149 is the root cause of this issue, but could we avoid launching a new process in this specific case at the Gaia level?

Something like setting the "remote" attribute to false for the iframe in main.js of Marketplace.
Comment 2 leo.bugzilla.gecko 2013-07-29 23:44:45 PDT
Hi Christopher,
I am not sure if you are the one in charge of Marketplace... If so, does the comment above make any sense to you?
Thanks.
Comment 3 Jason Smith [:jsmith] 2013-07-30 07:38:34 PDT
I'm wondering if this is related to the problem we're potentially seeing in bug 898969.

Leo - Btw, Chris doesn't work in this area of expertise with LMK. You probably want Justin in this case. I'm going to redirect your needinfo request to Justin.
Comment 4 Justin Lebar (not reading bugmail) 2013-07-30 14:49:16 PDT
> I'm wondering if this is related to the problem potentially we're seeing in bug 898969.

I'd give it long odds of being the same, but it could be.
Comment 5 Justin Lebar (not reading bugmail) 2013-07-31 10:23:55 PDT
> Something like, setting "remote" attribute to false for the iframe at main.js of 
> Marketplace.

I don't think we can do this.

We rely on running apps out of process to meet security and stability requirements.  Moving the marketplace app in-process would make the main process less secure and less stable.  In particular, a memory leak in the main process would crash the whole phone, and a security breach in the marketplace app would pwn the whole phone.

It's also not clear that this is specifically related to the marketplace app.  My guess is that this happened by chance, perhaps because the marketplace app is one of the slower apps to load.

Please don't make this change in the vendor build over my objections here.
Comment 6 Andre Graziani (:graziani) 2013-07-31 21:29:31 PDT
Created attachment 784194
Logs about process priorities on launching Marketplace

Thanks for the analysis. I am not going to apply any changes without a r+.

But the more I look at this issue, the more I think it is specific to Marketplace.

Attached is a commented log with ProcessPriorityManager log enabled, when I launch Marketplace. At the end I have a Browser process with Foreground priority and no Marketplace. I killed Marketplace, but I don't think it is necessary. So, I don't think it is even related to LMK.

The only difference I observed is that Marketplace is displayed, but the "loading" image keeps spinning and the applications never really load.

Btw, the "remote false" suggestion was intended to avoid launching this Browser process by running everything in the Marketplace process, not in the B2G process.
Comment 7 Justin Lebar (not reading bugmail) 2013-08-01 01:32:38 PDT
Ah, this "remote false" bit confused me.

I bet if you set remote false on this mozbrowser, you won't see any difference in behavior.  If the marketplace app runs out-of-process, then any mozbrowser iframes it contains run in-process automatically.  We have plans to change this, but not for b2g18.
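
The rule described above can be written down as a small decision function. This is only a sketch of the behavior as stated in this comment for b2g18, not Gecko code; the function name and both parameters are hypothetical, and the main-process branch is an assumption drawn from the earlier discussion of the "remote" attribute.

```python
def launches_new_process(embedder_is_out_of_process, remote_attr):
    """Model whether a mozbrowser iframe gets its own process (b2g18, as described)."""
    if embedder_is_out_of_process:
        # Frames nested inside an out-of-process app share that app's process,
        # so the "remote" attribute makes no difference here.
        return False
    # Assumption: only a frame embedded by the main (B2G) process honors "remote".
    return bool(remote_attr)
```

Under this model, toggling "remote" inside Marketplace changes nothing, because Marketplace itself already runs out of process.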

I think you may be seeing the process which backs Persona (identity).  In theory this was fixed by bug 839500, but we may have regressed that.
Comment 8 Justin Lebar (not reading bugmail) 2013-08-01 01:33:07 PDT
(If we did indeed regress bug 839500, that would be very bad.)
Comment 9 Andre Graziani (:graziani) 2013-08-01 18:13:21 PDT
Right. The extra browser process is launched from B2G. And putting "remote false" inside Marketplace doesn't change anything.

The problem seems to be the identity, as you said. The fix from bug 839500 is in Leo.

There might be a specific case when the Persona hangs, so the Browser process is never terminated, and Marketplace keeps waiting for it.
Comment 10 Justin Lebar (not reading bugmail) 2013-08-01 18:22:44 PDT
Thanks, Andre.

Jed, do you have time to look at this in the near future?

Leo, does this block 1.1?
Comment 11 Shelly Lin [:shelly] 2013-08-02 01:02:17 PDT
Hi Leo, bug 811636 fixes the problem of leaving zombie processes in b2g; the patch was checked in to b2g-18 on 7/26. I can't be sure it is the fix for this bug, but it is very possible given your description of the issue. Could you verify this bug with the fix in bug 811636? Thanks.
Comment 12 Wayne Chang [:wchang] 2013-08-02 01:48:59 PDT
Partner will check if bug 811636 has been applied while doing this test.
Partner to also check on the blocking severity of this bug.

(In reply to Shelly Lin [:shelly] from comment #11)
> Hi Leo, bug 811636 fixes the problem of leaving zombie processes in b2g, the
> patch is checked in to b2g-18 on 7/26, I can't assure it is the fix for this
> bug, but very possible from your description of issue. Could you verify this
> bug with the fix in bug 811363? Thanks.
Comment 13 Andre Graziani (:graziani) 2013-08-02 02:39:15 PDT
Hi Shelly,

I actually came across this bug while testing for bug 811636.
And I concluded that this issue happens both before and after bug 811636.
Btw, that is already in our repository.

Thanks.
Comment 14 Justin Lebar (not reading bugmail) 2013-08-02 10:24:37 PDT
Based on what we've discussed so far, I don't think this bug has to do with bug 859149.  That bug specifically has to do with the effects of removing an <iframe mozbrowser> from the DOM too quickly.  This bug has to do with the marketplace app's identity frame living for too long.
Comment 15 Wayne Chang [:wchang] 2013-08-05 01:14:26 PDT
Triage - not blocking for leo release, but partner would like to leoVB this when patches are ready.
Noming for koi blocking.
Comment 16 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-08-05 09:14:35 PDT
I can take a look at this, yes.
Comment 17 Justin Lebar (not reading bugmail) 2013-08-05 09:30:48 PDT
Similarly to bug 859149 comment 14: If comment 15 means that we're going to take this in Leo's vendor builds but not in Mozilla's b2g18 builds, that seems extremely unsafe to me.

Can you please clarify what the plan is here?  Should we treat this, in terms of priorities, as a leo+ blocker?  Thanks.
Comment 18 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-08-09 21:14:52 PDT
I'm on PTO this coming week, so I just want to relate that Kyle Huey spent a couple hours with Zach Carter and me hacking on this problem.  We have opened bug 903134 (possibly unrelated but a good find!) and bug 903154 to help get more information.
Comment 19 Wayne Chang [:wchang] 2013-08-12 03:13:42 PDT
As in bug 859149 - Leo will take this if it gets triaged for b2g18 and lands on b2g18.

(In reply to Justin Lebar [:jlebar] (limited availability 8/9 – 8/12) from comment #17)
> Similarly to bug 859149 comment 14: If comment 15 means that we're going to
> take this in Leo's vendor builds but not in Mozilla's b2g18 builds, that
> seems extremely unsafe to me.
> 
> Can you please clarify what the plan is here?  Should we treat this, in
> terms of priorities, as a leo+ blocker?  Thanks.
Comment 20 Justin Lebar (not reading bugmail) 2013-08-12 03:20:34 PDT
Please confirm that this is added to the list I suggested in bug 859149 comment 18, and that you'll follow up after this bug is resolved on trunk.  I don't want this one to fall through the cracks.

Thanks.
Comment 21 Preeti Raghunath(:Preeti) 2013-09-20 14:21:46 PDT
JParsons

How are we doing with this bug fix?
Comment 22 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-09-27 11:27:38 PDT
Hi, Preeti,

I'm working right now on bug 920851, which may be related in some way.  I could definitely use some help from someone on the platform side who understands the process issues more deeply.  Now that we unfortunately no longer have Justin, is there someone from platform who can step in and help dig for the root cause?
Comment 23 Dave Hylands [:dhylands] 2013-09-27 11:35:06 PDT
So what's the parent pid of the zombie process?

zombie processes hang around until the parent reaps them (by calling wait).

Perhaps the parent is losing track of a child under some circumstances, and is failing to reap.
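
To illustrate the reaping behavior described here, the sketch below forks a child that exits at once, observes it in the Z (zombie) state, and then reaps it with `os.waitpid`. It assumes Linux with /proc mounted and Python 3; `proc_state` and `observe_zombie_then_reap` are hypothetical names, and this is a generic POSIX demonstration rather than anything taken from the B2G process code.

```python
import os


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie_then_reap():
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)  # the child is now a zombie
    os.waitpid(pid, 0)  # the parent reaps the child here
    gone = not os.path.exists(f"/proc/{pid}")
    return state_before, gone
```

If the parent never makes the `waitpid` call, the entry stays in the process table, which is why the parent PID of the lingering process is the first thing to check.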
Comment 24 Preeti Raghunath(:Preeti) 2013-10-02 18:44:03 PDT
Jed,

Can you respond to Dave?
Comment 25 Preeti Raghunath(:Preeti) 2013-10-15 21:51:19 PDT
ni Tauni till Jed is back
Comment 26 Preeti Raghunath(:Preeti) 2013-10-23 10:19:07 PDT
Minus as we have shipped a couple of releases with this bug.
Comment 27 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-10-24 00:43:25 PDT
(In reply to Dave Hylands [:dhylands] from comment #23)
> So what's the parent pid of the zombie process?
> 
> zombie processes hang around until the parent reaps them (by calling wait).
> 
> Perhaps the parent is losing track of a child under some circumstances, and
> is failing to reap.

I haven't been able to reproduce this on my own.  I would love to dig in with you next week if you have some time to try to figure out what might be happening here.

FWIW, the sign-in landscape for marketplace is going to change for 1.3 with the move to firefox accounts (See bug 920135), which on FirefoxOS does not launch any UI in an OOP frame.
Comment 28 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2014-03-12 08:54:05 PDT
It's been pretty quiet on this bug.  Perhaps the root problem has been corrected?
Are we still seeing this issue?
Comment 29 Andre Graziani (:graziani) 2014-03-12 21:22:21 PDT
(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #28)
> It's been pretty quiet on this bug.  Perhaps the root problem has been
> corrected?
> Are we still seeing this issue?

Hi Jed, it has been several months since I last looked at this issue.
Unfortunately, I cannot run the test scripts right now, so I cannot confirm whether it was fixed.
Comment 30 leo.bugzilla.gecko 2014-03-12 21:36:39 PDT
by comment 29
Comment 31 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2014-03-19 16:25:17 PDT
I haven't heard of any recurrences of this issue, so I'm going to close this one.  Please reopen if I'm in error.  Thanks! j
