During the stability tests in LEO, a Browser zombie process appears and is never killed. The tests we run are based on Orangutan and consist of opening an app and putting it in the background after 10 seconds. We do this repeatedly with Marketplace, E-mail and Contacts. The zombie process appears after 2 to 10 hours of testing, with memeater consuming 200M. The zombie Browser has the priority of a foreground app, and its size is between 10M and 24M (Rss).
This Browser process is launched from the Marketplace app. Of course, bug 859149 is the root cause of this issue, but could we avoid launching a new process in this specific case at the Gaia level? Something like setting the "remote" attribute to false for the iframe in main.js of Marketplace.
Hi Christopher, I am not sure if you are the one in charge of Marketplace... If so, does the comment above make any sense to you? Thanks.
I'm wondering if this is related to the problem we're potentially seeing in bug 898969. Leo - Btw, Chris doesn't work in this area of expertise with LMK. You probably want Justin in this case. I'm going to redirect your needinfo request to Justin.
> I'm wondering if this is related to the problem potentially we're seeing in bug 898969.
I'd give it long odds of being the same, but it could be.
> Something like, setting "remote" attribute to false for the iframe at main.js of
> Marketplace.
I don't think we can do this. We rely on running apps out of process to meet security and stability requirements. Moving the marketplace app in-process would make the main process less secure and less stable. In particular, a memory leak in the main process would crash the whole phone, and a security breach in the marketplace app would pwn the whole phone. It's also not clear that this is specifically related to the marketplace app. My guess is that this happened by chance, perhaps because the marketplace app is one of the slower apps to load. Please don't make this change in the vendor build over my objections here.
Created attachment 784194
Logs about process priorities on launching Marketplace
Thanks for the analysis. I am not going to apply any changes without an r+. But the more I look at this issue, the more I think it is specific to Marketplace. Attached is a commented log with ProcessPriorityManager logging enabled, taken when I launch Marketplace. At the end I have a Browser process with Foreground priority and no Marketplace. I killed Marketplace, but I don't think that is necessary. So I don't think it is even related to LMK. The only difference I observed is that Marketplace is displayed but the "loading" image keeps spinning and it never really loads the applications. Btw, the "remote false" suggestion was intended to avoid launching this Browser process by running everything in the Marketplace process, not in the B2G process.
Ah, this "remote false" bit confused me. I bet if you set remote false on this mozbrowser, you won't see any difference in behavior. If the marketplace app runs out-of-process, then any mozbrowser iframes it contains run in-process automatically. We have plans to change this, but not for b2g18. I think you may be seeing the process which backs Persona (identity). In theory this was fixed by bug 839500, but we may have regressed that.
(If we did indeed regress bug 839500, that would be very bad.)
Right. The extra browser process is launched from B2G, and putting "remote false" inside Marketplace doesn't change anything. The problem seems to be identity, as you said. The fix from bug 839500 is in Leo. There might be a specific case where Persona hangs, so the Browser process is never terminated and Marketplace keeps waiting for it.
Thanks, Andre. Jed, do you have time to look at this in the near future? Leo, does this block 1.1?
Hi Leo, bug 811636 fixes the problem of zombie processes being left behind in b2g. The patch was checked in to b2g-18 on 7/26. I can't guarantee it is the fix for this bug, but it is very possible given your description of the issue. Could you verify this bug with the fix from bug 811636? Thanks.
Partner will check if bug 811636 has been applied while doing this test. Partner to also check on the blocking severity of this bug.
(In reply to Shelly Lin [:shelly] from comment #11)
> Hi Leo, bug 811636 fixes the problem of leaving zombie processes in b2g, the
> patch is checked in to b2g-18 on 7/26, I can't assure it is the fix for this
> bug, but very possible from your description of issue. Could you verify this
> bug with the fix in bug 811636? Thanks.
Hi Shelly, I actually came across this bug while testing bug 811636, and I concluded that this issue happens both before and after the fix for bug 811636. Btw, that fix is already in our repository. Thanks.
Based on what we've discussed so far, I don't think this bug has to do with bug 859149. That bug specifically has to do with the effects of removing an <iframe mozbrowser> from the DOM too quickly. This bug has to do with the marketplace app's identity frame living for too long.
Triage - not blocking for leo release, but partner would like to leoVB this when patches are ready. Noming for koi blocking.
I can take a look at this, yes.
Similarly to bug 859149 comment 14: If comment 15 means that we're going to take this in Leo's vendor builds but not in Mozilla's b2g18 builds, that seems extremely unsafe to me. Can you please clarify what the plan is here? Should we treat this, in terms of priorities, as a leo+ blocker? Thanks.
I'm on PTO this coming week, so I just want to relate that Kyle Huey spent a couple hours with Zach Carter and me hacking on this problem. We have opened bug 903134 (possibly unrelated but a good find!) and bug 903154 to help get more information.
As in bug 859149 - Leo will take this if it gets triaged for b2g18 and lands on b2g18.
(In reply to Justin Lebar [:jlebar] (limited availability 8/9 – 8/12) from comment #17)
> Similarly to bug 859149 comment 14: If comment 15 means that we're going to
> take this in Leo's vendor builds but not in Mozilla's b2g18 builds, that
> seems extremely unsafe to me.
>
> Can you please clarify what the plan is here? Should we treat this, in
> terms of priorities, as a leo+ blocker? Thanks.
Please confirm that this is added to the list I suggested in bug 859149 comment 18, and that you'll follow up after this bug is resolved on trunk. I don't want this one to fall through the cracks. Thanks.
JParsons, how are we doing with this bug fix?
Hi, Preeti, I'm working right now on bug 920851, which may be related in some way. I could definitely use some help from someone on the platform side who understands the process issues more deeply. Now that we unfortunately no longer have Justin, is there someone from platform who can step in and help dig for the root cause?
So what's the parent pid of the zombie process? Zombie processes hang around until the parent reaps them (by calling wait). Perhaps the parent is losing track of a child under some circumstances, and is failing to reap.
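For anyone trying to answer the parent-pid question on a device or desktop, here is a minimal sketch of the reaping behavior described above. It is not B2G code: the function names `parse_proc_stat` and `zombie_then_reap` are made up for illustration, and it assumes a Linux system with /proc mounted and a Python build that provides `os.waitid`. The child exits immediately, stays in state Z until the parent calls wait, and its /proc/<pid>/stat line reports which parent is responsible for reaping it.

```python
import os


def parse_proc_stat(text):
    """Parse the leading fields of a Linux /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    left, right = text.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    fields = right.split()
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": fields[0],      # 'Z' means zombie (exited, not yet reaped)
        "ppid": int(fields[1]),  # the process that is expected to call wait
    }


def zombie_then_reap():
    """Create a short-lived child, observe it as a zombie, then reap it.

    Hypothetical helper for illustration only. Returns (info, reaped) where
    info is the parsed stat entry taken before reaping (or None if /proc is
    not available) and reaped says whether waitpid collected the child.
    """
    child = os.fork()
    if child == 0:
        os._exit(0)  # child: exit immediately without running cleanup

    # Parent: block until the child has exited, but leave it unreaped
    # (WNOWAIT), so it is still listed as a zombie.
    os.waitid(os.P_PID, child, os.WEXITED | os.WNOWAIT)

    info = None
    stat_path = "/proc/%d/stat" % child
    if os.path.exists(stat_path):
        with open(stat_path) as handle:
            info = parse_proc_stat(handle.read())

    # This is the step a parent must not skip: waitpid reaps the child
    # and lets the kernel release its process table entry.
    reaped_pid, _status = os.waitpid(child, 0)
    return info, reaped_pid == child


if __name__ == "__main__":
    info, reaped = zombie_then_reap()
    print("before reaping:", info)
    print("reaped:", reaped)
```

If the leftover Browser process on the device shows state Z in its stat line, the `ppid` field names the process that is failing to call wait. If it shows any other state, it is not a zombie in the strict sense but a live process that was never asked to exit.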
Jed, can you respond to Dave?
ni Tauni till Jed is back
Minus as we have shipped a couple of releases with this bug.
(In reply to Dave Hylands [:dhylands] from comment #23)
> So what's the parent pid of the zombie process?
>
> zombie processes hang around until the parent reaps them (by calling wait).
>
> Perhaps the parent is losing track of a child under some circumstances, and
> is failing to reap.
I haven't been able to reproduce this on my own. I would love to dig in with you next week if you have some time to try to figure out what might be happening here. FWIW, the sign-in landscape for marketplace is going to change for 1.3 with the move to firefox accounts (See bug 920135), which on FirefoxOS does not launch any UI in an OOP frame.
It's been pretty quiet on this bug. Perhaps the root problem has been corrected? Are we still seeing this issue?
(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #28)
> It's been pretty quiet on this bug. Perhaps the root problem has been
> corrected?
> Are we still seeing this issue?
Hi Jed, it has been several months since I last looked at this issue. Unfortunately, I cannot run the test scripts right now, so I cannot confirm whether it was fixed.
by comment 29
I haven't heard of any recurrences of this issue, so I'm going to close this one. Please reopen if I'm in error. Thanks! j