Closed Bug 859149 Opened 13 years ago Closed 5 years ago

<iframe mozbrowser>s' processes don't seem to get killed if the iframe is removed from the DOM too early

Tracking

()

Status:

RESOLVED INVALID

People

(Reporter: justin.lebar+bug, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files)

Testcase with workaround for this bug 13 years ago Justin Lebar (not reading bugmail) 3.63 KB, text/html		Details
zombie.diff 12 years ago Cervantes Yu [:cyu] [:cervantes] 1.65 KB, patch		Details \| Diff \| Splinter Review
WIP: handle EPIPE when the AsyncChannel fails to connect. 12 years ago Cervantes Yu [:cyu] [:cervantes] 798 bytes, patch		Details \| Diff \| Splinter Review

Justin Lebar (not reading bugmail)

Reporter

Description

•

13 years ago

In writing testcases for bug 844323, I noticed that if I remove an <iframe mozbrowser> from the DOM at the right time, its process is never killed! I'll post a link to my WIP testcase which causes this problem, but first I want this bug number so I can reference it in the test.

Justin Lebar (not reading bugmail)

Reporter

Comment 1

•

13 years ago

Attached file Testcase with workaround for this bug — Details

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Whiteboard: [MemShrink]

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: browser-api

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 2

•

13 years ago

This is most likely the problem that causes a bunch of intermittent oranges in the ipc+idb tests.

Justin Lebar (not reading bugmail)

Reporter

Comment 3

•

13 years ago

Can you link to some of those bugs?

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 4

•

13 years ago

Bug 783913 tracks this I think.

Justin Lebar (not reading bugmail)

Reporter

Comment 5

•

13 years ago

Okay, cool. I don't want to get your hopes up too much: I observed this happening only when I removed the frame very soon after I inserted it into the DOM. Your tests may be hitting something unfortunately different; I guess it remains to be seen.

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 783913

Nicholas Nethercote [inactive]

Comment 6

•

13 years ago

jlebar, can you assign a MemShrink priority to this? We're not sure what to give it. Also, would https://bugzilla.mozilla.org/show_bug.cgi?id=842003 help with this?

Justin Lebar (not reading bugmail)

Reporter

Comment 7

•

13 years ago

> Also, would https://bugzilla.mozilla.org/show_bug.cgi?id=842003 help with this? If we're hitting this case in our tests, I'm pretty sure it would turn the tests orange anyway (something about a zombie process). Now that I know more about this bug, I think this is likely a P3, because I've never seen evidence that we're hitting it in B2G. I also think if we were hitting it we'd probably set the oom priority of the process to bg first, which would make it not a huge deal.

Whiteboard: [MemShrink] → [MemShrink:P3]

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 867739

Justin Lebar (not reading bugmail)

Reporter

Comment 8

•

13 years ago

This is probably now a P2 or a P1, because I think I'm seeing this in bug 867739.

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Whiteboard: [MemShrink:P3] → [MemShrink]

Justin Lebar (not reading bugmail)

Reporter

Comment 9

•

13 years ago

What's worst about this is that the zombie created in bug 867739 has foreground oom_adj, which means it will essentially never get killed!

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Andre Graziani (:graziani)

Comment 10

•

12 years ago

We have observed a Browser zombie process in Leo, that might be related to this issue. We run the Orangutan based tests, that consist of opening an app and put it in background after 10 seconds. We do it recursively with Marketplace, E-mail and Contacts. The zombie process appears after 2 to 10 hours of tests, with memeater consuming 200M. The zombie Browser has priority of Foreground app. And its size is something between 10M to 24M Rss).

leo.bugzilla.gecko

Updated

•

12 years ago

blocking-b2g: --- → leo?

leo.bugzilla.gecko

Comment 11

•

12 years ago

Moving the comment above to a new bug 899437

Blocks: 899437

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

No longer blocks: 899437

Justin Lebar (not reading bugmail)

Reporter

Comment 12

•

12 years ago

Leo, can you clarify whether you've seen zombie processes that aren't related to bug 899437? Such zombie processes would probably show up as app processes, not browser processes, as in bug 899437. You could test for this by launching a bunch of apps but not launching apps which use Persona/Mozilla Identity.

Wayne Chang [:wchang]

Comment 13

•

12 years ago

Triage - not blocking for release, but leo would like to leoVB this when it gets fixed. Noming for koi blocking.

blocking-b2g: leo? → koi?

Justin Lebar (not reading bugmail)

Reporter

Comment 14

•

12 years ago

(In reply to Wayne Chang [:wchang] from comment #13) > Triage - not blocking for release, but leo would like to leoVB this when it > gets fixed. > Noming for koi blocking. Wayne, this means that we'd take this change in Leo's vendor build, but not in Mozilla's 1.1 build? That seems extremely unsafe to me. It also seems logistically very complicated, since Leo would have to resolve any merge conflicts themselves, with no oversight from us. Could you please clarify? Thanks.

Flags: needinfo?(wchang)

Wayne Chang [:wchang]

Comment 15

•

12 years ago

Hi Justin, Sorry if my comment was a bit unclear. Essentially they're saying this wouldn't block the release but if there is a patch that they can take (i.e. a patch ported for v1.1), they will. I'll put forward your concern too to make sure they dont attempt to do what you've described. Thanks for highlighting this. (In reply to Justin Lebar [:jlebar] from comment #14) > (In reply to Wayne Chang [:wchang] from comment #13) > > Triage - not blocking for release, but leo would like to leoVB this when it > > gets fixed. > > Noming for koi blocking. > > Wayne, this means that we'd take this change in Leo's vendor build, but not > in Mozilla's 1.1 build? > > That seems extremely unsafe to me. It also seems logistically very > complicated, since Leo would have to resolve any merge conflicts themselves, > with no oversight from us. > > Could you please clarify? Thanks.

Flags: needinfo?(wchang)

Justin Lebar (not reading bugmail)

Reporter

Comment 16

•

12 years ago

So do we have permission to land this in the b2g18 repository even though this is not leo+? Or do we need to come back and re-nom for blocking leo at that point and wait for triage to re-consider the bug?

Flags: needinfo?(wchang)

Wayne Chang [:wchang]

Comment 17

•

12 years ago

Triage would need to reconsider it, and if that happens and lands on b2g18, leo will take this when it happens.

Flags: needinfo?(wchang)

Justin Lebar (not reading bugmail)

Reporter

Comment 18

•

12 years ago

If release drivers are going to ask that we re-triage bugs like this, I think it should be release drivers' responsibility to make sure that we actually do re-triage bugs like this after they land. If it's up to engineers to remember to renom a bug, we're going to miss things, and that itself is risky to the product. One way you could do this is to keep a list (e.g. an etherpad) of bugs that are in the status of "we'd take a safe patch." Then on a regular basis you'd see if any bugs on this list were closed recently, and follow up asking if the patch is safe enough to take on Leo. This really shouldn't be up to engineers to remember; we're going to forget often enough for it to be problematic.

Flags: needinfo?(wchang)

Andrew Overholt [:overholt]

Comment 19

•

12 years ago

Whenever possible, I'd like to see us block on something (and thus take patches) if it's indeed a release blocker issue and not consider the "safety" of a real or theoretical patch when making blocking decisions.

Justin Lebar (not reading bugmail)

Reporter

Comment 20

•

12 years ago

> Whenever possible, I'd like to see us block on something (and thus take patches) I understand and in fact share release drivers' hesitancy to mark things that don't strictly block release as blockers. As Jonas and I have argued ad nauseum, what's missing is a formal will-take-a-safe-patch status. In the absence of such a formal status, I totally agree with Andrew that it's better to mark bugs as blockers than use an informal system, as is being done here. But just to be clear, I don't think it would be a satisfactory conclusion to this discussion to say that we're just going to mark these bugs as blockers. We've had this discussion about "will-take-a-safe-patch" bugs many times and the conclusion has always been a commitment not to give bugs the status that this bug was just given. I would really like us to stop going in circles like this. Given the history of triage rules changing in this way, I would like release drivers to make a commitment to follow up with bugs that they give this will-take-a-safe-patch status, contingent on their actually giving any bugs this status. Then the question of whether any bugs should have this status in the first place can be a separate issue, which we may or may not discuss again. Does that sound OK?

Cervantes Yu [:cyu] [:cervantes]

Comment 21

•

12 years ago

Attached patch zombie.diff — Details — Splinter Review

Not sure if this is related, but I can create zombie processes with this test. By applying this patch, attaching to the b2g process, and then setting base::gKillApp = true, I can create zombie processes in trying to open new apps (and then fail).

Cervantes Yu [:cyu] [:cervantes]

Comment 22

•

12 years ago

Attached patch WIP: handle EPIPE when the AsyncChannel fails to connect. — Details — Splinter Review

This kills the zombie created in the above test.

Shelly Lin [:shelly]

Comment 23

•

12 years ago

Adding the following lines if(mExistingListener) mExistingListener->OnChannelError(); in |AsyncChannel::ProcessLink::OnChannelError()| can spot the error when connecting channels, I'll provide more details with a fix patch later.

Shelly Lin [:shelly]

Updated

•

12 years ago

Assignee: nobody → slin

Shelly Lin [:shelly]

Comment 24

•

12 years ago

(In reply to Cervantes Yu from comment #21) > Created attachment 792139 [details] [diff] [review] > zombie.diff > > Not sure if this is related, but I can create zombie processes with this > test. By applying this patch, attaching to the b2g process, and then setting > base::gKillApp = true, I can create zombie processes in trying to open new > apps (and then fail). Well, seems like we couldn't reproduce the zombie processes anymore, unfortunately. More closely, |mExistingListener->OnChannelError();| calls the OnChannelError() in GeckoChildProcessHost, which is an empty function now, so I don't see how it can fix the problem. However, this should relate to bug 773925.

Shelly Lin [:shelly]

Comment 25

•

12 years ago

We can only reproduce the hang or zombie processes through gdb, bug 911009 might be a possible fix.

Assignee: slin → nobody

Andrew Overholt [:overholt]

Comment 26

•

12 years ago

(In reply to Shelly Lin [:shelly] from comment #25) > We can only reproduce the hang or zombie processes through gdb, bug 911009 > might be a possible fix. This makes me think we should at least remove the koi nom here and potentially either dupe this or WONTFIX it. What do you think, Shelly?

Flags: needinfo?(slin)

Shelly Lin [:shelly]

Comment 27

•

12 years ago

Yeah, I think it make sense to remove the koi nom here, but maybe we should keep this bug open, tag this bug with IPC related issue.

Flags: needinfo?(slin)

Andrew Overholt [:overholt]

Updated

•

12 years ago

blocking-b2g: koi? → ---

Wayne Chang [:wchang]

Updated

•

12 years ago

Flags: needinfo?(wchang)

Benjamin Smedberg

Updated

•

12 years ago

Component: IPC → DOM: Content Processes

Firefox Bug Husbandry Bot

Comment 28

•

8 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=1472046 Move all DOM bugs that haven't been updated in more than 3 years and has no one currently assigned to P5. If you have questions, please contact :mdaly.

Priority: -- → P5

Peter Van der Beken [:peterv]

Comment 29

•

5 years ago

mozbrowser is gone.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → INVALID

You need to log in before you can comment on or make changes to this bug.