Closed Bug 594847 Opened 9 years ago Closed 9 years ago

Handle content-process crashes more gracefully

Categories

(Core :: IPC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0+ ---

People

(Reporter: cjones, Assigned: mfinkle)

Details

(Whiteboard: [strings])

Attachments

(3 files, 1 obsolete file)

ContentParent currently assumes that there's one and only one ContentParent the entire time firefox-bin is open.  That's not true in the presence of crashes; we want to reload the content process.  ContentParent needs to be fixed to assume that there's one and only one ContentParent at *any given* point in time, though not necessarily the same from one moment to the next.

We also need to fire a notification when the content process crashes so that the frontend can deal with the crash.  Most of this code can probably be cribbed from or shared with dom/plugins.
This absolutely needs to block 2.0 final.  Depending on how successful we are at fixing android crashiness (OOM-iness?), this may need to block b1 as well.
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0+
This involves strings for the crash UI and submission. Does that mean it needs to block a beta?
Whiteboard: [strings]
A notification for a content process hang would be nice too (after some timeout?), though we could do that in frontend.
Chris: Are you sure what you describe in comment 0 is true?  I specifically added code so that if the existing ContentParent is dead, a new one will spawn.  See bug 545237.
Proposal:

- when content crashes - we put up a message on the current tab, telling people something happened and that they should reload (i.e. we don't auto-reload).
- set the other tabs so that they auto-reload when the user switches to them

This means that the user
- will see a message about what happened, and can make a decision (i.e. if the current page is the problem, the user won't get into a reload cycle
- doesn't have to keep re-encountering the problem as he/she goes to other tabs
  - there's a bit of annoyance as those other pages reload, but it doesn't look like we're continually running into a problem

and we're not reloading _everything_ all at once, which has its own problems.

We can use the diagonal-stripe style from the crashed plugin UI for the background when we put up our message.  We should have the titlebar on screen so that the reload button is accessible; the displayed title should be that for the page that will reload when they hit reload (i.e. the "current" page).

First attempt at wording in next comment.
Attached patch platform patch - send notif (obsolete) — Splinter Review
This patch simply sends a notification when the content process closes. The front-end can use this to handle the situation and act appropriately.
Assignee: nobody → mark.finkle
Attachment #481628 - Flags: review?(jones.chris.g)
Comment on attachment 481628 [details] [diff] [review]
platform patch - send notif

>diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp
>--- a/dom/ipc/ContentParent.cpp
>+++ b/dom/ipc/ContentParent.cpp
>@@ -112,18 +112,24 @@ ContentParent::GetSingleton(PRBool aForc
> 
> void
> ContentParent::ActorDestroy(ActorDestroyReason why)
> {
>     nsCOMPtr<nsIThreadObserver>
>         kungFuDeathGrip(static_cast<nsIThreadObserver*>(this));
>     nsCOMPtr<nsIObserverService>
>         obs(do_GetService("@mozilla.org/observer-service;1"));
>-    if (obs)
>+    if (obs) {
>         obs->RemoveObserver(static_cast<nsIObserver*>(this), "xpcom-shutdown");
>+
>+        nsString context = NS_LITERAL_STRING("");
>+        if (AbnormalShutdown == why)
>+            context.AssignLiteral("abnormal");
>+        obs->NotifyObservers(nsnull, "ipc:content-shutdown", context.get());

You should dispatch this after |mIsAlive = false| below so that if any code tries to create a new content process off the notification, they'll succeed (the old ContentParent needs to think that it's dead).

>+    }
>     nsCOMPtr<nsIThreadInternal>
>         threadInt(do_QueryInterface(NS_GetCurrentThread()));
>     if (threadInt)
>         threadInt->SetObserver(mOldObserver);
>     if (mRunToCompletionDepth)
>         mRunToCompletionDepth = 0;
> 
>     mIsAlive = false;
Attachment #481628 - Flags: review?(jones.chris.g)
Attached patch patch 2Splinter Review
Moves firing the notification until after mIsAlive is set to false
Attachment #481628 - Attachment is obsolete: true
Attachment #481762 - Flags: review?(jones.chris.g)
Attachment #481762 - Flags: review?(jones.chris.g) → review+
Whiteboard: [strings] → [strings] [needs-landing]
Not in love with this, but in the name if incremental improvements, how about:

Sorry!
Something went wrong while displaying a webpage. Tap reload to continue.
Flags: in-testsuite?
Flags: in-litmus?
(In reply to comment #10)
> Not in love with this, but in the name if incremental improvements, how about:
> 
> Sorry!
> Something went wrong while displaying a webpage. Tap reload to continue.

OK. We still need the auto-reload feature, so any open tabs can be marked as "needs-reload"

I'll get an incremental patch for front-end up here soon
Whiteboard: [strings] [needs-landing] → [strings]
(In reply to comment #6)
> Proposal:
> 
> - when content crashes - we put up a message on the current tab, telling people
> something happened and that they should reload (i.e. we don't auto-reload).
> - set the other tabs so that they auto-reload when the user switches to them
> 
> This means that the user
> - will see a message about what happened, and can make a decision (i.e. if the
> current page is the problem, the user won't get into a reload cycle
> - doesn't have to keep re-encountering the problem as he/she goes to other tabs
>   - there's a bit of annoyance as those other pages reload, but it doesn't look
> like we're continually running into a problem
> 
> and we're not reloading _everything_ all at once, which has its own problems.
> 
> We can use the diagonal-stripe style from the crashed plugin UI for the
> background when we put up our message.  We should have the titlebar on screen
> so that the reload button is accessible; the displayed title should be that for
> the page that will reload when they hit reload (i.e. the "current" page).

As I am implementing this feature, I have found two things I don't like with this proposal:
* The "Sorry, we had a problem... Tap reload..." message appears as the user switches tabs since each busted browser tab will have the message. I think this will confuse the user into thinking the browser broke _again_, when it was just a left over remnant error message. Only the active tab during the crash should display a message and allow the user to revive.
* We force the user to explicitly tap reload to revive the browser. I think the act of switching to the tab should be enough of a hint for us to revive the browser.

I realize that reviving when the user switches tabs could actually reload a page that caused the browser to crash in the first place! Honestly, I don't think forcing the user to tap reload mitigates that risk. Trial and error will be their teacher. And the browser should crash on any pages to begin with!
Attached patch patch (mobile)Splinter Review
This patch adds basic resurrection functionality to the front-end:
* Adds a listner for the ipc:content-shutdown notification
* Adds simple resurrection feature to Tab class
  * Destroys/re-creates the browser saving the session store data
* Adds a delayed restore feature to the SessionRestore service. Firefox desktop does something similar
* The workflow is a bit different than Madhava's proposal, mostly for simplicity. I expect we will iterate to make this better:
  * When a crash occurs, a prompt tells the user something bad happened and let's them reload or close the active tab
  * If the user wants to close the active tab, we close it and it goes into the "undo closed tab" UI.
  * As the user selects any "resurrected" tab, we reload/restore that tab.

The bad: the prompt sucks - we need something like the error page Madhava describes.
The not so bad: I like the "select tab to restore" feature. The user can always just close a resurrected tab - which then goes into the "undo closed tab" UI.
Attachment #482009 - Flags: review?(mbrubeck)
Just wanted to add. I think tying this into session restore, as the patch does, is the way to go in the long run. When we restore we can add back more session state as we support it.
Comment on attachment 482009 [details] [diff] [review]
patch (mobile)

Looks great to me, with just one silly nit:

>+tabs.crashWarningMsg=Something went wrong while displaying a webpage.

Our standard style seems to be "web page" instead of "webpage."
Attachment #482009 - Flags: review?(mbrubeck) → review+
fixed nit and pushed:
http://hg.mozilla.org/mobile-browser/rev/e36dca76292d

This is a basic functionality patch. Filed bug 603098 to make the UX flow better.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Attached image screenshot
You need to log in before you can comment on or make changes to this bug.