GeckoSession.ContentDelegate.onCrash() gets called for crashes and when content process gets killed - making it hard to know when to auto-restore
Categories
(GeckoView :: Sandboxing, defect, P1)
Tracking
(firefox67 wontfix, firefox68 wontfix, firefox69 fixed, firefox70 fixed)
People
(Reporter: sebastian, Assigned: awaseem)
References
()
Details
(Keywords: crash, Whiteboard: [geckoview:fenix:m7])
Attachments
(2 files, 2 obsolete files)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
Details | Review |
There are two situations where the content process can die:
- (1) The content process crashes
- (2) The content process gets killed (by Android or by running
kill -9
on it)
GeckoView API:
- (1) When the content process crashes
- GeckoView notifies our registered
crashHandler
in a different process. - GeckoView calls GeckoSession.ContentDelegate.onCrash() for affected sessions
- GeckoView notifies our registered
- (2) When the content process gets killed
- GeckoView calls GeckoSession.ContentDelegate.onCrash() for affected sessions
This makes sense since (2) is not actually a crash and therefore doesn't invoke the crashHandler
.
AC:
- The
crashHandler
implementation (lib-crash
) handles the crash via a service running in a different process and notifies the app about this non-fatal crash via anIntent
. - The implementation of
GeckoSession.ContentDelegate.onCrash()
sets acrashed
flag on theSession
and notifies its observers.
Fenix:
-
For non-fatal crashes (Intent from
lib-crash
): It shows an in-app UI that let's the user decide whether to report it and whether to restore the crashes tab(s) or not. -
Session.crashed
is not handled by the app right now (like reported in issue 1577).
Problem:
Session.crashed
coming from GeckoSession.ContentDelegate.onCrash()
can be caused by (1) content process crash or (2) content process getting killed.
The app wants to handle those cases differently:
-
(1) is already handled through the in-app crash UI and the
crashed
flag will be used to determine which sessions need to be restored. -
(2) The app (or the code in AC) could automatically restore the crashed sessions since that is expected to happen when Android has killed the process.
When GeckoSession.ContentDelegate.onCrash()
we have no way to know whether this is actually a crash or just the process getting killed.
We could try to correlate onCrash()
getting called with the Intent
coming from the crashHandler
(onCrash()
without ´Intent` coming in: auto-restore). But both actions are happening independently, asynchronously from different processes. That sounds super fragile and error-prone.
When GeckoView calls onCrash()
could it tell us whether that is from an actual crash (or have two different callbacks)?
For example WebView's onRenderProcessGone() receives a RenderProcessGoneDetail object with a didCrash()
method.
Reporter | ||
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Adding [geckoview:fenix:m6]
whiteboard tag because we should decide what we want to do before Fenix MVP.
Comment 2•5 years ago
|
||
Sebastian says Fenix would like this for MVP so the app knows whether to reload the page or show the crash reporter. Maybe add separate onCrash and onKill methods? Or a reason flag to onCrash?
Alvina said she will take this bug.
Comment 3•5 years ago
•
|
||
FYI I don't think there's a way to know why the content process is gone besides looking at the crash reporter.
This is what WebView does: https://cs.chromium.org/chromium/src/android_webview/browser/aw_browser_terminator.cc?l=116-134&rcl=8506cd4563dc9736e314dc7732499cddafc0af89
Also, looking at the WebView docs it says it will report that didCrash() == true
when Android kills the proces because of low memory (ie. case 2 of the OP).
Indicates whether the render process was observed to crash, or whether it was killed by the system. If the render process was killed, this is most likely caused by the system being low on memory.
I'm not sure if having a different behavior between crash and low memory kill is the correct thing to do here. If we lose the content process when the app is in the foreground, that's a serious issue and I would argue we should show the crashed tab UI. We could always refresh the tab on the background (and show the crashreporter prompt if we have a stacktrace).
Assignee | ||
Comment 4•5 years ago
|
||
Chris, we won't have this fix for MVP but let's keep it at high priority so we can ship a fix as soon as it's available.
Updated•5 years ago
|
Comment 6•5 years ago
|
||
Sebastian says this bug is a high priority and he would like this fix for a Fenix 1.0.1 dot release.
Assignee | ||
Comment 7•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 8•5 years ago
|
||
We'd like to fix this bug in GV 69, so we'll need to uplift the fix to Beta.
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/81f121f3a7cb Add ContentDelegate.onKill() to differentiate between content process crashes and kills. r=geckoview-reviewers,agi,snorp
Comment 10•5 years ago
|
||
Backed out changeset 81f121f3a7cb (Bug 1557096) for ES lint failure on GeckoViewContent.jsm
Backout link: https://hg.mozilla.org/integration/autoland/rev/b6f7a0ad2f329e0d237aacea2e677faef3e565f3
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256027132&repo=autoland&lineNumber=289
[task 2019-07-11T21:03:26.487Z] Installing eslint for mach using "/usr/local/bin/node /usr/local/bin/npm install --loglevel=error"...
[task 2019-07-11T21:03:26.487Z]
[task 2019-07-11T21:03:26.487Z] eslint installed successfully!
[task 2019-07-11T21:03:26.487Z]
[task 2019-07-11T21:03:26.487Z] NOTE: Your local eslint binary is at /builds/worker/checkouts/gecko/node_modules/.bin/eslint
[task 2019-07-11T21:03:26.487Z]
[task 2019-07-11T21:17:32.074Z] TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/mobile/android/modules/geckoview/GeckoViewContent.jsm:224:15 | Replace browser?&&?aSubject.get("childID")?!=?browser.frameLoader.childID
with ?????????????browser?&&?????????????aSubject.get("childID")?!=?browser.frameLoader.childID???????????
(prettier/prettier)
[taskcluster 2019-07-11 21:17:32.571Z] === Task Finished ===
[taskcluster 2019-07-11 21:17:33.418Z] Unsuccessful task run with exit code: 1 completed in 1268.372 seconds
Comment 11•5 years ago
|
||
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4629e855ea33 Add ContentDelegate.onKill() to differentiate between content process crashes and kills. r=geckoview-reviewers,agi,snorp
Comment 12•5 years ago
|
||
bugherder |
Reporter | ||
Comment 13•5 years ago
|
||
Oh, there have been a bunch of regressions linked. Will this block uplifting?
Comment 14•5 years ago
|
||
Backout by rgurzau@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/0fb761cd8fa4 Backed out 2 changesets (bug 1557096, bug 1565410) for multiple regressions linked to Bug 1557096. a=backout CLOSED TREE
Comment 15•5 years ago
|
||
:sebastian Hi, the linked regressions resulted in backing out this bug and Bug 1565410 for conflicting with the backout.
Backout link: https://hg.mozilla.org/mozilla-central/rev/0fb761cd8fa448eb511e8b2520ede5a3230853ed
Reporter | ||
Updated•5 years ago
|
Comment 16•5 years ago
|
||
Alvina, if the Robocop test failures in bug 1565794 are not actually caused by your onCrash fix, can we re-land your patches?
We also need to uplift your patches to GV 69 Beta.
Comment 17•5 years ago
|
||
Sorry about the trouble caused here. Alvina, Agi could you try to reland this? I tried importing the patch here but it's conflicting on this file https://hg.mozilla.org/mozilla-central/diff/a2180268af1d0d8943eb32535b8d3503756eef75/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/doc-files/CHANGELOG.md#l1.13 from Bug 1565410 which is now fixed. Thank you.
Assignee | ||
Comment 18•5 years ago
|
||
I fixed the merge conflict with the changelog, so it should be ready to land now :)
Comment 19•5 years ago
|
||
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6b80e71611e5 Add ContentDelegate.onKill() to differentiate between content process crashes and kills. r=geckoview-reviewers,agi,snorp
Comment 20•5 years ago
|
||
bugherder |
Comment 21•5 years ago
|
||
Please nominate this patch for Beta approval when you get a chance.
Assignee | ||
Comment 22•5 years ago
|
||
Comment on attachment 9073959 [details]
Bug 1557096 - Add ContentDelegate.onKill() to differentiate between content process crashes and kills. a=RyanVM
Beta/Release Uplift Approval Request
- User impact if declined: This change affects how the app knows whether to reload the page or show the crash reporter.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is not risky since mostly pre-existing crash event handlers were modified in name rather than anything new being added.
- String changes made/needed:
Comment 23•5 years ago
|
||
Comment on attachment 9073959 [details]
Bug 1557096 - Add ContentDelegate.onKill() to differentiate between content process crashes and kills. a=RyanVM
Improves GeckoView's ability to determine whether a page crashed or was force-killed (and therefore which UI to show). Approved for GV69.
Comment 24•5 years ago
|
||
Conflicts when I tried to uplift this patch:
warning: conflicts while merging mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/ContentDelegateTest.kt!
warning: conflicts while merging mobile/android/geckoview/src/main/java/org/mozilla/geckoview/doc-files/CHANGELOG.md!
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 25•5 years ago
|
||
I'll fix it and push again, thanks
Comment 26•5 years ago
|
||
bugherder uplift |
Comment 27•5 years ago
|
||
backout |
Backed out for apilint failure at mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoAppShell.java:40:
https://hg.mozilla.org/releases/mozilla-beta/rev/ba09b92b6286a1752031d434151bc5d1aa210be4
This causes also test failures:
https://treeherder.mozilla.org/logviewer.html#?job_id=258333418&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=258333414&repo=mozilla-beta
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&resultStatus=usercancel%2Ctestfailed%2Cbusted%2Cexception&revision=ca01f11722729d60639e56c7aeb6ea26a1b2d78c
Apilint log: https://treeherder.mozilla.org/logviewer.html#?job_id=258329486&repo=mozilla-beta
/builds/worker/workspace/build/src/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoAppShell.java:40: error: cannot find symbol
TEST-UNEXPECTED-FAIL | method public static void killContentProcess() | Method missing threading annotation. Needs one of: @MainThread, @UiThread, @WorkerThread, @BinderThread, @AnyThread.
Please fix the issue and resubmit.
Assignee | ||
Comment 28•5 years ago
|
||
Just pushed. Not sure if it's right still, so please let me know!
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 29•5 years ago
|
||
Ryan, Alvina has a patch up for beta here https://phabricator.services.mozilla.com/D39732
Updated•5 years ago
|
Comment 31•5 years ago
|
||
Looks like the apilint job at least is still broken with the Beta patch:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258849247&repo=mozilla-beta&lineNumber=2968
Comment 32•5 years ago
•
|
||
That is just a warning (confusing I know, it's fixed in m-c) the actual problem is here: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258849247&repo=mozilla-beta&lineNumber=2979-2984
[task 2019-07-29T19:02:11.507Z] 19:02:11 INFO - ERROR: The api changelog file is out of date. Please update the file at
[task 2019-07-29T19:02:11.508Z] 19:02:11 INFO - /builds/worker/workspace/build/src/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/doc-files/CHANGELOG.md
[task 2019-07-29T19:02:11.508Z] 19:02:11 INFO - and then modify the [api-version] line as following:
[task 2019-07-29T19:02:11.508Z] 19:02:11 INFO - >>>>
[task 2019-07-29T19:02:11.508Z] 19:02:11 INFO - [api-version]: 7b53d1a6703cc7f613de81c422dccdd95b6153fe
[task 2019-07-29T19:02:11.508Z] 19:02:11 INFO - <<<<
Comment 33•5 years ago
|
||
Alvina has addressed the apilint issue. Agi will land Alvina's patch in beta.
Updated•5 years ago
|
Comment 34•5 years ago
|
||
uplift |
Comment 35•5 years ago
|
||
backout |
Backed out on beta for android build bustages at ContentDelegateTest.k
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=259414710&repo=mozilla-beta&lineNumber=34826
Backout: https://hg.mozilla.org/releases/mozilla-beta/rev/151fea2bb6b936b037e1a5c1946cc101f769bd42
Updated•5 years ago
|
Comment 36•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 37•5 years ago
|
||
uplift |
Comment 38•2 years ago
|
||
Moving content process management bugs to the new GeckoView::Sandboxing component.
Description
•