Closed Bug 1630229 Opened 3 months ago Closed 2 months ago

Crash in [@ java.lang.AssertionError: at org.mozilla.gecko.process.GeckoProcessManager$ChildConnection.bind(GeckoProcessManager.java)]

Categories

(GeckoView :: General, defect, P1)

77 Branch
Unspecified
Android

Tracking

(firefox75 unaffected, firefox76 unaffected, firefox77 fixed, firefox78 fixed)

RESOLVED FIXED
mozilla78
Tracking Status
firefox75 --- unaffected
firefox76 --- unaffected
firefox77 --- fixed
firefox78 --- fixed

People

(Reporter: fluffyemily, Assigned: aklotz)

Details

(Keywords: crash, Whiteboard: [geckoview:m77][geckoview:m78])

Attachments

(8 files)


This bug is for crash report bp-8479b20a-ebbf-4cad-a4bc-424330200415.

Java stack trace:

java.lang.AssertionError
	at org.mozilla.gecko.process.GeckoProcessManager$ChildConnection.bind(GeckoProcessManager.java:19)
	at org.mozilla.gecko.process.GeckoProcessManager.start(GeckoProcessManager.java:7)
	at org.mozilla.gecko.process.GeckoProcessManager.lambda$start$5(GeckoProcessManager.java:2)
	at org.mozilla.gecko.process.-$$Lambda$GeckoProcessManager$49HYY0DiXCj5gwg0g9ecfGUvYqc.run(Unknown Source:20)
	at org.mozilla.gecko.util.XPCOMEventTarget$JNIRunnable.run(XPCOMEventTarget.java:1)

Aaron, this isn't happening a ton, but it seems like it may be the new signature for our old friend.

Flags: needinfo?(aklotz)
Priority: -- → P2

My best guess here is that we need processes other than content processes to call markAsDead as well; otherwise Gecko tries to restart the non-content process before the launcher thread has been able to process the notification that the binder connection was lost.

Flags: needinfo?(aklotz)
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Priority: P2 → P1
Whiteboard: [geckoview:m77]

When a GMP process's top-level actor is destroyed, we need to relay that
information up to GeckoView's Java layer. We only want to do this when the
process had previously been successfully started.

When the Socket process's top-level actor is destroyed, we need to relay that
information up to GeckoView's Java layer.
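
To make the intent of Parts 1 and 2 concrete, here is a minimal sketch of the relay, not the actual patch. The `Registry` and `Host` classes and all of their methods are hypothetical stand-ins; only the name markAsDead and the "only if previously started" rule come from the comments above.

```java
import java.util.HashMap;
import java.util.Map;

public class MarkAsDeadSketch {
    /** Hypothetical Java-side registry of live connections, keyed by process type. */
    static final class Registry {
        private final Map<String, Integer> liveByType = new HashMap<>();

        void onStarted(String type, int pid) {
            liveByType.put(type, pid);
        }

        /** Drops the stale entry so that a later start creates a fresh connection. */
        void markAsDead(String type) {
            liveByType.remove(type);
        }

        boolean hasLive(String type) {
            return liveByType.containsKey(type);
        }
    }

    /** Hypothetical host for one non-content process (for example GMP or socket). */
    static final class Host {
        private final String type;
        private final Registry registry;
        private boolean started;

        Host(String type, Registry registry) {
            this.type = type;
            this.registry = registry;
        }

        void launchSucceeded(int pid) {
            started = true;
            registry.onStarted(type, pid);
        }

        /** Relays destruction only if the process had actually been started. */
        boolean actorDestroyed() {
            if (!started) {
                return false; // nothing was registered, so there is nothing to relay
            }
            started = false;
            registry.markAsDead(type);
            return true;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Host gmp = new Host("gmp", registry);
        System.out.println("relayed before start: " + gmp.actorDestroyed());
        gmp.launchSucceeded(42);
        System.out.println("relayed after start: " + gmp.actorDestroyed());
        System.out.println("still live: " + registry.hasLive("gmp"));
    }
}
```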

Depends on D71405

Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/76a6d3885038
Part 1 - Ensure that GMP process notifies GeckoView when its actor is destroyed; r=jya
https://hg.mozilla.org/integration/autoland/rev/c72a435a5f5c
Part 2 - Ensure that the socket process notifies GeckoView when its actor is destroyed; r=necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77

Looks like we're still seeing this. Reopening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [geckoview:m77] → [geckoview:m77][geckoview:m78]
Keywords: leave-open
Attachment #9144173 - Attachment description: Bug 1630229: Include process type in AssertionError message; r=#geckoview-reviewers → Bug 1630229: Part 3 - Include process type in AssertionError message; r=#geckoview-reviewers
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/877d47b23fff
Part 3 - Include process type in AssertionError message; r=geckoview-reviewers,agi
  1. GeckoProcessManager.ConnectionManager.onStartComplete is called later than
    it ideally should be; it would be better to do this as soon as binding is
    complete, rather than as soon as start is complete. To accomplish this:
     • We rename onStartComplete to onBindComplete and call it as soon as we
       have successfully bound.
     • We call IChildProcess.getPid as soon as we're bound and immediately clean
       up if that fails.
     • This implies that getPid should always have a pid and should not need to
       call into IChildProcess during the remaining lifetime of the connection.
       This allows us to eliminate exception throwing from getPid, and thus we may
       also remove getPidFallible.
     • This also means that we no longer need to explicitly call getPid in
       GeckoProcessManager.preload.

  2. We also use XPCOMEventTarget.runOnLauncherThread so that we do not need to
    bounce through the launcher thread's event queue unnecessarily.

  3. I noticed that we do not unbind the connection if the start fails but we
    are not retrying. We should be unbinding regardless of whether we are going
    to retry.
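
The ordering described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: `ChildProcess` and `Connection` are hypothetical stand-ins for IChildProcess and ChildConnection, and the event list exists only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

public class BindSketch {
    /** Hypothetical stand-in for the remote child-process interface. */
    interface ChildProcess {
        int getPid() throws Exception; // may fail if the remote process is already gone
    }

    /** Hypothetical connection that caches the pid once, right after binding. */
    static final class Connection {
        private final ChildProcess child;
        private int pid;       // 0 means "no pid cached"
        private boolean bound;
        final List<String> events = new ArrayList<>();

        Connection(ChildProcess child) {
            this.child = child;
        }

        /** Binds, then fetches the pid immediately; cleans up if that fails. */
        boolean bind() {
            bound = true;
            events.add("bound");
            try {
                pid = child.getPid();
            } catch (Exception e) {
                unbind(); // unbind whether or not the caller intends to retry
                return false;
            }
            events.add("onBindComplete"); // registration happens here, not after start
            return true;
        }

        void unbind() {
            bound = false;
            pid = 0;
            events.add("unbound");
        }

        /** Infallible: the pid was cached at bind time. */
        int getPid() {
            return pid;
        }

        boolean isBound() {
            return bound;
        }
    }

    public static void main(String[] args) {
        Connection ok = new Connection(() -> 1234);
        System.out.println(ok.bind() + " " + ok.getPid() + " " + ok.events);

        Connection dead = new Connection(() -> {
            throw new Exception("remote died");
        });
        System.out.println(dead.bind() + " " + dead.isBound() + " " + dead.events);
    }
}
```

Caching the pid at bind time is what lets getPid stop throwing: any failure to reach the child surfaces once, at the point where cleanup is straightforward.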

Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/73098ca189ff
Part 4 - Register ChildConnection after bind instead of start; r=geckoview-reviewers,snorp
Crash Signature: [@ java.lang.AssertionError: at org.mozilla.gecko.process.GeckoProcessManager$ChildConnection.bind(GeckoProcessManager.java)] → [@ java.lang.AssertionError: at org.mozilla.gecko.process.GeckoProcessManager$ChildConnection.bind(GeckoProcessManager.java)] [@ java.lang.AssertionError: at org.mozilla.gecko.process.ServiceAllocator$InstanceInfo.bindService(ServiceAllocator.java)]

Currently unbind() is more or less synchronous, but since those semantics may
change in the future, for the sake of cleanliness we should ensure that our
reaction is linked to unbindResult.
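
A rough sketch of what "linked to unbindResult" means, using java.util.concurrent.CompletableFuture as a stand-in for GeckoView's own result type. The `unbind` and `startFailed` helpers and the log list are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;

public class ChainUnbindSketch {
    /** Hypothetical asynchronous unbind; completes once the service is released. */
    static CompletableFuture<Void> unbind(List<String> log) {
        return CompletableFuture.runAsync(() -> log.add("unbound"));
    }

    /** On start failure, completes the caller's result only after unbind has finished. */
    static CompletableFuture<Integer> startFailed(List<String> log, RuntimeException cause) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        unbind(log).whenComplete((ignored, unbindError) -> {
            log.add("failed");
            result.completeExceptionally(cause);
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        try {
            startFailed(log, new IllegalStateException("start failed")).join();
        } catch (CompletionException e) {
            System.out.println(log + " cause=" + e.getCause().getMessage());
        }
    }
}
```

Because the failure is reported from the unbind callback, the caller never observes the failed start before the unbind has completed, even if unbind later becomes truly asynchronous.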

Since this is so common, let's reserve the AssertionError for debug builds
and let our normal exception handling machinery deal with this failure.
GeckoProcessManager will retry unbinding and restarting the process, which
hopefully will mitigate the problem.
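
As an illustration of the debug/release split described above (a sketch under assumptions, not the patch itself): the `BindException` class here is a hypothetical checked exception and may not match the one in GeckoView, and `debugBuild` stands in for whatever build-time flag the real code consults.

```java
public class DefunctBindSketch {
    /** Hypothetical checked exception; the real BindException in GeckoView may differ. */
    static final class BindException extends Exception {
        BindException(String msg) {
            super(msg);
        }
    }

    static final class Connection {
        private final boolean debugBuild; // stand-in for a build-time debug flag
        private boolean defunct;

        Connection(boolean debugBuild) {
            this.debugBuild = debugBuild;
        }

        void markDefunct() {
            defunct = true;
        }

        void bind() throws BindException {
            if (defunct) {
                final String msg = "Attempt to bind a defunct connection";
                if (debugBuild) {
                    throw new AssertionError(msg); // loud failure for developers
                }
                throw new BindException(msg); // recoverable in non-debug builds
            }
        }
    }

    /** Returns true if a bind succeeded within the allowed number of attempts. */
    static boolean bindWithRetry(boolean debugBuild, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Connection c = new Connection(debugBuild);
            if (i == 0) {
                c.markDefunct(); // simulate a stale first connection
            }
            try {
                c.bind();
                return true;
            } catch (BindException e) {
                // discard this connection and try a fresh one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("release, 2 attempts: " + bindWithRetry(false, 2));
        System.out.println("release, 1 attempt: " + bindWithRetry(false, 1));
    }
}
```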

Depends on D75414

I'm wondering whether Gecko's response to noticing that the child process was
killed is causing some notifications on the launcher thread to arrive out of
order from others. Removing the kill since we can just let Android handle
process termination.

Depends on D75415

Attachment #9149154 - Attachment description: Bug 1630229: Part 6 - When attempting to bind a defunct connection, throw an IllegalStateException in non-debug builds; r=#geckoview-reviewers → Bug 1630229: Part 6 - When attempting to bind a defunct connection, throw a BindException in non-debug builds; r=#geckoview-reviewers
Attachment #9149155 - Attachment description: Bug 1630229: Part 7 - Remove explcit kill from GeckoProcessManager and just let Android do its thing; r=#geckoview-reviewers → Bug 1630229: Part 7 - Remove explicit kill from GeckoProcessManager and just let Android do its thing; r=#geckoview-reviewers
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0caead34be39
Part 5 - GeckoProcessManager: When start fails, chain resolution of the result promise to the bind shutdown promise; r=geckoview-reviewers,agi
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/42909f2f05d7
Part 6 - When attempting to bind a defunct connection, throw a BindException in non-debug builds; r=geckoview-reviewers,agi
https://hg.mozilla.org/integration/autoland/rev/0dbf4eb14160
Part 7 - Remove explicit kill from GeckoProcessManager and just let Android do its thing; r=geckoview-reviewers,agi

Will need Beta 77 uplift so we can pick this up in Fenix 5.1.

Keywords: leave-open

Comment on attachment 9146957 [details]
Bug 1630229: Part 4 - Register ChildConnection after bind instead of start; r=#geckoview-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes on Android
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The scope of the changes is narrow, and they have been baking on Nightly for a few days.
  • String changes made/needed: None
Attachment #9146957 - Flags: approval-mozilla-beta?
Attachment #9149153 - Flags: approval-mozilla-beta?
Attachment #9149154 - Flags: approval-mozilla-beta?
Attachment #9149155 - Flags: approval-mozilla-beta?

Comment on attachment 9146957 [details]
Bug 1630229: Part 4 - Register ChildConnection after bind instead of start; r=#geckoview-reviewers

Fix for a Fenix top crash, uplift approved for 77 beta, thanks.

Attachment #9146957 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9149153 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9149154 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9149155 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hi!
While testing Fenix Beta Build 5.1.0-beta.1 on a Samsung Galaxy Tab S3 (Android 8), I hit a crash that crash-stats indicates is related to this issue. Crash report: https://crash-stats.mozilla.org/report/index/458c7010-2311-41a4-9473-ec1ee0200520

This was reproducible only once.

Steps to reproduce:

  1. Make Fenix your default browser;
  2. Leave Fenix in background for 10 minutes;
  3. Go to Gmail and tap on a link;

Expected result:
Custom tab opens.

Actual result:
Fenix crashed with the crash report https://crash-stats.mozilla.org/report/index/458c7010-2311-41a4-9473-ec1ee0200520.

Flags: needinfo?(aklotz)

Thank you for the STR! That's very helpful! That particular signature should go away in subsequent builds, but this is an excellent indicator as to the root cause. I will investigate further.

Flags: needinfo?(aklotz)
Regressions: 1639435

Ugh! I'm just going to remove that AssertionError for now and add more stuff for investigating the root cause in follow-up bugs.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I'm keeping this patch as simple as possible so that we can uplift to beta.
I'll sort out the rest of this in follow-up bugs.

Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e562d64f2bf9
Part 8 - Stop throwing AssertionError for binding of defunct ServiceAllocator.InstanceInfo; r=geckoview-reviewers,owlish

Backed out changeset e562d64f2bf9 (bug 1630229) for lints failure on ServiceAllocator.java

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=bLpGoMMYRTC2ZLtZ7PTY5g-0&fromchange=e562d64f2bf9a26db09e338b701f98d11e1d2629&searchStr=linting%2Copt%2Candroid%2Cgradle%2Ctests%2Csource-test-mozlint-android-lints%2Ca%28lints%29&tochange=c8e34e81ac8a1e359a64cc94ebeeb9132989a413

Backout link: https://hg.mozilla.org/integration/autoland/rev/c8e34e81ac8a1e359a64cc94ebeeb9132989a413

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=303253758&repo=autoland&lineNumber=947

[task 2020-05-21T18:37:32.766Z] > Task :geckoview:javadocWithGeckoBinariesDebug
[task 2020-05-21T18:37:32.766Z] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf-8
[task 2020-05-21T18:37:32.766Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlocking.java:355: warning - Tag @link: reference not found: STP
[task 2020-05-21T18:37:32.766Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlocking.java:223: warning - Tag @link: reference not found: setAntiTracking
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:270: warning - Tag @link: reference not found: saveExceptionList
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:280: warning - Tag @link: reference not found: saveExceptionList
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:76: warning - Tag @link: reference not found: toJson
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:368: warning - Tag @link: reference not found: COOKIES_LOADED
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:361: warning - Tag @link: reference not found: COOKIES_LOADED
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:361: warning - Tag @link: reference not found: COOKIES_LOADED
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:368: warning - Tag @link: reference not found: COOKIES_LOADED
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:142: warning - Tag @link: reference not found: toString
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ContentBlockingController.java:154: warning - Tag @link: reference not found: toJson
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/GeckoSession.java:3372: warning - Tag @link: reference not found: SelectionActionDelegateAction
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/GeckoSession.java:3361: warning - Tag @link: reference not found: SelectionActionDelegateAction
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/GeckoSession.java:3372: warning - Tag @link: reference not found: SelectionActionDelegateAction
[task 2020-05-21T18:37:32.767Z] /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/GeckoSession.java:3372: warning - Tag @link: reference not found: SelectionActionDelegateAction
[task 2020-05-21T18:37:32.767Z] 15 warnings
[task 2020-05-21T18:37:32.768Z] > Task :geckoview:javadocJarWithGeckoBinariesDebug
[task 2020-05-21T18:37:32.768Z] > Task :geckoview:sourcesJarWithGeckoBinariesDebug
[task 2020-05-21T18:37:32.768Z] > Task :geckoview:publishWithGeckoBinariesDebugPublicationToMavenRepository
[task 2020-05-21T18:37:32.768Z] Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
[task 2020-05-21T18:37:32.768Z] Use '--warning-mode all' to show the individual deprecation warnings.
[task 2020-05-21T18:37:32.768Z] See https://docs.gradle.org/5.1.1/userguide/command_line_interface.html#sec:command_line_warnings
[task 2020-05-21T18:37:32.768Z] BUILD SUCCESSFUL in 1m 34s
[task 2020-05-21T18:37:32.768Z] 173 actionable tasks: 158 executed, 15 up-to-date
[task 2020-05-21T18:37:32.784Z] Packaging quitter@mozilla.org.xpi...
[task 2020-05-21T18:37:32.980Z] Automation steps completed.
[task 2020-05-21T18:37:32.981Z] 0 compiler warnings present.
[task 2020-05-21T18:37:33.013Z] Overall system resources - Wall time: 194s; CPU: 0%; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2020-05-21T18:37:33.040Z] Your build was successful!
[task 2020-05-21T18:37:33.041Z]  Config object not found by mach.
[task 2020-05-21T18:37:33.041Z] Configure complete!
[task 2020-05-21T18:37:33.041Z] Be sure to run |mach build| to pick up any changes
[task 2020-05-21T18:37:33.041Z] To view resource usage of the build, run |mach resource-usage|.
[task 2020-05-21T18:37:33.041Z] For more information on what to do now, see https://developer.mozilla.org/docs/Developer_Guide/So_You_Just_Built_Firefox
[task 2020-05-21T18:37:33.084Z] + ./mach --log-no-times lint -f treeherder -f json:/builds/worker/mozlint.json --linter android-api-lint --linter android-javadoc --linter android-checkstyle --linter android-lint --linter android-test
[task 2020-05-21T18:39:41.059Z] TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/gecko/process/ServiceAllocator.java:7:8 | Unused import - org.mozilla.geckoview.BuildConfig. (com.puppycrawl.tools.checkstyle.checks.imports.UnusedImportsCheck)
[fetches 2020-05-21T18:39:41.096Z] removing /builds/worker/fetches
[fetches 2020-05-21T18:39:42.763Z] finished
Flags: needinfo?(aklotz)
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9c5b9c33920b
Part 8 - Stop throwing AssertionError for binding of defunct ServiceAllocator.InstanceInfo; r=geckoview-reviewers,owlish
Flags: needinfo?(aklotz)

Comment on attachment 9150780 [details]
Bug 1630229: Part 8 - Stop throwing AssertionError for binding of defunct ServiceAllocator.InstanceInfo; r=#geckoview-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Trivial patch
  • String changes made/needed:
Attachment #9150780 - Flags: approval-mozilla-beta?
Blocks: 1639964
Status: REOPENED → ASSIGNED
Target Milestone: mozilla77 → ---

Comment on attachment 9150780 [details]
Bug 1630229: Part 8 - Stop throwing AssertionError for binding of defunct ServiceAllocator.InstanceInfo; r=#geckoview-reviewers

Fixes a GV crash, approved for 77.0b9.

Attachment #9150780 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

Hi! This issue is verified as fixed, since we did not encounter any crashes related to it while testing Fenix Beta 5.1.0-beta.2.
