Closed Bug 1777710 Opened 3 years ago Closed 2 years ago

Intermittent java.lang.SecurityException: Package org.mozilla.fenix has not requested permission android.permission.WRITE_EXTERNAL_STORAGE

Categories

(Testing :: Raptor, defect, P2)

Unspecified
Android
defect

Tracking

(firefox106 wontfix, firefox107 wontfix, firefox108 wontfix, firefox109 fixed)

RESOLVED FIXED
Tracking Status
firefox106 --- wontfix
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: sparky)

References

Details

(Keywords: intermittent-failure, Whiteboard: [relops-android][geckoview:2022q4] [geckoview:m108] [stockwell unknown] [geckoview:m109?])

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=383119345&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/X4SmTHl9Rf2Rgebup3rW8g/runs/0/artifacts/public/logs/live_backing.log


[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  adb Granting important runtime permissions to org.mozilla.fenix
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  adb Unable to grant runtime permission android.permission.WRITE_EXTERNAL_STORAGE to org.mozilla.fenix due to args: adb wait-for-device shell su -c 'pm grant org.mozilla.fenix android.permission.WRITE_EXTERNAL_STORAGE', exitcode: 255, stdout:
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  Exception occurred while executing 'grant':
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  java.lang.SecurityException: Package org.mozilla.fenix has not requested permission android.permission.WRITE_EXTERNAL_STORAGE
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.permission.BasePermission.enforceDeclaredUsedAndRuntimeOrDevelopment(BasePermission.java:453)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.permission.PermissionManagerService.grantRuntimePermissionInternal(PermissionManagerService.java:1538)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.permission.PermissionManagerService.grantRuntimePermissionInternal(PermissionManagerService.java:1492)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.permission.PermissionManagerService.grantRuntimePermission(PermissionManagerService.java:1484)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.PackageManagerShellCommand.runGrantRevokePermission(PackageManagerShellCommand.java:2325)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.PackageManagerShellCommand.onCommand(PackageManagerShellCommand.java:259)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.BasicShellCommandHandler.exec(BasicShellCommandHandler.java:98)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.ShellCommand.exec(ShellCommand.java:44)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.PackageManagerService.onShellCommand(PackageManagerService.java:26546)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.Binder.shellCommand(Binder.java:965)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.Binder.onTransact(Binder.java:839)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.content.pm.IPackageManager$Stub.onTransact(IPackageManager.java:5206)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at com.android.server.pm.PackageManagerService.onTransact(PackageManagerService.java:5247)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.Binder.execTransactInternal(Binder.java:1195)
[task 2022-07-01T14:41:27.382Z] 14:41:24     INFO -  	at android.os.Binder.execTransact(Binder.java:1159)

The Fenix components are only for crashes or security bugs. This will not get any action sitting in the Fenix product. This looks to be a test from the perf team so they are best to look at this. If needed a issue should be opened in https://github.com/mozilla-mobile/fenix/issues

Component: Security: Android → Performance
Product: Fenix → Core
Component: Performance → Raptor
Product: Core → Testing
Severity: S4 → S2
Priority: P5 → P2
See Also: → 1768889

The WRITE_EXTERNAL_STORAGE failure is a red herring, it happens in all tests. The failure or the crash that is found in the logcat logs (that autopush really needs to be disabled):

07-01 14:45:14.206 25151   795 I AutoPushFeature: Trying to check validity of push subscriptions.
07-01 14:45:14.206 25151   795 I AutoPushFeature: Checking now..
07-01 14:45:14.209 25151   795 W App     : Non-fatal error occurred in AutoPushFeature.
07-01 14:45:14.209 25151   795 W App     : mozilla.appservices.push.PushException$CommunicationException: Communication Error: "Connection is unauthorized"
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.read(push.kt:16)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.read(push.kt:1)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverter$DefaultImpls.liftFromRustBuffer(push.kt:2)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterRustBuffer$DefaultImpls.liftFromRustBuffer(push.kt:1)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.liftFromRustBuffer(push.kt:2)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.liftFromRustBuffer(push.kt:1)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterRustBuffer$DefaultImpls.lift(push.kt:1)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.lift(push.kt:3)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.FfiConverterTypePushError.lift(push.kt:2)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.PushException$ErrorHandler.lift(push.kt:2)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.PushException$ErrorHandler.lift(push.kt:1)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.appservices.push.PushManager.verifyConnection(push.kt:17)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.components.feature.push.RustPushConnection.verifyConnection(Connection.kt:3)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.components.feature.push.AutoPushFeature.verifyActiveSubscriptions$feature_push_release(AutoPushFeature.kt:6)
07-01 14:45:14.209 25151   795 W App     : 	at mozilla.components.feature.push.AutoPushFeature$tryVerifySubscriptions$1.invokeSuspend(AutoPushFeature.kt:15)
07-01 14:45:14.209 25151   795 W App     : 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:4)
07-01 14:45:14.209 25151   795 W App     : 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:18)
07-01 14:45:14.209 25151   795 W App     : 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
07-01 14:45:14.209 25151   795 W App     : 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
07-01 14:45:14.209 25151   795 W App     : 	at java.lang.Thread.run(Thread.java:923)

Which I think is being caused by the tcp issue that is found across the logs:

07-01 14:45:20.081  6185  6185 E adbd    : failed to connect to socket 'tcp:2829': Connection refused

And that is preceded by this and other things which don't look right:

07-01 14:36:41.198  4930  6097 W AdbDebuggingManager: Read failed with count -1
07-01 14:36:41.198  4930  4944 D ActivityManager: attachApplicationLocked() app=ProcessRecord{1cc9dc 6105:com.samsung.android.honeyboard/u0a175} app.isolatedEntryPoint=null instr2=null
07-01 14:36:41.198  5274  5274 D SystemUIPostBootTiming: StartPostServicescom.android.systemui.usb.StorageNotification took to complete: 63ms
07-01 14:36:41.200  4930  6097 D AdbDebuggingManager: Closing socket
07-01 14:36:41.201  4930  6097 D AdbDebuggingManager: Creating socket
07-01 14:36:41.203  4930  6097 E AdbDebuggingManager: Caught an exception opening the socket: java.io.IOException: No such file or directory
07-01 14:36:41.203  4930  6097 D AdbDebuggingManager: Closing socket

:aerickson, can you check with bitbar about this issue? It seems to happen intermittently and orange factor is busted for the fenix branch so I can't easily say if it's restricted to only some of the devices.

Here's a link to a passing logcat log: https://firefoxci.taskcluster-artifacts.net/Ym7elw-CRLCx3RfdhqtWYg/0/public/test_info/logcat-R58R355CA0D.log
And here's a link to a failing logcat log: https://firefoxci.taskcluster-artifacts.net/X4SmTHl9Rf2Rgebup3rW8g/0/public/test_info/logcat-RR8NA0BELHX.log

Flags: needinfo?(aerickson)

The TCP socket port that Marionette is using is 2829 (which is the connection refused port): https://searchfox.org/mozilla-central/source/testing/geckodriver/src/android.rs#13

Ok, here's the failure that is killing Fenix:

07-01 14:43:12.852  4930  5731 I GenerationRegistry: mBackingStore.isClosed() : false
07-01 14:43:12.878 12914 24965 I Gecko   : 1656686592878	Marionette	INFO	Marionette enabled
07-01 14:43:12.879 12914 24969 I AutofillCCAddressesStorage: 'Warming up storage' took 28 ms
07-01 14:43:12.882 12914 24965 I Gecko   : 1656686592882	RemoteAgent	DEBUG	Setting recommended pref browser.search.update to false
07-01 14:43:12.882 12914 24965 I Gecko   : 1656686592882	RemoteAgent	DEBUG	Setting recommended pref browser.tabs.disableBackgroundZombification to false
07-01 14:43:12.883 12914 24965 I Gecko   : 1656686592883	RemoteAgent	DEBUG	Setting recommended pref network.http.prompt-temp-redirect to false
07-01 14:43:12.884 12914 24965 I Gecko   : 1656686592884	RemoteAgent	DEBUG	Setting recommended pref signon.autofillForms to false
07-01 14:43:12.885 12914 24965 I Gecko   : 1656686592885	Marionette	TRACE	Received observer notification final-ui-startup
07-01 14:43:12.886 12914 24942 I libglean_ffi: glean_core::ping: Collecting baseline
07-01 14:43:12.887 12914 24965 I Gecko   : 1656686592887	Marionette	INFO	Listening on port 2829
07-01 14:43:12.888 12914 24965 I Gecko   : 1656686592888	Marionette	DEBUG	Marionette is listening
07-01 14:43:12.888 12914 24942 I libglean_ffi: glean_core::upload::directory: Processing ping at: /data/user/0/org.mozilla.fenix/glean_data/pending_pings/5aece042-6b17-4788-8c46-ea875d9db4a1
07-01 14:43:12.889 12914 24942 I libglean_ffi: glean_core::metrics::ping: The ping 'baseline' was submitted and will be sent as soon as possible
07-01 14:43:12.889 12914 24969 I FenixApplication: 'Storage initialization' took 373 ms
07-01 14:43:12.895  4930  6812 D PackageManager: setEnabledSetting : userId = 0 packageName = org.mozilla.fenix cmp = androidx.work.impl.background.systemalarm.RescheduleReceiver newState = 1 callingPackage = 10270/org.mozilla.fenix
07-01 14:43:12.895 12914 12914 I StartupTypeTelemetry: Recorded start up: cold_view
07-01 14:43:12.898  4930  4995 D FastTrack: setFastTrackState: pid = 12914 ;tid= 25015 ;value= 1
07-01 14:43:12.901  4930  6812 D InputTransport: Input channel constructed: '612380c', fd=505
07-01 14:43:12.901  4930  6812 D InputTransport: Input channel constructed: '612380c', fd=589
07-01 14:43:12.902  4608  4625 I SurfaceFlinger: id=131 createSurf (0x0),-1 flag=80004, 612380c org.mozilla.fenix/org.mozilla.fenix.HomeActivity#0
07-01 14:43:12.903  4930  6812 D WindowManager: isScreenshotDisabledLocked: userId = 0, disabled =false
07-01 14:43:12.903  4930  4964 D SurfaceControl: hide : mNativeObject = 530667138016 - sc.mNativeObject = 528534107712 - Surface(name=Input Consumer recents_animation_input_consumer)/@0x37fa39a
07-01 14:43:12.903  4930  6812 D InputTransport: Input channel destroyed: '612380c', fd=589
07-01 14:43:12.903  4930  4964 D SurfaceControl: nativeSetFlags Done : Surface(name=Input Consumer recents_animation_input_consumer)/@0x37fa39a
07-01 14:43:12.903 12914 12914 D InputTransport: Input channel constructed: '612380c', fd=142
07-01 14:43:12.906 12914 12914 I ViewRootImpl@7ec9d6e[HomeActivity]: setView = com.android.internal.policy.DecorView@f9bbd12 TM=true
07-01 14:43:12.920 12914 12914 E ExceptionHandler: Uncaught exception handled: 
07-01 14:43:12.920 12914 12914 E ExceptionHandler: java.security.UnrecoverableKeyException: Failed to obtain information about key
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at android.security.keystore.AndroidKeyStoreProvider.getKeyCharacteristics(AndroidKeyStoreProvider.java:252)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at android.security.keystore.AndroidKeyStoreProvider.loadAndroidKeyStoreKeyFromKeystore(AndroidKeyStoreProvider.java:378)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at android.security.keystore.AndroidKeyStoreSpi.engineGetKey(AndroidKeyStoreSpi.java:120)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at java.security.KeyStore.getKey(KeyStore.java:1062)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.lib.dataprotect.Keystore.getKey(Keystore.kt:5)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.lib.dataprotect.SecurePreferencesImpl23.generateManagedKeyIfNecessary(SecureAbove22Preferences.kt:2)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.lib.dataprotect.SecurePreferencesImpl23.getString(SecureAbove22Preferences.kt:1)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.lib.dataprotect.SecureAbove22Preferences.getString(SecureAbove22Preferences.kt:1)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.SecureAbove22AccountStorage.read(AccountStorage.kt:1)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.manager.FxaAccountManager$accountOnDisk$2.invoke(FxaAccountManager.kt:6)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:5)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.manager.FxaAccountManager.internalStateSideEffects(FxaAccountManager.kt:121)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.manager.FxaAccountManager.stateActions(FxaAccountManager.kt:27)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.manager.FxaAccountManager.access$processQueue(FxaAccountManager.kt:48)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at mozilla.components.service.fxa.manager.FxaAccountManager$start$2.invokeSuspend(FxaAccountManager.kt:4)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:4)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:18)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at java.lang.Thread.run(Thread.java:923)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@90a9d5d, Dispatchers.Main]
07-01 14:43:12.920 12914 12914 E ExceptionHandler: Caused by: android.security.KeyStoreException: Key not found
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at android.security.KeyStore.getKeyStoreException(KeyStore.java:1432)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	at android.security.keystore.AndroidKeyStoreProvider.getKeyCharacteristics(AndroidKeyStoreProvider.java:254)
07-01 14:43:12.920 12914 12914 E ExceptionHandler: 	... 19 more
07-01 14:43:12.923 12914 12914 I mozac/CrashReporter: Received crash: UncaughtExceptionCrash(timestamp=1656686592920, throwable=java.security.UnrecoverableKeyException: Failed to obtain information about key, breadcrumbs=[Breadcrumb(message=onCreate(), data={recreated=false, intent=android.intent.action.VIEW, instance=38852305}, category=HomeActivity, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022), Breadcrumb(message=Changing to fragment browserFragment, isCustomTab: false, data={}, category=DestinationChanged, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022), Breadcrumb(message=onAttach(), data={instance=100068553, activityInstance=38852305, activityName=HomeActivity}, category=BrowserFragment, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022), Breadcrumb(message=onCreateView(), data={customTabSessionId=null, instance=100068553, activityInstance=38852305, activityName=HomeActivity}, category=BrowserFragment, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022), Breadcrumb(message=onStart(), data={instance=38852305}, category=HomeActivity, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022), Breadcrumb(message=onResume(), data={instance=38852305}, category=HomeActivity, level=INFO, type=DEFAULT, date=Fri Jul 01 14:43:12 UTC 2022)], uuid=d336d702-b47b-4c60-a970-9bbc86b253f4)
07-01 14:43:12.938  4930  6812 D StorageManagerService: getExternalStorageMountMode: uid=10270 packageName=org.mozilla.fenix
07-01 14:43:12.938  4930  6812 D CompatibilityChangeReporter: Compat change id reported: 135634846; UID 10270; state: DISABLED
07-01 14:43:12.939  4930  4984 D CompatibilityChangeReporter: Compat change id reported: 143937733; UID 10270; state: ENABLED
07-01 14:43:12.953  4499  4499 D Zygote  : Forked child process 25118
07-01 14:43:12.955  4930  4984 I ActivityManager: Start proc 25118:org.mozilla.fenix:gpu/u0a270 for service {org.mozilla.fenix/org.mozilla.gecko.process.GeckoChildProcessServices$gpu}
07-01 14:43:12.960 12914 12914 I Process : Sending signal. PID: 12914 SIG: 9
07-01 14:43:12.981 25118 25118 E zilla.fenix:gp: Not starting debugger since process cannot load the jdwp agent.
07-01 14:43:12.987 25118 25118 D ActivityThread: setConscryptValidator
07-01 14:43:12.987 25118 25118 D ActivityThread: setConscryptValidator - put
07-01 14:43:12.989  4930  6812 I ActivityManager: DSS OFF for org.mozilla.fenix
07-01 14:43:12.990  4930  6812 D ActivityManager: attachApplicationLocked() app=ProcessRecord{35151d1 25118:org.mozilla.fenix:gpu/u0a270} app.isolatedEntryPoint=null instr2=null
07-01 14:43:12.991  4930  6812 D CompatibilityChangeReporter: Compat change id reported: 136274596; UID 10270; state: ENABLED
07-01 14:43:12.994 25118 25118 D ActivityThread: handleBindApplication()++ app=org.mozilla.fenix:gpu
07-01 14:43:13.006  4930  7295 D ConnectivityService: ConnectivityService NetworkRequestInfo binderDied(NetworkRequest [ TRACK_DEFAULT id=79, [ Capabilities: INTERNET&NOT_RESTRICTED&TRUSTED Uid: 10270 AdministratorUids: [] RequestorUid: 10270 RequestorPackageName: org.mozilla.fenix] ], android.os.BinderProxy@80fbf10)
07-01 14:43:13.006  4930  5132 D ConnectivityService: releasing NetworkRequest [ TRACK_DEFAULT id=79, [ Capabilities: INTERNET&NOT_RESTRICTED&TRUSTED Uid: 10270 AdministratorUids: [] RequestorUid: 10270 RequestorPackageName: org.mozilla.fenix] ] (release request)
07-01 14:43:13.007  4930  5062 D PowerManagerService: [api] handleWakeLockDeath : release WakeLock : PARTIAL_WAKE_LOCK              'fiid-sync' ACQ=-903ms (uid=10270 pid=12914 pkg=org.mozilla.fenix)
07-01 14:43:13.008  4499  4499 I Zygote  : Process 12914 exited due to signal 9 (Killed)
07-01 14:43:13.007  4930  5731 I WindowManager: WIN DEATH: Window{612380c u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity}

(removing the ni? for aerickson for now)

Flags: needinfo?(aerickson)

:kbrosnan/:jonalmeida, can either of you look into the crash in comment #9?

Flags: needinfo?(kbrosnan)
Flags: needinfo?(jonalmeida942)

Forwarding to Arturo or Jon to have a look.

Flags: needinfo?(kbrosnan) → needinfo?(amejiamarmol)

Thank you!

:aerickson, could you ask Bitbar to look into some of these a51 devices? I keep seeing Knox/Keystore issues in the failing logs and it seems to be preventing all connections. Here's a failing log with plenty of them: https://firefoxci.taskcluster-artifacts.net/ROBdVRAPTGmjTIdh3uSgHA/0/public/test_info/logcat-R58R35204PW.log

Flags: needinfo?(aerickson)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE

Are there specific devices to look at? I think they should be consistent. I don't think we've changed anything regarding knox on any of them.

What's the particular error you're concerned with? There's a ton of lines that mention knox or keystore. Is it the one below?

What jobs are failing (I'd like to check out one that's working also)?

Thanks.

08-30 21:40:06.360  4928  5649 E system_server: TimaKeyStore: TZ_KEYSTORE_initialize failed : 12
08-30 21:40:06.376  3180  3256 E g       : == KG Agent == TimaKeystore initialization error
08-30 21:40:06.376  3180  3256 E g       : java.security.cert.CertificateException: TimaKeystore initialization error
08-30 21:40:06.376  3180  3256 E g       : 	at com.sec.tima.TimaKeyStore.engineLoad(TimaKeyStore.java:348)
08-30 21:40:06.376  3180  3256 E g       : 	at java.security.KeyStore.load(KeyStore.java:1544)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.d.g.a(Unknown Source:140)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.d.d.getPrivateKey(Unknown Source:58)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.NativeSsl.setCertificate(NativeSsl.java:241)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.NativeSsl.chooseClientCertificate(NativeSsl.java:230)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.clientCertificateRequested(ConscryptEngine.java:1660)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.NativeCrypto.ENGINE_SSL_read_direct(Native Method)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.NativeSsl.readDirectByteBuffer(NativeSsl.java:568)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.readPlaintextDataDirect(ConscryptEngine.java:1095)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.readPlaintextData(ConscryptEngine.java:1079)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.unwrap(ConscryptEngine.java:876)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.unwrap(ConscryptEngine.java:747)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngine.unwrap(ConscryptEngine.java:712)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:849)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:722)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:238)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:217)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.io.RealConnection.connectTls(RealConnection.java:1450)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:1399)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.io.RealConnection.connect(RealConnection.java:1343)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:221)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:144)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:106)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:400)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:333)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:465)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:131)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getOutputStream(HttpURLConnectionImpl.java:262)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getOutputStream(DelegatingHttpsURLConnection.java:219)
08-30 21:40:06.376  3180  3256 E g       : 	at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:30)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.d.e.a(Unknown Source:370)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.d.e.a(Unknown Source:90)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.d.e.a(Unknown Source:71)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.data.a.a(Unknown Source:73)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.g.b.b(Unknown Source:226)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.g.b.a(Unknown Source:71)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.service.b.b(Unknown Source:151)
08-30 21:40:06.376  3180  3256 E g       : 	at com.samsung.android.kgclient.service.KGIntentService.onHandleIntent(Unknown Source:80)
08-30 21:40:06.376  3180  3256 E g       : 	at android.app.IntentService$ServiceHandler.handleMessage(IntentService.java:77)
08-30 21:40:06.376  3180  3256 E g       : 	at android.os.Handler.dispatchMessage(Handler.java:106)
08-30 21:40:06.376  3180  3256 E g       : 	at android.os.Looper.loop(Looper.java:246)
08-30 21:40:06.376  3180  3256 E g       : 	at android.os.HandlerThread.run(HandlerThread.java:67)
Flags: needinfo?(aerickson) → needinfo?(gmierz2)

Nvm, I think I was looking at a Pixel 2 log before because I'm seeing the same things in the passing logs for a51 :(

Here's the final section of the log in the failing run though: https://treeherder.mozilla.org/jobs?repo=fenix&revision=310a029c37f2c87d0bf33b31f87654bb39e3ff9b&selectedTaskRun=SNiUnZiaSI-eGFgXBFO2Zw.0

08-31 19:20:53.041 24707  1207 I Gecko   : 1661973653041	Marionette	DEBUG	0 <- [1,106,null,{"cause":"shutdown","forced":false}]
08-31 19:20:53.062 24707  1212 I SessionPrioritizationMiddleware: Update the tab  c4212ea9-97a9-425a-b62d-b5caa6fc2e47 priority to DEFAULT
08-31 19:20:53.074 24707  1214 I SessionStorage/AutoSave: Save: New tab selected
08-31 19:20:53.083  4592  4614 E BufferQueueProducer: [GeckoSurface#0](id:11f0000001f5,api:0,p:-1,c:4592) disconnect: not connected (req=1)
08-31 19:20:53.084  1346  1399 W libEGL  : EGLNativeWindowType 0xd113b008 disconnect failed
08-31 19:20:53.169  4304  4304 I io_stats: !@   8,0 r 245437 5154840 w 33141 1003812 d 19415 586716 f 16284 20350 iot 73404 64992 th 102400 0 0 pt 0 inp 0 0 1988.024
08-31 19:20:53.170 24707  1207 W google-breakpad: ExceptionHandler::GenerateDump cloned child 
08-31 19:20:53.170 24707  1207 W google-breakpad: 1819
08-31 19:20:53.170 24707  1207 W google-breakpad: 
08-31 19:20:53.171 24707  1207 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
08-31 19:20:53.171  1819  1207 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
08-31 19:20:53.549  6460  6812 I SDHMS:G : SIOP:: AP:296(373,100) BAT:321(321,0) USB:0(0,0) CHG:367(367,0) WIFI:371(371,0) LRP:354(354,0) LRF:346(346) LRB:354(354) 
08-31 19:20:53.557  4927  4927 D WifiP2pService: GopsReceiver : received : com.samsung.intent.action.CHECK_SIOP_LEVEL
08-31 19:20:53.557  4927  4927 I WifiP2pService: siop_level = -1
08-31 19:20:53.560 15598 15641 I SystemBroadcastReceiver: [#CMH#] onReceive completed :  0
08-31 19:20:53.561 15598 15598 I PolicyManager: [#CMH#] onReceive action = EVENT_SIOP
08-31 19:20:53.562  4927  4927 D WifiP2pService: GopsReceiver : received : com.samsung.intent.action.CHECK_SIOP_LEVEL time : 5ms
08-31 19:20:53.597  4927  5301 V MdnieScenarioControlService: burn_in_prevention_mode() mBrightnessAllAppServerState : true , enabled : true, mBurnInDisableAppEnabled : false , mAutoBrightnessMode : false, mHighBrightnessModeEnabled : false , mBurnInPreventionACL : false , platform_brightness_value : 0 , BURN_IN_PREVENTION_SUPPORT_VALUE : 143 , mBurnInPreventionEnabled : false , mGameAppLauncher : false
08-31 19:20:53.597  4927  5301 V MdnieScenarioControlService: Disabled Burn-In Prevention Mode
08-31 19:20:53.598  4927  5301 D SemDisplaySolutionManagerService: uid : 1000 , pid : 4927 , name : burnin_prevention_off
08-31 19:20:53.599  4927  5301 D SemDisplaySolutionManagerService: where : application what : DEFAULT setScreenBrightnessScaleFactor(1.0) , mBrightnessCompensation : 1.0 , PMS value : 1.0
08-31 19:20:53.599  4927  5301 D PowerManagerService: [api] setScreenBrightnessScaleFactor: scaleFactor: 1.0 (uid: 1000 pid: 4927)
08-31 19:20:53.600  4927  5301 D SemDisplaySolutionManagerService: Normal Mode setMultipleScreenBrightness , application, DEFAULT, 1.0
08-31 19:20:53.690 21350 21350 I ContentCaptureService: unbinding from com.samsung.android.smartsuggestions.services.ContentCaptureService
08-31 19:20:53.710  4927  8178 I ActivityManager: Killing 28647:com.osp.app.signin/u0a106 (adj 999): empty #25
08-31 19:20:53.759  4491  4491 I Zygote  : Process 28647 exited due to signal 9 (Killed)
08-31 19:20:53.763  4927  4977 I libprocessgroup: Successfully killed process cgroup uid 10106 pid 28647 in 46ms
08-31 19:20:53.768  4631  4631 I CameraService: UidPolicy: No processes running for a uid=10106 disabled=0
08-31 19:20:53.781  5435  5736 D DhcpClient: Received packet: 6e:2c:6b:4e:fe:3a ACK: your new IP /10.7.205.110, netmask /255.255.254.0, gateways [/10.7.205.1] DNS servers: /10.7.20.108 /10.7.20.100 , lease time 600
08-31 19:20:53.850  4592  4592 I SurfaceFlinger: SFWD update time=1988705856692
08-31 19:20:55.180  4304  4304 I io_stats: !@   8,0 r 245996 5160812 w 33143 1003820 d 19415 586716 f 16284 20350 iot 73556 65157 th 102400 0 0 pt 0 inp 0 0 1990.034
08-31 19:20:55.197 24707 24707 I Choreographer: Skipped 119 frames!  The application may be doing too much work on its main thread.
08-31 19:20:55.197 24707  1372 I SessionStorage/AutoSave: Save: Periodic
08-31 19:20:55.219  4457  4457 E audit   : type=1701 audit(1661973655.215:4530): auid=4294967295 uid=10266 gid=10266 ses=4294967295 subj=u:r:untrusted_app:s0:c10,c257,c512,c768 pid=24707 comm="Gecko" exe="/system/bin/app_process32" sig=11 res=1
08-31 19:20:55.290  1559  1582 I Gecko   : Exiting due to channel error.
08-31 19:20:55.291  1278  1442 I Gecko   : Exiting due to channel error.
08-31 19:20:55.293  4927  5082 W InputDispatcher: channel '5c6c027 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=647
08-31 19:20:55.293  4927  5082 E InputDispatcher: channel '5c6c027 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
08-31 19:20:55.296  4927  6395 D InputTransport: Input channel destroyed: 'ClientS', fd=419
08-31 19:20:55.297  4927  6395 I ActivityManager: Process org.mozilla.fenix (pid 24707) has died: fg  TOP (316,1052)
08-31 19:20:55.297  4592  4614 I SurfaceFlinger: id=2587 Removed SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@86df518@0#0 (69)
08-31 19:20:55.297  4592  4614 I SurfaceFlinger: id=2588 Removed Background for -SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@86df518@0#0 (69)
08-31 19:20:55.297  4592  4614 I SurfaceFlinger: id=2591 Removed GeckoSurface#0 (69)
08-31 19:20:55.297  4927  6413 I WindowManager: WIN DEATH: Window{5c6c027 u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity}

And one from a passing test: https://treeherder.mozilla.org/jobs?repo=fenix&revision=310a029c37f2c87d0bf33b31f87654bb39e3ff9b&selectedTaskRun=L2QtZnQ0TXmBOOh7ScO_eQ.0

08-31 18:47:09.750 21152 30151 I Gecko   : 1661971629750	Marionette	DEBUG	0 <- [1,107,null,{"cause":"shutdown","forced":false}]
08-31 18:47:09.767 21152 30153 I SessionPrioritizationMiddleware: Update the tab  db0f3f49-6bec-4009-a991-6ddee5e6643a priority to DEFAULT
08-31 18:47:09.771 21152 30181 I SessionStorage/AutoSave: Save: New tab selected
08-31 18:47:09.785  4597  5219 E BufferQueueProducer: [GeckoSurface#0](id:11f50000032b,api:0,p:-1,c:4597) disconnect: not connected (req=1)
08-31 18:47:09.785 30294 30347 W libEGL  : EGLNativeWindowType 0xe9b96c08 disconnect failed
08-31 18:47:09.842  4310  4310 I io_stats: !@   8,0 r 227504 4899248 w 43835 1368384 d 155391 111740720 f 21524 26863 iot 202536 194362 th 102400 0 0 pt 0 inp 0 0 2806.657
08-31 18:47:09.849 21152 30151 W google-breakpad: ExceptionHandler::GenerateDump cloned child 
08-31 18:47:09.849 21152 30151 W google-breakpad: 30666
08-31 18:47:09.849 21152 30151 W google-breakpad: 
08-31 18:47:09.849 21152 30151 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
08-31 18:47:09.849 30666 30151 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
08-31 18:47:09.884  4461  4461 E audit   : type=1400 audit(1661971629.879:7367): avc:  denied  { ioctl } for  pid=30667 comm="su" path="/dev/pts/0" dev="devpts" ino=3 ioctlcmd=0x5404 scontext=u:r:magisk_client:s0 tcontext=u:object_r:devpts:s0 tclass=chr_file permissive=0 SEPF_SM-A515F_11_0007 audit_filtered
08-31 18:47:09.884  4461  4461 E audit   : type=1300 audit(1661971629.879:7367): arch=c00000b7 syscall=29 success=no exit=-13 a0=0 a1=5404 a2=7fe4570030 a3=8 items=0 ppid=30664 pid=30667 auid=4294967295 uid=2000 gid=2000 euid=2000 suid=2000 fsuid=2000 egid=2000 sgid=2000 fsgid=2000 tty=pts0 ses=4294967295 comm="su" exe="/system/bin/magisk" subj=u:r:magisk_client:s0 key=(null)
08-31 18:47:09.885  4461  4461 E audit   : type=1327 audit(1661971629.879:7367): proctitle=7375002D6300616D20666F7263652D73746F70206F72672E6D6F7A696C6C612E66656E6978
08-31 18:47:09.949  4931  6803 I ActivityManager: Force stopping org.mozilla.fenix appid=10266 user=0: from pid 30674
08-31 18:47:09.951  4931  6803 I ActivityManager: Killing 30552:org.mozilla.fenix:tab10/u0a266 (adj 0): stop org.mozilla.fenix due to from pid 30674
08-31 18:47:09.955  4931  6803 D ActivityManager: proc ProcessRecord{1bbcd8b 30552:org.mozilla.fenix:tab10/u0a266} already removed. so we skip next process.
08-31 18:47:09.957  4931  6803 I ActivityManager: Killing 30230:org.mozilla.fenix:tab13/u0a266 (adj 900): stop org.mozilla.fenix due to from pid 30674
08-31 18:47:09.963  4931  6803 D ActivityManager: proc ProcessRecord{9a9cf59 30230:org.mozilla.fenix:tab13/u0a266} already removed. so we skip next process.
08-31 18:47:09.964  4931  6803 I ActivityManager: Killing 30480:org.mozilla.fenix:tab17/u0a266 (adj 0): stop org.mozilla.fenix due to from pid 30674
08-31 18:47:09.968  4931  6803 D ActivityManager: proc ProcessRecord{e96ec08 30480:org.mozilla.fenix:tab17/u0a266} already removed. so we skip next process.
08-31 18:47:09.968  4931  6803 I ActivityManager: Killing 21152:org.mozilla.fenix/u0a266 (adj 0): stop org.mozilla.fenix due to from pid 30674
08-31 18:47:09.973  4931  6803 D ActivityManager: proc ProcessRecord{94ac99b 21152:org.mozilla.fenix/u0a266} already removed. so we skip next process.
08-31 18:47:09.974  4931  6803 W ActivityTaskManager: Force removing ActivityRecord{6132085 u0 org.mozilla.fenix/.HomeActivity t2839 f}}: app died, no saved state
08-31 18:47:09.975  4931  6803 I ActivityTaskManager: Removing activity ActivityRecord{6132085 u0 org.mozilla.fenix/.HomeActivity t2839 f}}(appDied)  from stack callers=com.android.server.wm.ActivityStack$RemoveHistoryRecordsForApp.processActivity:565 com.android.server.wm.ActivityStack$RemoveHistoryRecordsForApp.process:480 com.android.server.wm.ActivityStack.handleAppDied:3354 com.android.server.wm.RootWindowContainer.handleAppDied:3625 com.android.server.wm.ActivityTaskManagerService$LocalService.handleAppDied:8231 
08-31 18:47:09.975  4931  6803 I WindowManager: Reparenting to leash, surface=Surface(name=fd5a94a org.mozilla.fenix/org.mozilla.fenix.HomeActivity)/@0x395fee9, leashParent=Surface(name=ActivityRecord{6132085 u0 org.mozilla.fenix/.HomeActivity t2839})/@0xc03bea6
08-31 18:47:09.976  4597  4618 I SurfaceFlinger: id=4239 createSurf (0x0),-1 flag=24000, Surface(name=fd5a94a org.mozilla.fenix/org.mozilla.fenix.HomeActivity)/@0x395fee9 - animation-leash#0
08-31 18:47:09.977  4931  6803 D WindowManager: makeSurface duration=1 leash=Surface(name=Surface(name=fd5a94a org.mozilla.fenix/org.mozilla.fenix.HomeActivity)/@0x395fee9 - animation-leash)/@0xbbeb76e
08-31 18:47:09.979  4931  6803 V WindowManager: Changing focus from Window{fd5a94a u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity} to null displayId=0 Callers=com.android.server.wm.RootWindowContainer.updateFocusedWindowLocked:566 com.android.server.wm.WindowManagerService.updateFocusedWindowLocked:6255 com.android.server.wm.ActivityRecord.commitVisibility:4921 com.android.server.wm.ActivityRecord.onRemovedFromDisplay:3752 
08-31 18:47:09.979  4931  6803 D StatusBarManagerService: notifyRequestedSystemKey recent=false home=false
08-31 18:47:09.981  4931  6803 D SurfaceControl: hide : mNativeObject = 480917741504 - sc.mNativeObject = 480923274976 - Surface(name=ActivityRecord{6132085 u0 org.mozilla.fenix/.HomeActivity t2839})/@0xc03bea6
08-31 18:47:09.981  4931  6803 D SurfaceControl: nativeSetFlags Done : Surface(name=ActivityRecord{6132085 u0 org.mozilla.fenix/.HomeActivity t2839})/@0xc03bea6
08-31 18:47:09.982  4931  6803 V WindowManager: Setting visibility of Window{fd5a94a u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity}: false, caller=com.android.server.wm.WindowContainer.sendAppVisibilityToClients:1017 com.android.server.wm.ActivityRecord.setClientVisible:6327 com.android.server.wm.ActivityRecord.postApplyAnimation:4962 com.android.server.wm.ActivityRecord.commitVisibility:4928 com.android.server.wm.ActivityRecord.onRemovedFromDisplay:3752 
08-31 18:47:09.983  4931  6803 D WindowManager: isScreenshotDisabledLocked: userId = 0, disabled =false
08-31 18:47:09.983  4931  6803 D InputDispatcher: Focused application released (0)
Flags: needinfo?(gmierz2)

:jmaher, I think you might have closed this one as a duplicate by accident?

Status: RESOLVED → REOPENED
Flags: needinfo?(jmaher)
Resolution: DUPLICATE → ---

yes, this is clearly wrong, thanks

Flags: needinfo?(jmaher)
No longer blocks: 1788383

Today I took a look at the crash directory paths in case something odd was going on there and I haven't found anything.

Fenix seems to be failing when we're trying to shut it down here: https://github.com/sitespeedio/browsertime/blob/0fbdf3739cb4c1a8fe1e6fa352faaf5ec14f02de/lib/core/engine/iteration.js#L258

I'm going to try adding a condition there to ignore failures/crashes on shutdown for android.

Clearing previous NI; I don't see anything in the most recent logs that are suspect for GeckoView/Android Components/Fenix.

Flags: needinfo?(jonalmeida942)

I found that there's a Shell was granted Superuser rights message that keeps flickering on and off on all failing videos and I have yet to see it on a video in a passing test: https://firefoxci.taskcluster-artifacts.net/SNiUnZiaSI-eGFgXBFO2Zw/0/public/test_info/browsertime-results/bing-search-restaurants/pages/www_bing_com/search/query-93e03990/data/video/1.mp4

Here notice that magisk is responsible for generating these toast displays and it reaches the limit:

08-31 19:19:52.730 24707  1204 I libglean_ffi: glean_core::upload::directory: Processing ping at: /data/user/0/org.mozilla.fenix/glean_data/pending_pings/68ce72d9-d0c0-4207-b617-005cf90250d5
08-31 19:19:52.731 24707 24707 I GeckoRuntime: Adding debug configuration from: /data/local/tmp/org.mozilla.fenix-geckoview-config.yaml
08-31 19:19:52.731 24707  1204 I libglean_ffi: glean_core::metrics::ping: The ping 'metrics' was submitted and will be sent as soon as possible
08-31 19:19:52.734  4927  5639 E NotificationService: Package has already posted 25 toasts. Not showing more. Package=com.topjohnwu.magisk

I think the SCREEN_BRIGHT_WAKE_LOCK is being continuously modified by something (I can't tell what from the logs) because the failing logs are littered with acquires and releases for this lock while the passing logs only have a bit at the start. You can also find it interleaved with Gecko logging:

08-31 19:02:46.692  4927  4969 D PowerManagerService: [api] acquire WakeLock SCREEN_BRIGHT_WAKE_LOCK        'WindowManager' ON_AFTER_RELEASE (uid=1000 pid=4927 pkg=android ws=WorkSource{10063})
08-31 19:02:46.695  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_listeners
08-31 19:02:46.695  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_assistant
08-31 19:02:46.696  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_policy_access_packages
08-31 19:02:46.696  4592  4592 D SurfaceFlinger: Display 0 HWC layers:
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7e5ec47580 | 0000 | RGBx_8888    |    0.0    0.0 1080.0 2186.0 |    0   88 1080 2274 | GeckoSurface#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7ddac15c00 | 0000 | RGBA_8888    |    0.0    0.0 1080.0 2400.0 |    0    0 1080 2400 | org.mozilla.fenix/org.mozilla.fenix.HomeActivity$_12485#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7ddac14940 | 0000 | RGBA_8888    |    0.0    0.0 1080.0   87.0 |    0    0 1080   87 | StatusBar$_5265#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7e5ec48980 | 0000 | RGBA_8888    |    0.0    0.0  811.0  108.0 |  134 1998  945 2106 | Toast$_5265#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7ddac14300 | 0000 | RGBA_8888    |    0.0    0.0 1080.0  126.0 |    0 2274 1080 2400 | NavigationBar0$_5265#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7ddac13b80 | 0000 | RGBA_8888    |    0.0    0.0 1080.0  125.0 |    0    0 1080  125 | ScreenDecorOverlay$_5265#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger:      DEVICE | 0x7ddac137c0 | 0000 | RGBA_8888    |    0.0    0.0 1080.0  100.0 |    0 2300 1080 2400 | ScreenDecorOverlayBottom$_5265#0
08-31 19:02:46.696  4592  4592 D SurfaceFlinger: 
08-31 19:02:46.698  4927  4969 D SurfaceControl: hide : mNativeObject = 514164109248 - sc.mNativeObject = 514018767168 - Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.698  4927  4969 D SurfaceControl: nativeSetFlags Done : Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.700  4927  4969 I WindowManager: Reparenting to original parent: Surface(name=WindowToken{42cd48d android.os.Binder@929f124})/@0x3d58871, destroy=true, surface=Surface(name=5f78700 Toast)/@0xb9f3618
08-31 19:02:46.700  4927  4969 I SurfaceControl: release : mNativeObject = 512039585216 - Surface(name=Surface(name=5f78700 Toast)/@0xb9f3618 - animation-leash)/@0x40706d7
08-31 19:02:46.700  4927  4969 I SurfaceControl: nativeRelease nativeObject s[512039585216]
08-31 19:02:46.700  4927  4969 I SurfaceControl: nativeRelease nativeObject e[512039585216]
08-31 19:02:46.702 27599 27644 I RoutineDebug[3.0.45]: NotificationUpdateService: updateNotification: noti cancel
08-31 19:02:46.702  4592  4615 I Layer   : id=581 removeFromCurrentState Surface(name=5f78700 Toast)/@0xb9f3618 - animation-leash#0 (86)
08-31 19:02:46.703  4592  4615 I Layer   : id=579 removeFromCurrentState 5f78700 Toast#0 (86)
08-31 19:02:46.703  4592  4615 I Layer   : id=580 removeFromCurrentState Toast$_5265#0 (86)
08-31 19:02:46.703  4592  4615 I Layer   : id=579 addToCurrentState 5f78700 Toast#0 (86)
08-31 19:02:46.703  4592  4615 I Layer   : id=580 addToCurrentState Toast$_5265#0 (86)
08-31 19:02:46.704  4927  4969 D SurfaceControl: hide : mNativeObject = 514164109248 - sc.mNativeObject = 514018767168 - Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.704  4927  4969 D SurfaceControl: nativeSetFlags Done : Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.704  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_listeners
08-31 19:02:46.704 27599 27644 I RoutineDebug[3.0.45]: NotificationUpdateService: requestUpdateFaceWidget: 1
08-31 19:02:46.705  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_assistant
08-31 19:02:46.705  4927  4967 D SettingsProvider: isChangeAllowed() : name = enabled_notification_policy_access_packages
08-31 19:02:46.705  4592  4615 I SurfaceFlinger: id=581 Removed Surface(name=5f78700 Toast)/@0xb9f3618 - animation-leash#0 (86)
08-31 19:02:46.705  4592  4592 I Layer   : id=581[1] Destroyed Surface(name=5f78700 Toast)/@0xb9f3618 - animation-leash#0
08-31 19:02:46.708  4927  4969 D SurfaceControl: hide : mNativeObject = 514164109248 - sc.mNativeObject = 514018767168 - Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.708  4927  4969 D SurfaceControl: nativeSetFlags Done : Surface(name=Input Consumer recents_animation_input_consumer)/@0x232ee9e
08-31 19:02:46.719 12485 27569 I System.out: (HTTPLog)-Static: isSBSettingEnabled false
08-31 19:02:46.720 12485 27569 I System.out: (HTTPLog)-Static: isSBSettingEnabled false

Here's one example of where it's released:

08-31 19:02:48.641  4927  4969 D PowerManagerService: [api] release WakeLock SCREEN_BRIGHT_WAKE_LOCK        'WindowManager' ON_AFTER_RELEASE ACQ=-1s947ms (uid=1000 pid=4927 pkg=android ws=WorkSource{10063})
08-31 19:02:48.641  4927  4969 D EdgeLightingManager: hideForWakeLock packageName=android
08-31 19:02:48.642  4927  4969 D EdgeLightingManager: isCallingUserSupported : callingUserId=0, mUserId=0, isDualAppId=false, isKnoxId=false
08-31 19:02:48.642  4927  4969 D EdgeLightingClientManager: isAvailableEdgeLighting: condition=2 mCondition=3 mIsConnected=false0
08-31 19:02:48.642  4927  4969 D EdgeLightingPolicyManager: isAcceptableApplication:  pkg=android , range=4 , includeAllApp=false , userId=0 , infoRange=0 , infoCategory=0
08-31 19:02:48.643  4927  4969 D EdgeLightingManager: hideForWakeLockInternal : return false by isAcceptableApplication.
08-31 19:02:48.643  4927  4969 D PowerManagerService: [api] applyWakeLockFlagsOnReleaseLocked : userActivityNoUpdateLocked is called : SCREEN_BRIGHT_WAKE_LOCK        'WindowManager' ON_AFTER_RELEASE (uid=1000 pid=4927 pkg=android ws=WorkSource{10063})
08-31 19:02:48.644  4592  4997 I SurfaceFlinger: id=580 Removed Toast$_5265#0 (85)
08-31 19:02:48.645 27073 27133 I Gecko   : 1661972568644	Marionette	TRACE	[8] MarionetteCommands actor created for window id 2147483651
08-31 19:02:48.645  4927  4969 I SurfaceControl: release : mNativeObject = 512039588000 - Surface(name=5f78700 Toast)/@0xb9f3618
08-31 19:02:48.645  4927  4969 I SurfaceControl: nativeRelease nativeObject s[512039588000]
08-31 19:02:48.645  4927  4969 I SurfaceControl: nativeRelease nativeObject e[512039588000]
08-31 19:02:48.646  4927  4969 I SurfaceControl: release : mNativeObject = 514268824864 - Surface(name=WindowToken{42cd48d android.os.Binder@929f124})/@0x3d58871

Given that there's code in Geckoview that uses this flag, I'm guessing switching it constantly, and rapidly is causing it to die after a certain amount of time: https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoAppShell.java#580

Logs/tasks are from here:

Here's the final section of the log in the failing run though: https://treeherder.mozilla.org/jobs?repo=fenix&revision=310a029c37f2c87d0bf33b31f87654bb39e3ff9b&selectedTaskRun=SNiUnZiaSI-eGFgXBFO2Zw.0

And one from a passing test: https://treeherder.mozilla.org/jobs?repo=fenix&revision=310a029c37f2c87d0bf33b31f87654bb39e3ff9b&selectedTaskRun=L2QtZnQ0TXmBOOh7ScO_eQ.0

:aerickson, does bitbar have anything that could be causing that flag to switch so much? I don't have any code that plays with this.

EDIT: I see something about the EdgeLightingManager in the logs, could we also get the edge lighting disabled?

Flags: needinfo?(aerickson)

Digging further, I see this:

08-31 17:57:53.281  4926  4949 E CoverManager: getCoverState : coverState is null
08-31 17:57:53.281  4926  4949 D EdgeLightingPolicyManager: isAcceptableApplication:  pkg=com.topjohnwu.magisk , range=512 , includeAllApp=false , userId=0 , infoRange=0 , infoCategory=0
08-31 17:57:53.281  4926  4949 D EdgeLightingManager: showForToast : return false by isAcceptableApplication.
08-31 17:57:53.284  4926  5299 I GenerationRegistry: mBackingStore.isClosed() : false
08-31 17:57:53.298  4926  4949 W ContextImpl: Calling a method in the system process without a qualified user: android.app.ContextImpl.sendBroadcast:1162 com.android.server.notification.sec.DisplayToast.sendIntentForToastDumpLog:156 com.android.server.notification.sec.DisplayToast.outFile:106 com.android.server.notification.sec.DisplayToast.out:62 com.android.server.notification.NotificationManagerService$12.enqueueToastForDex:3467 
08-31 17:57:53.302 18309 18309 D AndroidRuntime: Shutting down VM
08-31 17:57:53.304  4600  5722 I SurfaceFlinger: id=83 createSurf (0x0),-1 flag=80004, WindowToken{626c416 android.os.Binder@7a93531}#0
08-31 17:57:53.333 18330 18330 D Toast   : show: caller = a.qu.run:1 
08-31 17:57:53.333 18330 18330 I Toast   : show: focusDisplayId = 0, isFocusInDesktop = false mCustomDisplayId=-1 isDexDualMode=false
08-31 17:57:53.333 18330 18330 I Toast   : show: isActivityContext = false
08-31 17:57:53.334 18369 18369 D AndroidRuntime: >>>>>> START com.android.internal.os.RuntimeInit uid 0 <<<<<<
08-31 17:57:53.339  4926  4949 D EdgeLightingManager: isCallingUserSupported : callingUserId=0, mUserId=0, isDualAppId=false, isKnoxId=false
08-31 17:57:53.339  4926  4949 D EdgeLightingManager: showForToast : packageName = com.topjohnwu.magisk
08-31 17:57:53.339  4926  4949 D EdgeLightingClientManager: isAvailableEdgeLighting: condition=1 mCondition=3 mIsConnected=false0
08-31 17:57:53.341  4926  4949 D SEP_UNION_CoverManager_CoverManagerAllowLists: isAllowedToUse : cover manager allow lists does not include this App : com.topjohnwu.magisk 
08-31 17:57:53.341  4926  4949 W SEP_UNION_CoverManager_CoverManagerServiceImpl: getCoverStateInternal : caller is invalid 
08-31 17:57:53.341  4926  4949 E CoverManager: getCoverState : coverState is null
08-31 17:57:53.342  4926  4949 D EdgeLightingPolicyManager: isAcceptableApplication:  pkg=com.topjohnwu.magisk , range=512 , includeAllApp=false , userId=0 , infoRange=0 , infoCategory=0
08-31 17:57:53.342  4926  4949 D EdgeLightingManager: showForToast : return false by isAcceptableApplication.
08-31 17:57:53.343 18369 18369 I AndroidRuntime: Using default boot image
08-31 17:57:53.343 18369 18369 I AndroidRuntime: Leaving lock profiling enabled
08-31 17:57:53.343  4926  4949 W ContextImpl: Calling a method in the system process without a qualified user: android.app.ContextImpl.sendBroadcast:1162 com.android.server.notification.sec.DisplayToast.sendIntentForToastDumpLog:156 com.android.server.notification.sec.DisplayToast.outFile:106 com.android.server.notification.sec.DisplayToast.out:62 com.android.server.notification.NotificationManagerService$12.enqueueToastForDex:3467 

I'm not sure what Magick is trying to do or doing, but I think it's continuously retrying even after failure. I think this is the failure that's causing it:

08-31 17:57:53.341  4926  4949 D SEP_UNION_CoverManager_CoverManagerAllowLists: isAllowedToUse : cover manager allow lists does not include this App : com.topjohnwu.magisk 

I don't see this in passing logs either. From the failures I've looked at, devices 10, 23, and 25 are affected.

I've reached out to Bitbar about checking Magisk on those devices, disabling edge lighting, and whether their agent does anything with SCREEN_BRIGHT_WAKE_LOCK.

Is https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1777710&startday=2022-08-29&endday=2022-09-04&tree=all the correct chart for this issue? Have failures stopped as of 9/1?

Flags: needinfo?(aerickson)

That chart looks correct, but the cron hasn't ran on the fenix branch in a few days so it looks like there have been no failures.

EDIT: Last night I was looking some more and found that we do this: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/performance_tuning.py#58
But so far I don't think it's related to the SCREEN_BRIGHT_WAKE_LOCK.

Flags: needinfo?(amejiamarmol)

We are running Magisk v24.3 (https://github.com/topjohnwu/Magisk/releases) on the a51s which is the last release before a major rewrite in v25 (could potentially upgrade). Bitbar can disable the toast notifications that Magisk shows and the 'tapjacking protection' feature and we can see if either helps. Thoughts?

When does the cron run again? If we create jobs via mach try how many do we need to know if it's fixed?

Flags: needinfo?(gmierz2)

At the moment, the failure is very frequent so if you make a push to try for this 'browsertime 'a51 'tp6m 'fenix 'shipp !live !prof, you should be able to hit it at least once. I just landed a patch to start the fenix cron today and it should run sometime in the next 12 hours.

I think we should try disabling both of those and see if they help. Based on what I've read so far, it sounds like the tapjacking protection could be causing this issue.

Regarding upgrading, we should do the upgrade sooner rather than later, but that might turn into a rabbit hole given the big changes. If disabling the two things you mentioned doesn't work, maybe then we could look into upgrading Magisk? Otherwise, I can spend some more time looking around to see what might be triggering that wake lock. Do you know if they've disabled ambient display/lighting?

Flags: needinfo?(gmierz2) → needinfo?(aerickson)

(In reply to Greg Mierzwinski [:sparky] from comment #29)

At the moment, the failure is very frequent so if you make a push to try for this 'browsertime 'a51 'tp6m 'fenix 'shipp !live !prof, you should be able to hit it at least once. I just landed a patch to start the fenix cron today and it should run sometime in the next 12 hours.

OK, great.

I think we should try disabling both of those and see if they help. Based on what I've read so far, it sounds like the tapjacking protection could be causing this issue.

Regarding upgrading, we should do the upgrade sooner rather than later, but that might turn into a rabbit hole given the big changes. If disabling the two things you mentioned doesn't work, maybe then we could look into upgrading Magisk? Otherwise, I can spend some more time looking around to see what might be triggering that wake lock. Do you know if they've disabled ambient display/lighting?

I agree regarding upgrading. I'll have Bitbar disable the toast and the tapjacking and we can go from there.

Bitbar said that edge lighting should be disabled, but they are verifying. Is that what you mean by ambient display/lighting?

Flags: needinfo?(aerickson)

No, the edge lighting is something else (the edge of the a51s can light up for notifications apparently based on the EdgeLightingManager).

The ambient display setting changes the screen brightness based on the ambient light around them. Disabling "ambient display/lighting" should keep them at a constant brightness.

Aaah, OK. I've asked Bitbar if they disable 'Adaptive brightness' (Settings > Display).

Oh haha sorry! I wasn't sure what the proper name was for this.

Bitbar disables adaptive brightness and sets the brightness to 1 (max?) during setup.

adb commands used:

adb shell settings put system screen_brightness_mode 0
adb shell settings put system screen_brightness 1
Whiteboard: [relops-android]

Hi :aerickson, are there are updates from Bitbar regarding the changes?

Flags: needinfo?(aerickson)

Whoops sorry! I see the response in the other bug.

Flags: needinfo?(aerickson)

Digging further today and the changes made for tapjacking protection didn't help. We no longer have the SCREEN_BRIGHT_WAKE_LOCK flipping constantly though but we do have a "total cookie protection" popup being shown now (see bug 1791998).

So, I'm back to the solution of ignoring failures on shutdown for our android browsers because it looks like that's what's happening here:

09-22 06:49:41.464 30029  7359 I Gecko   : 1663829381464	Marionette	DEBUG	0 <- [1,106,null,{"cause":"shutdown","forced":false}]
09-22 06:49:41.478 30029  7365 I SessionPrioritizationMiddleware: Update the tab  ffb858dd-a2d0-4279-9da9-20931dbbe74e priority to DEFAULT
09-22 06:49:41.485 30029  7370 I SessionStorage/AutoSave: Save: New tab selected
09-22 06:49:41.497  4614  4629 E BufferQueueProducer: [GeckoSurface#0](id:120600000032,api:0,p:-1,c:4614) disconnect: not connected (req=1)
09-22 06:49:41.497  7557  7624 W libEGL  : EGLNativeWindowType 0xcb8bc008 disconnect failed
09-22 06:49:41.557 30029  7359 W google-breakpad: ExceptionHandler::GenerateDump cloned child 
09-22 06:49:41.557 30029  7359 W google-breakpad: 8004
09-22 06:49:41.557 30029  7359 W google-breakpad: 
09-22 06:49:41.557 30029  7359 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
09-22 06:49:41.557  8004  7359 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
09-22 06:49:41.617  4614  4614 I SurfaceFlinger: SFWD update time=1353509833455
09-22 06:49:41.840  4314  4314 I io_stats: !@   8,0 r 194488 3500568 w 22093 665392 d 13623 322812 f 12354 15427 iot 41104 36401 th 102400 0 0 pt 0 inp 0 0 1353.730
09-22 06:49:43.300 30029 30029 I Choreographer: Skipped 103 frames!  The application may be doing too much work on its main thread.
09-22 06:49:43.318  4466  4466 E audit   : type=1701 audit(1663829383.311:2234): auid=4294967295 uid=10270 gid=10270 ses=4294967295 subj=u:r:untrusted_app:s0:c14,c257,c512,c768 pid=30029 comm="Gecko" exe="/system/bin/app_process32" sig=11 res=1
09-22 06:49:43.393  4936  5092 W InputDispatcher: channel 'de71a8f Panel:org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=420
09-22 06:49:43.393  4936  5092 E InputDispatcher: channel 'de71a8f Panel:org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  4936  5092 W InputDispatcher: channel '66844c6 Pop-Up Window (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=421
09-22 06:49:43.394  4936  5092 E InputDispatcher: channel '66844c6 Pop-Up Window (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  4936  5092 W InputDispatcher: channel '98c94e1 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=625
09-22 06:49:43.394  4936  5092 E InputDispatcher: channel '98c94e1 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  7784  7819 I Gecko   : Exiting due to channel error.
09-22 06:49:43.397  7444  7670 I Gecko   : Exiting due to channel error.
09-22 06:49:43.403  4936  6920 D InputTransport: Input channel destroyed: 'ClientS', fd=419
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=277 Removed Bounds for - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@0#0 (74)
09-22 06:49:43.404  4502  4502 I Zygote  : Process 30029 exited due to signal 11 (Segmentation fault)
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=278 Removed SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@afc3667@0#0 (74)
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=279 Removed Background for -SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@afc3667@0#0 (74)
09-22 06:49:43.404  4614  4628 I SurfaceFlinger: id=282 Removed GeckoSurface#0 (74)
09-22 06:49:43.405  4936  8010 I ActivityManager: Process org.mozilla.fenix (pid 30029) has died: fg  TOP (1426,3440)
09-22 06:49:43.406  4936  7247 I WindowManager: WIN DEATH: Window{98c94e1 u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity}

I started testing this out a few weeks ago and I retriggered the try run I made for it: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=e072d61889eead365c2d5e745639b84c1c47e5d3

Whiteboard: [relops-android][stockwell disable-recommended] → [relops-android][stockwell needswork:owner]
Whiteboard: [relops-android][stockwell disable-recommended] → [relops-android][stockwell needswork:owner]
OS: Unspecified → Android
Whiteboard: [relops-android][stockwell needswork:owner] → [relops-android] [stockwell needswork:owner] [geckoview:2022q4?] [geckoview:m108?]

108

Whiteboard: [relops-android] [stockwell needswork:owner] [geckoview:2022q4?] [geckoview:m108?] → [relops-android] [stockwell needswork:owner] [geckoview:2022q4] [geckoview:m108]

Jonathan said he will look into this bug and determine whether we need we to fix something in Fenix.

Flags: needinfo?(jonalmeida942)

Note that the failure has been "disabled" as we are currently ignoring shutdown failures. I've pushed a patch with the flag removed to see how often the failure is currently happening because it's hard to find in the passing logs: https://treeherder.mozilla.org/jobs?repo=try&revision=f30c5e5862efa2c239ac3c6cc6f66d8ae14a67ab

At the moment, the failure rate for A51 has decreased dramatically after we ignored the shutdown failures, and reset our bitbar docker hosts: https://treeherder.mozilla.org/jobs?repo=fenix&revision=8ac2dc317b01f7bd03a93779c5c2eb6f5df8071f

(In reply to Greg Mierzwinski [:sparky] from comment #49)

Note that the failure has been "disabled" as we are currently ignoring shutdown failures. I've pushed a patch with the flag removed to see how often the failure is currently happening because it's hard to find in the passing logs: https://treeherder.mozilla.org/jobs?repo=try&revision=f30c5e5862efa2c239ac3c6cc6f66d8ae14a67ab

The failures are still happening if we remove the flag. All 4 fenix failures in that try run were caused by this issue. This seems lower than it was before for some reason, but it's still not low enough to remove the flag.

Sparky, what are the next steps for this bug? I assume ignoring shutdown failures isn't a good long-term solution. :)

Does an Android engineer need to debug any particular errors?

Flags: needinfo?(gmierz2)
Whiteboard: [relops-android][geckoview:2022q4] [geckoview:m108][stockwell unknown] → [relops-android][geckoview:2022q4] [geckoview:m108] [stockwell unknown] [geckoview:m109?]

Thanks for checking! There's a couple things left to do here:
(1) Get rid of the red herring failure (the signature of this bug). :afinder is currently working on this between tasks.
(2) As you said, find an android engineer to look into the segmentation fault failure.

The failure is the same as from comment #40 (see below), and it's a segmentation fault. It's odd that we aren't getting any crash reports though, I'm not sure if this is expected or not. We have code for this here: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/android.py#210-232

We're definitely checking the minidumps folder in the tasks, but it comes up empty. I'm not sure if a segfault might prevent it from being generated:

[task 2022-10-20T23:17:59.732Z] 23:17:59     INFO -  adb shell_bool: adb -s RR8R10B7DWY wait-for-device shell su -c 'test -d /sdcard/Android/data/org.mozilla.fenix/files/test_root/org.mozilla.fenix-geckodriver-profile/minidumps', timeout: None, timedout: None, exitcode: 1, output:

:cpeterson, I think it would be helpful for us to know if there should be a crash report on a segmentation fault? If so, then Fenix isn't generating one and we'd need to figure out whats going on there.

(In reply to Greg Mierzwinski [:sparky] from comment #40)

09-22 06:49:41.464 30029  7359 I Gecko   : 1663829381464	Marionette	DEBUG	0 <- [1,106,null,{"cause":"shutdown","forced":false}]
09-22 06:49:41.478 30029  7365 I SessionPrioritizationMiddleware: Update the tab  ffb858dd-a2d0-4279-9da9-20931dbbe74e priority to DEFAULT
09-22 06:49:41.485 30029  7370 I SessionStorage/AutoSave: Save: New tab selected
09-22 06:49:41.497  4614  4629 E BufferQueueProducer: [GeckoSurface#0](id:120600000032,api:0,p:-1,c:4614) disconnect: not connected (req=1)
09-22 06:49:41.497  7557  7624 W libEGL  : EGLNativeWindowType 0xcb8bc008 disconnect failed
09-22 06:49:41.557 30029  7359 W google-breakpad: ExceptionHandler::GenerateDump cloned child 
09-22 06:49:41.557 30029  7359 W google-breakpad: 8004
09-22 06:49:41.557 30029  7359 W google-breakpad: 
09-22 06:49:41.557 30029  7359 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
09-22 06:49:41.557  8004  7359 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
09-22 06:49:41.617  4614  4614 I SurfaceFlinger: SFWD update time=1353509833455
09-22 06:49:41.840  4314  4314 I io_stats: !@   8,0 r 194488 3500568 w 22093 665392 d 13623 322812 f 12354 15427 iot 41104 36401 th 102400 0 0 pt 0 inp 0 0 1353.730
09-22 06:49:43.300 30029 30029 I Choreographer: Skipped 103 frames!  The application may be doing too much work on its main thread.
09-22 06:49:43.318  4466  4466 E audit   : type=1701 audit(1663829383.311:2234): auid=4294967295 uid=10270 gid=10270 ses=4294967295 subj=u:r:untrusted_app:s0:c14,c257,c512,c768 pid=30029 comm="Gecko" exe="/system/bin/app_process32" sig=11 res=1
09-22 06:49:43.393  4936  5092 W InputDispatcher: channel 'de71a8f Panel:org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=420
09-22 06:49:43.393  4936  5092 E InputDispatcher: channel 'de71a8f Panel:org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  4936  5092 W InputDispatcher: channel '66844c6 Pop-Up Window (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=421
09-22 06:49:43.394  4936  5092 E InputDispatcher: channel '66844c6 Pop-Up Window (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  4936  5092 W InputDispatcher: channel '98c94e1 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9, fd=625
09-22 06:49:43.394  4936  5092 E InputDispatcher: channel '98c94e1 org.mozilla.fenix/org.mozilla.fenix.HomeActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
09-22 06:49:43.394  7784  7819 I Gecko   : Exiting due to channel error.
09-22 06:49:43.397  7444  7670 I Gecko   : Exiting due to channel error.
09-22 06:49:43.403  4936  6920 D InputTransport: Input channel destroyed: 'ClientS', fd=419
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=277 Removed Bounds for - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@0#0 (74)
09-22 06:49:43.404  4502  4502 I Zygote  : Process 30029 exited due to signal 11 (Segmentation fault)
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=278 Removed SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@afc3667@0#0 (74)
09-22 06:49:43.404  4614  4916 I SurfaceFlinger: id=279 Removed Background for -SurfaceView - org.mozilla.fenix/org.mozilla.fenix.HomeActivity@afc3667@0#0 (74)
09-22 06:49:43.404  4614  4628 I SurfaceFlinger: id=282 Removed GeckoSurface#0 (74)
09-22 06:49:43.405  4936  8010 I ActivityManager: Process org.mozilla.fenix (pid 30029) has died: fg  TOP (1426,3440)
09-22 06:49:43.406  4936  7247 I WindowManager: WIN DEATH: Window{98c94e1 u0 org.mozilla.fenix/org.mozilla.fenix.HomeActivity}
Flags: needinfo?(gmierz2) → needinfo?(cpeterson)

(In reply to Greg Mierzwinski [:sparky] from comment #55)

Thanks for checking! There's a couple things left to do here:
(1) Get rid of the red herring failure (the signature of this bug). :afinder is currently working on this between tasks.
(2) As you said, find an android engineer to look into the segmentation fault failure.

The failure is the same as from comment #40 (see below), and it's a segmentation fault. It's odd that we aren't getting any crash reports though, I'm not sure if this is expected or not. We have code for this here: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/android.py#210-232

This bug is in the Android Foundation team's current sprint (109), but we don't have an engineer assigned yet.

Sounds like we need to determine:

  1. What's causing the segfault?
  2. Why aren't we saving minidumps?

Do you know why the Intermittent Failures Robot changed this bug's status (in comment 52) from [stockwell needswork:owner] to [stockwell unknown]? What does "unknown" mean?

How urgent is this bug fix? Orange Factor says there are 17 failures last week. I don't know know if that is a lot or a few.

Flags: needinfo?(cpeterson) → needinfo?(gmierz2)

(In reply to Chris Peterson [:cpeterson] from comment #56)

Do you know why the Intermittent Failures Robot changed this bug's status (in comment 52) from [stockwell needswork:owner] to [stockwell unknown]? What does "unknown" mean?

The sheriffs may have more context, but from what I've seen, the intermittent bot sets it to unknown after the failure rate has dropped below a threshold on a bug that was flagged by a sheriff.

How urgent is this bug fix? Orange Factor says there are 17 failures last week. I don't know know if that is a lot or a few.

It's not too urgent at the moment, and it's also unclear if this failure exists outside of CI environments (Peter Hedenskog from Wikimedia has seen it too). We use a 15-failure threshold in our triage process, but this bug is a combination of multiple issues since the signature is a red herring (and we're working on resolving this). At the moment, the segfault failures are being ignored in our tests but from the last test we did the failure rate was still quite high (comment #51).

Flags: needinfo?(gmierz2)

This patch fixes an issue where WRITE_EXTERNAL_STORAGE is attempting to be set on Android 11+ devices, but it no longer exists on these devices and it sometimes throws an error/warning that becomes a red-herring.

Assignee: nobody → gmierz2
Keywords: leave-open
Pushed by gmierz2@outlook.com: https://hg.mozilla.org/integration/autoland/rev/983d6ea4e4e1 Don't request WRITE_EXTERNAL_STORAGE permission on android 11. r=gbrown

Clearing NI; can't see any actionable feedback I can add here.

Flags: needinfo?(jonalmeida942)

Resolving as fixed by comment 62

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
See Also: → 1813821
Duplicate of this bug: 1813821
See Also: → 1933552
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: