Closed Bug 1277214 Opened 3 years ago Closed 3 years ago

Autophone - Mochitest Exceptions on Android 4.x

Categories

(Firefox for Android :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 49
Tracking Status
firefox49 --- fixed
fennec 49+ ---

People

(Reporter: bc, Assigned: mcomella)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(6 files)

Beginning with 

https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=4b66897490ab2ab196c44ad9485dd529fe96441c&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3

-> Bug 1270191 - Ensure Java telemetry uploader is not run in automation

Autophone began failing the Mochitest Unit tests on Android 4.2 and Android 4.4 with an exception. There does not appear to be a problem with Android 6.0.1. We don't run these tests on Android 5.x so I can't make a statement about it.

It isn't clear to me what the failure is though it does appear that Fennec fails to start during the Mochitest tests though Fennec appears to work fine on the S1S2, the other reftests and the robocop autophone tests.

At the same time, the debug Rwv test on Android 4.2 and debug Rov test on Android 4.4 began failing due to unexpected assertions though we've seen these assertions elsewhere before today.

ASSERTION: Existing entry in StartupCache.: 'false', file /builds/slave/fx-team-and-api-15-d-000000000/build/src/startupcache/StartupCache.cpp, line 358

ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file /builds/slave/fx-team-and-api-15-d-000000000/build/src/obj-firefox/dist/include/nsCOMPtr.h, line 415

Is there something about bug 1270191 that is api level dependent?
(In reply to Bob Clary [:bc:] from comment #0)
> Is there something about bug 1270191 that is api level dependent?

Nothing that I can tell.

It's worth noting there were some crashes in the shutdown of robocop tests when I was initially trying to land bug 1270191, but rebasing onto the latest fx-team seemed to have fixed that. Perhaps it's related? I don't know why the java code might be causing crashes though.

I did modify build/mobile/remoteautomation.py to add an env var so perhaps that is causing issues? Either by 1) providing too many extras (e.g. too large an Intent) to the application, 2) by being improperly formatted, or 3) by something assuming the env var are hard-coded in some way that adding an env var breaks.
Assignee: nobody → michael.l.comella
There appears to be nothing particularly obvious in the logs (though the fact that it seems to fail on the same test each time could be telling).

Bob, is it possible to run these tests locally? It'd be very helpful in trying to reproduce these failures.

Note to self: it seems I can run these tests on try: https://wiki.mozilla.org/Auto-tools/Projects/Autophone/Autophone_for_developers#Submitting_try_builds_to_Autophone
Flags: needinfo?(bob)
I did move our env var handling code off of SafeIntent (which catches RuntimeException & OOM exceptions) to Intent – perhaps it's related? It looks like the existing code is still using SafeIntent (e.g. [1]).

I should look into the history here of why we used it.

[1]: https://dxr.mozilla.org/mozilla-central/rev/4d63dde701b47b8661ab7990f197b6b60e543839/mobile/android/base/java/org/mozilla/gecko/GeckoApp.java#1070
(In reply to Michael Comella (:mcomella) from comment #3)
> I should look into the history here of why we used it.

bug 1090385: it looks like we just wrapped all external intent handling, nothing specific to our env* code. In theory, another application can pass us an extra with an env* key, so perhaps we should use SafeIntent in all cases. Here's a try push to see if it helps the mochitest failures:
Looking at the logs for the Rwv failure:

05-31 18:30:50.816 D/GeckoLoader( 5375): Gecko environment env0: MOZ_CRASHREPORTER=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env1: XPCOM_DEBUG_BREAK=stack
05-31 18:30:50.816 D/GeckoLoader( 5375): env2: R_LOG_VERBOSE=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env3: MOZ_DISABLE_SWITCHBOARD=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env4: MOZ_DISABLE_NONLOCAL_CONNECTIONS=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env5: R_LOG_DESTINATION=stderr
05-31 18:30:50.816 D/GeckoLoader( 5375): env6: MOZ_CRASHREPORTER_NO_REPORT=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env7: MOZ_DISABLE_TELEMETRY=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env8: NO_EM_RESTART=1
05-31 18:30:50.816 D/GeckoLoader( 5375): env9: MOZ_PROCESS_LOG=/tmp/tmp_LTTsZpidlog
05-31 18:30:50.816 D/GeckoLoader( 5375): env10: R_LOG_LEVEL=6
05-31 18:30:50.816 D/GeckoLoader( 5375): env11: null

Maybe we don't correctly process env with two numerical digits. My added env var made the last relevant env var be 10, rather than 9.
You can run Autophone from the tree via the command

mach autophone

It will download and configure autophone for you providing prompt along the way. gbrown added this in bug 1147918
Flags: needinfo?(bob)
SafeIntentUtils breaks the env var algorithm:

        while (envValue != null) {
...
            envValue = intent.getStringExtra("env" + i);

SafeIntentUtils returns null when there is an error – a runtime exception or OOM – so we'd early return and miss some of our env vars. We should instead be checking containsKey or similar.

Summary of (likely) tangential things I think we should do, probably in separate bugs:
 * Handle all env vars with SafeIntents (because we don't know where these intents might be coming from and we'd rather not crash)
 * Fix the env* access algorithm

And directions to investigate:
 * Using SafeIntent for extracting env values
 * env* with two digits

Regarding the API split:
 * The code that Intent.getStringExtra uses under the hood, Bundle, appears to have changed between 4.4 [1] & the latest version [2]
 * It's possible that the devices in use for 6.0 are more powerful and are not dealing with the same OOM constraints that could be occurring on 4.4 (which we weren't previously crashing on because we were not using SafeIntentUtils).

[1]: note that the getString method is in BaseBundle now: http://androidxref.com/6.0.1_r10/xref/frameworks/base/core/java/android/os/Bundle.java
[2]: there is no BaseBundle: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/os/Bundle.java
(In reply to Michael Comella (:mcomella) from comment #8)
>  * env* with two digits

My try push (comment 9) that re-used MOZ_DISABLE_SWITCHBOARD rather than introducing a new env var went to orange permafail for Mdb, rather than purple permafail so maybe we're making progress. Note that 6.0 is also failing. The error:

3 INFO TEST-UNEXPECTED-ERROR | dom/browser-element/mochitest/test_browserElement_NoAttr.html | Assertion count 24 is greater than expected range 0-0 assertions.

fwiw, 4.4 also has log spam for:

06-01 11:49:45.585 E/UsbDebuggingManager(  789): Communication error:
06-01 11:49:45.585 E/UsbDebuggingManager(  789): java.io.IOException: Connection refused
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at android.net.LocalSocketImpl.connectLocal(Native Method)
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at android.net.LocalSocketImpl.connect(LocalSocketImpl.java:287)
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at android.net.LocalSocket.connect(LocalSocket.java:130)
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at com.android.server.usb.UsbDebuggingManager.listenToSocket(UsbDebuggingManager.java:75)
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at com.android.server.usb.UsbDebuggingManager.run(UsbDebuggingManager.java:111)
06-01 11:49:45.585 E/UsbDebuggingManager(  789): 	at java.lang.Thread.run(Thread.java:841)

But maybe that's unrelated.
It looks like liuche's push prior to mine [1] also has the permafail (re comment 10). Sounds like I should land this and call it a day. I filed bug 1277390 to fix this properly.

[1]: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=a05fae3adc541c24e86692bd87d69c9acaf6aa6c&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3
(In reply to Michael Comella (:mcomella) from comment #8)
> Summary of (likely) tangential things I think we should do, probably in
> separate bugs:
>  * Handle all env vars with SafeIntents (because we don't know where these
> intents might be coming from and we'd rather not crash)

I'll do that in this bug.

>  * Fix the env* access algorithm

I'll fix that for IntentUtils in this bug, but we should fix this for GeckoApp/GeckoLoader in bug 1277117.
Review commit: https://reviewboard.mozilla.org/r/57070/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/57070/
Attachment #8758964 - Flags: review?(gkruglov)
Attachment #8758965 - Flags: review?(gkruglov)
Attachment #8758966 - Flags: review?(gkruglov)
Attachment #8758967 - Flags: review?(gkruglov)
Attachment #8758968 - Flags: review?(gkruglov)
Attachment #8758969 - Flags: review?(gkruglov)
I feel this better follows the util class pattern: IntentUtils acts on Intents
as StringUtils would act on Strings.

Review commit: https://reviewboard.mozilla.org/r/57074/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/57074/
In the previous implementation, we'd stop reading when the value would return
null, however, this breaks with SafeIntents where null is returned if a value
throws a runtime exception - i.e. we might stop reading at env2 if it throws
even though env3+ exist.

Review commit: https://reviewboard.mozilla.org/r/57078/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/57078/
We hit an issue where adding a new env var, MOZ_DISABLE_TELEMETRY, added env10
and caused crashes. I suspect the issue is that there are is now a double-digit
number of env vars (bug 1277390). Here, we do the quick fix by removing
MOZ_DISABLE_TELEMETRY & repurposing MOZ_DISABLE_SWITCHBOARD to be generic.

While we're at it, we simplify the code by making the setDisabled methods a
strict getter without checking for how many times they're called.

Review commit: https://reviewboard.mozilla.org/r/57080/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/57080/
Comment on attachment 8758964 [details]
MozReview Request: Bug 1277214 - Add javadoc to explain SafeIntent. r=grisha

https://reviewboard.mozilla.org/r/57070/#review53778
Attachment #8758964 - Flags: review?(gkruglov) → review+
Comment on attachment 8758965 [details]
MozReview Request: Bug 1277214 - Move SafeIntent to its own class. r=grisha

https://reviewboard.mozilla.org/r/57072/#review53780

::: mobile/android/base/java/org/mozilla/gecko/util/SafeIntent.java:6
(Diff revision 1)
> +/*
> + * This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, you can obtain one at http://mozilla.org/MPL/2.0/.
> + */
> +

only skimmed this, assuming functionality didn't change
Attachment #8758965 - Flags: review?(gkruglov) → review+
Comment on attachment 8758966 [details]
MozReview Request: Bug 1277214 - Move remaining SafeIntentUtils functions to IntentUtils. r=grisha

https://reviewboard.mozilla.org/r/57074/#review53786

Can't comment on deleted file, but: also remove mention of 'mozglue/SafeIntentUtils.java' from moz.build? https://dxr.mozilla.org/mozilla-central/source/mobile/android/base/moz.build#88

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:69
(Diff revision 1)
> +
> +    public static String getStringExtraSafe(final Intent intent, final String name) {
> +        return new SafeIntent(intent).getStringExtra(name);
> +    }
> +
> +    public static boolean getBooleanExtraSafe(Intent intent, String name, boolean defaultValue) {

nit: any reason these aren't final?
Attachment #8758966 - Flags: review?(gkruglov)
Comment on attachment 8758966 [details]
MozReview Request: Bug 1277214 - Move remaining SafeIntentUtils functions to IntentUtils. r=grisha

https://reviewboard.mozilla.org/r/57074/#review53788
Attachment #8758966 - Flags: review+
Comment on attachment 8758968 [details]
MozReview Request: Bug 1277214 - Stop reading env vars when the key no longer exists. r=grisha

https://reviewboard.mozilla.org/r/57078/#review53802

Looks reasonable.

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:47
(Diff revision 1)
> -                final String envVarValue = matcher.group(2);
> -                out.put(envVarName, envVarValue);
> -            }
> -            envValue = safeIntent.getStringExtra("env" + i);
>              i += 1;
> +            if (!unsafeIntent.hasExtra(envKey)) {

I guess env* values are always sequential? e.g. we won't have [env1, env3].
Attachment #8758968 - Flags: review?(gkruglov) → review+
Comment on attachment 8758967 [details]
MozReview Request: Bug 1277214 - Use SafeIntent in getEnvVarMap. r=grisha

https://reviewboard.mozilla.org/r/57076/#review53804
Attachment #8758967 - Flags: review?(gkruglov) → review+
Comment on attachment 8758969 [details]
MozReview Request: Bug 1277214 - Move MOZ_DISABLE_* to single MOZ_IN_AUTOMATION env var. r=grisha

https://reviewboard.mozilla.org/r/57080/#review53806

::: mobile/android/base/java/org/mozilla/gecko/telemetry/TelemetryUploadService.java:49
(Diff revision 1)
>   */
>  public class TelemetryUploadService extends IntentService {
>      private static final String LOGTAG = StringUtils.safeSubstring("Gecko" + TelemetryUploadService.class.getSimpleName(), 0, 23);
>      private static final String WORKER_THREAD_NAME = LOGTAG + "Worker";
>  
>      private static final String ENV_VAR_NAME = "MOZ_DISABLE_TELEMETRY";

Probably don't need this anymore?
Attachment #8758969 - Flags: review?(gkruglov) → review+
tracking-fennec: ? → 49+
https://reviewboard.mozilla.org/r/57074/#review53786

> but: also remove mention of 'mozglue/SafeIntentUtils.java' from moz.build?

Done. That being said, there were build dependency issues with `mach build` where `util/` cannot be accessed from `mozglue/` (seems strange to me – I'd think util/ has no deps on anything else and can be accessed from anywhere – filed bug 1277631). My work-around was to just move SafeIntent to mozglue/ but it's not ideal.
https://reviewboard.mozilla.org/r/57078/#review53802

> I guess env* values are always sequential? e.g. we won't have [env1, env3].

I believe they are intended to be – I documented it in the `getEnvVarMap` comment.
You need to log in before you can comment on or make changes to this bug.