Closed Bug 1270191 Opened 4 years ago Closed 4 years ago

Ensure Java telemetry uploader is not run in automation

Categories

(Firefox for Android :: General, defect)

All
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 49
Tracking Status
firefox49 --- fixed

People

(Reporter: mcomella, Assigned: mcomella)

References

(Blocks 3 open bugs)

Details

Attachments

(4 files)

We don't want it affecting our data and wasting resources.
Geoff, we only enable upload for MOZILLA_OFFICIAL builds. The comment here [1] indicates anything built on CI is MOZILLA_OFFICIAL so we're likely trying to upload from automation. Do you happen to know and if so, do you agree?

[1]: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/AppConstants.java.in#289
Flags: needinfo?(gbrown)
I agree that MOZILLA_OFFICIAL builds run in automated tests in CI. It makes sense to me that non-MOZILLA_OFFICIAL (local developer builds) would have telemetry disabled, but there needs to be an additional barrier to disable telemetry in automated tests in CI. We would typically do that -- distinguish between running in test automation and running in the real-world -- with a Gecko pref set by the testing framework.

I don't recall specifics for telemetry, but I found this, which looks possibly related/useful:

  http://hg.mozilla.org/mozilla-central/annotate/29662e28a9c9/testing/profiles/prefs_general.js#l231

Also, there is the long-stuck bug 1004682, for disabling background data on test emulators.
Flags: needinfo?(gbrown)
Margaret pointed me at bug 1247324 for precedent – the same problem applied to Switchboard.
Code will build on bug 1261210.
Depends on: 1261210
I'll push to try (to make sure this actually works!) once it reopens.
re push in comment 9:

05-20 00:55:20.643  2764  2764 D GeckoExperiments: Switchboard disabled by environment variable: MOZ_DISABLE_SWITCHBOARD
05-20 00:55:20.643  2764  2764 D GeckoTelemetryUploadSer: Telemetry disabled by environment variable: MOZ_DISABLE_TELEMETRY

Looks like my patches worked but all the robocop tests failed! Looking into it, it looks like the tests are passing in Java, but then there is a native crash as the test is shutting down – let's try again.
I made a try push to robocop-1 for the first commit, then the first two, and there were no failures.
Comment on attachment 8754580 [details]
MozReview Request: Bug 1270191 - Add IntentUtils.getEnvVarMap. r=grisha

https://reviewboard.mozilla.org/r/54054/#review51034

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:29
(Diff revision 1)
> +    /**
> +     * Returns a list of environment variables and their values. These are parsed from an Intent extra
> +     * with the key -> value format:
> +     *   env# -> ENV_VAR=VALUE
> +     *
> +     * # in env# is expected to be increasing from 0.

because i'm reading a comment and i'm used to seeing "#" used as a comment start token, I completely filtered it out while reading this line.

so in my head it kept reading as "in env# is expected...", which was pretty confusing :-)

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:39
(Diff revision 1)
> +        final HashMap<String, String> out = new HashMap<>();
> +        final Pattern envVarPattern = Pattern.compile(ENV_VAR_REGEX);
> +        Matcher matcher = null;
> +
> +        String envValue = intent.getStringExtra("env0");
> +        for (int i = 1; envValue != null; i++) {

I think a while loop is a better fit for what you're doing here

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:50
(Diff revision 1)
> +                matcher.reset(envValue);
> +            }
> +
> +            if (matcher.matches()) {
> +                final String envVarName = matcher.group(1);
> +                final String envVarValue = matcher.group(2);

would this work for blank env values?
e.g. EVIL_VAL=

::: mobile/android/base/java/org/mozilla/gecko/util/IntentUtils.java:51
(Diff revision 1)
> +            }
> +
> +            if (matcher.matches()) {
> +                final String envVarName = matcher.group(1);
> +                final String envVarValue = matcher.group(2);
> +                out.put(envVarName, envVarValue);

that's kind of neat. out.put(key,val) reads as output: key,val

::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/util/TestIntentUtils.java:37
(Diff revision 1)
> +        tempMap.put("STRING", "TEXT");
> +        tempMap.put("L_WHITESPACE", " LEFT");
> +        tempMap.put("R_WHITESPACE", "RIGHT ");
> +        tempMap.put("ALL_WHITESPACE", " ALL ");
> +        tempMap.put("WHITESPACE_IN_VALUE", "IN THE MIDDLE");
> +        tempMap.put("WHITESPACE IN KEY", "IS_PROBABLY_NOT_VALID_ANYWAY");

I'd add in something like ("BLANK_VAL", ""). Can value be null?
Attachment #8754580 - Flags: review?(gkruglov) → review+
Comment on attachment 8754581 [details]
MozReview Request: Bug 1270191 - Move Experiments to env var solution. r=grisha

https://reviewboard.mozilla.org/r/54056/#review51036

::: mobile/android/base/java/org/mozilla/gecko/BrowserApp.java:793
(Diff revision 1)
> +     * until Gecko is loaded, and we need to know this before then.
> +     *
> +     * This method should be called early since other initialization
> +     * may depend on its results.
> +     */
> +    private void maybeConfigureForTests(final Intent intent) {

Perhaps a better name for this method might be something like `configureForTestsBasedOnEnvironment`. Otherwise it feels pretty non-determenistic :)

::: mobile/android/base/java/org/mozilla/gecko/util/Experiments.java:66
(Diff revision 1)
>      // Show name of organization (EV cert) instead of full URL in URL bar (Bug 1249594).
>      public static final String URLBAR_SHOW_EV_CERT_OWNER = "urlbar-show-ev-cert-owner";
>  
>      private static volatile Boolean disabled = null;
>  
> -    /**
> +    public static void setDisabledFromEnvVar(@NonNull final Map<String, String> envVarMap) {

Perhaps add a method comment explicitely stating that this may only be called once?

::: mobile/android/base/java/org/mozilla/gecko/util/Experiments.java:70
(Diff revision 1)
> -     */
> -    public static boolean isDisabled(SafeIntent intent) {
>          if (disabled != null) {
> -            return disabled;
> +            throw new IllegalStateException("Disabled state already set");
>          }
> -
> +        disabled = envVarMap.containsKey(ENVVAR_DISABLED);

This relies on a convention that presence of anything for the key indicates "true". So for example if I set env value as:
MOZ_DISABLE_SWITCHBOARD=0

this will still evaluate to true.

I feel that that's very brittle. At the very least, we should outline this behaviour wherever we set these values (if we don't do this already).
Attachment #8754581 - Flags: review?(gkruglov) → review+
Comment on attachment 8754582 [details]
MozReview Request: Bug 1270191 - Disable telemetry upload during tests. r=grisha

https://reviewboard.mozilla.org/r/54058/#review51040

::: build/mobile/remoteautomation.py:95
(Diff revision 1)
>          # network connections to the Switchboard server.
>          env.setdefault('MOZ_DISABLE_SWITCHBOARD', '1')
>  
> +        # Disable Java telemetry by default to
> +        # prevent network connections during testing.
> +        env.setdefault('MOZ_DISABLE_TELEMETRY', '1')

If I wanted to, say, enable telemetry, my first attempt would be something like:

`env.setdefault('MOZ_DISABLE_TELEMTRY', '0')`

which wouldn't work... I'm not sure if trying to evaluate truthiness of values is any better, but this code certainly implies that 1=true, which is misleading.

I'd add in a comment somewhere here describing how we treat these env variables.

::: mobile/android/base/java/org/mozilla/gecko/telemetry/TelemetryUploadService.java:62
(Diff revision 1)
> +     * TelemetryUploadService, so we don't have to synchronize/volatile.
> +     */
> +    private static boolean isDisabledByLaunchingIntent = false;
> +
> +    public static void setDisabledFromEnvVar(final HashMap<String, String> envVarMap) {
> +        isDisabledByLaunchingIntent = envVarMap.containsKey(ENV_VAR_NAME);

Are you expecting that this might be called multiple times?

In the `Experiments.java` you're throwing if `disabled` is already set. Since we seem to be relying a lot of sequencing of calls, perhaps this should follow a similar pattern to avoid introducing subtle bugs in the future?
Attachment #8754582 - Flags: review?(gkruglov) → review+
Generally this looks fine, but I would address the few issues outlined above before landing. There's some wacky stuff here that I feel might be brittle in the long-term.
Snorp, I commented out the line in build/mobile/remoteautomation that adds my new environment variable in automation [1] and the tests are no longer failing. I see Gecko reads env vars at various points [2] – do you know why adding this env var might be causing a crash? With my variable, I get the following env var set:

23:34:51     INFO -  05-19 23:32:35.540 D/GeckoLoader( 1231): Gecko environment env0: MOZ_CRASHREPORTER=1
23:34:51     INFO -  05-19 23:32:35.540 D/GeckoLoader( 1231): env1: R_LOG_VERBOSE=1
23:34:51     INFO -  05-19 23:32:35.549 D/GeckoLoader( 1231): env2: MOZ_DISABLE_SWITCHBOARD=1
23:34:51     INFO -  05-19 23:32:35.569 D/GeckoLoader( 1231): env3: MOZ_DISABLE_NONLOCAL_CONNECTIONS=1
23:34:51     INFO -  05-19 23:32:35.569 D/GeckoLoader( 1231): env4: MOZ_LOG_FILE=/storage/sdcard/tests/mozlog/moz-src/org/mozilla/gecko/tests/testANRReporter.java.log
23:34:51     INFO -  05-19 23:32:35.569 D/GeckoLoader( 1231): env5: R_LOG_DESTINATION=stderr
23:34:51     INFO -  05-19 23:32:35.569 D/GeckoLoader( 1231): env6: MOZ_CRASHREPORTER_NO_REPORT=1
23:34:51     INFO -  05-19 23:32:35.579 D/GeckoLoader( 1231): env7: MOZ_DISABLE_TELEMETRY=1
23:34:51     INFO -  05-19 23:32:35.579 D/GeckoLoader( 1231): env8: R_LOG_LEVEL=6
23:34:51     INFO -  05-19 23:32:35.610 D/GeckoLoader( 1231): env9: null

And things look similar in the commented out patch:

05-20 14:51:48.328   848   871 D GeckoLoader: Gecko environment env0: MOZ_CRASHREPORTER=1
05-20 14:51:48.328   848   871 D GeckoLoader: env1: R_LOG_VERBOSE=1
05-20 14:51:48.348   848   871 D GeckoLoader: env2: MOZ_DISABLE_SWITCHBOARD=1
05-20 14:51:48.348   848   871 D GeckoLoader: env3: MOZ_DISABLE_NONLOCAL_CONNECTIONS=1
05-20 14:51:48.358   848   871 D GeckoLoader: env4: MOZ_LOG_FILE=/storage/sdcard/tests/mozlog/moz-src/org/mozilla/gecko/tests/testGeckoProfile.java.log
05-20 14:51:48.378   848   871 D GeckoLoader: env5: R_LOG_DESTINATION=stderr
05-20 14:51:48.378   848   871 D GeckoLoader: env6: MOZ_CRASHREPORTER_NO_REPORT=1
05-20 14:51:48.378   848   871 D GeckoLoader: env7: R_LOG_LEVEL=6
05-20 14:51:48.388   848   871 D GeckoLoader: env8: null

Maybe we're rolling over to env10 and it doesn't like that?

[1]: https://hg.mozilla.org/try/rev/c19163b302721693a55caba477b1d0193e4adf09
Flags: needinfo?(snorp)
Did you see the TelemetryEnvironment error in some of the logcats?

http://archive.mozilla.org/pub/mobile/try-builds/michael.l.comella@gmail.com-14f366a0501835541d3792cd2a228c69bc7bd9f4/try-android-api-15/try_ubuntu64_vm_armv7_mobile_test-robocop-4-bm123-tests1-linux64-build872.txt.gz

07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): [JavaScript Error: "1463753947546	Toolkit.Telemetry	ERROR	TelemetryEnvironment::EnvironmentCache - error while initializing: Error: Unable to fetch oldest profile entry: Error: OS.File has been shut down. Rejecting post to DirectoryIterator_prototype_next (resource://gre/modules/ProfileAge.jsm:175:13) JS Stack trace: onFailure@ProfileAge.jsm:175:13 < Handler.prototype.process@Promise-backend.js:940:21 < this.PromiseWalker.walkerLoop@Promise-backend.js:816:7 < this.PromiseWalker.scheduleWalkerLoop/<@Promise-backend.js:750:11 < Spinner.prototype.observe@AsyncShutdown.jsm:551:9" {file: "resource://gre/modules/Log.jsm" line: 753}]
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): App_append@resource://gre/modules/Log.jsm:753:9
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): Logger.prototype.log@resource://gre/modules/Log.jsm:389:7
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): LoggerRepository.prototype.getLoggerWithMessagePrefix/proxy.log@resource://gre/modules/Log.jsm:504:44
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): Logger.prototype.error@resource://gre/modules/Log.jsm:397:5
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): EnvironmentCache/this._initTask<@resource://gre/modules/TelemetryEnvironment.jsm:778:9
07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): Spinner.prototype.observe@resourc
(In reply to Geoff Brown [:gbrown] (pto May 28-June 13) from comment #20)
> Did you see the TelemetryEnvironment error in some of the logcats?
> 
> 07:19:55     INFO -  05-20 07:19:07.644 E/GeckoConsole( 2202): [JavaScript
> Error: "1463753947546	Toolkit.Telemetry	ERROR
> TelemetryEnvironment::EnvironmentCache - error while initializing: Error:
> Unable to fetch oldest profile entry: Error: OS.File has been shut down.

Thanks for taking a look!

The Java telemetry uploader doesnt't actually interact with the JS side (TelemetryEnvironment in this case) so I suspect this error is unrelated. I'd expect that nothing I did would cause OS.File to shut down.

Prior to the error, it appears BaseTest requests a quit:

07:13:04     INFO -  05-20 07:11:23.763 I/BaseTest( 1875): Requesting force quit.
07:13:04     INFO -  05-20 07:11:23.913 D/GeckoSuggestedSites( 1875): Number of suggested sites: 8
07:13:04     INFO -  05-20 07:11:29.374 W/GeckoLinker( 1875): /data/app/org.mozilla.fennec-1.apk!/assets/armeabi-v7a/libsoftokn3.so: unhandled flags #8 not handled
...
07:13:04     INFO -  05-20 07:11:36.934 D/GeckoNetworkManager( 1875): Incoming event enableNotifications for state OnNoListeners -> OnWithListeners
07:13:04     INFO -  05-20 07:11:36.944 D/GeckoNetworkManager( 1875): New network state: UP, CELLULAR, CELL_3G
07:13:04     INFO -  05-20 07:11:36.974 D/GeckoNetworkManager( 1875): Incoming event receivedUpdate for state OnWithListeners -> OnWithListeners
07:13:04     INFO -  05-20 07:11:36.995 D/GeckoNetworkManager( 1875): New network state: UP, CELLULAR, CELL_3G
07:13:04     INFO -  05-20 07:11:40.244 W/ResourceType( 1875): Too many attribute references, stopped at: 0x01010099
07:13:04     INFO -  05-20 07:11:41.594 I/dalvikvm( 1875): Could not find method android.media.AudioManager.dispatchMediaKeyEvent, referenced from method org.mozilla.gecko.GeckoInputConnection.performDefaultKeyAction
...
07:13:04     INFO -  05-20 07:11:57.334 I/GeckoPushService( 1875): Handling event: PushServiceAndroidGCM:Configure
07:13:04     INFO -  05-20 07:11:57.334 I/GeckoPushManager( 1875): Updating configuration.
...
07:13:04     INFO -  05-20 07:12:11.633 E/GeckoConsole( 1875): [JavaScript Error: "IndexedDB UnknownErr: IDBDatabase.cpp:603"]
07:13:04     INFO -  05-20 07:12:11.733 I/Gecko   ( 1875): console.error: PushService:
07:13:04     INFO -  05-20 07:12:11.823 E/GeckoConsole( 1875): [JavaScript Error: "stateChangeProcessEnqueue: Error transitioning state UnknownError"]

I wonder if something is going wrong in the shutdown process:
* There were some regressions with the rewritten GeckoNetworkManager (bug 1271318) – perhaps that's related
* Push sounds like it's doing stuff here (which surprises me) – related?
In the previous 3 try builds, I included various forms of env vars to the automation file and they all worked correctly. The crashes we are experiencing might be a side-effect of the earlier changes in my commit that we only see when disabling uploads in try builds.
re comment 26: with the unread build flags & the telemetry uploader disabled in the same way we would have had we read the flags, there is crashing... strange. The next try build (comment 27) only disables the uploader and does not touch the environment variables to make sure it's just disabling the uploader that's causing issues.
I had suspicions that it might be because we don't init client ID so I returned after that (comment 29) but it still crashes. Maybe I can return *during* the UploadService next?
comment 31: we don't crash if we return directly from UploadService.
My latest try push (comment 33) did nothing with env vars and only commented out the UploadTelemetryCallback and there was still a crash.

Thoughts:
* I may have changed something in the Telemetry init process that breaks without the java uploader running (hence the TelemetryEnvironment failures)
* I was pushing from a bad revision of fx-team

Let's do that again (on the latest fx-team).
And it passed! Let's try the full patch series again.
...and it passes. *sigh*

Time to check the review comments and push it. x_x
Flags: needinfo?(snorp)
https://reviewboard.mozilla.org/r/54054/#review51034

> would this work for blank env values?
> e.g. EVIL_VAL=

Added your suggested test case and this does work correctly – the empty string is returned for the second matcher group.
https://reviewboard.mozilla.org/r/54058/#review51040

> If I wanted to, say, enable telemetry, my first attempt would be something like:
> 
> `env.setdefault('MOZ_DISABLE_TELEMTRY', '0')`
> 
> which wouldn't work... I'm not sure if trying to evaluate truthiness of values is any better, but this code certainly implies that 1=true, which is misleading.
> 
> I'd add in a comment somewhere here describing how we treat these env variables.

fwiw, our build system uses env vars by checking for their presence (which requires you to pass some value besides the empty string; 1 is common). This may also be how many systems use env vars in general.

That being said, most other code doesn't do this so it could be clearer – I'll add a comment as you suggest.
The previous code checked:

if (env.startsWith("MOZ_DISABLE_SWITCHBOARD=")) {
    if (!env.endsWith("=")) {

So it would not pass with the empty String but my previous revision permitted
the empty string.

Practically speaking, I don't think it matters because this is only used in
remoteautomation.py where the value is 1, but better safe than sorry.

Review commit: https://reviewboard.mozilla.org/r/56762/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56762/
Attachment #8758499 - Flags: review?(liuche)
Comment on attachment 8754580 [details]
MozReview Request: Bug 1270191 - Add IntentUtils.getEnvVarMap. r=grisha

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54054/diff/1-2/
Comment on attachment 8754581 [details]
MozReview Request: Bug 1270191 - Move Experiments to env var solution. r=grisha

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54056/diff/1-2/
Comment on attachment 8754582 [details]
MozReview Request: Bug 1270191 - Disable telemetry upload during tests. r=grisha

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54058/diff/1-2/
Comment on attachment 8758499 [details]
MozReview Request: Bug 1270191 - Correct my previous patches to more faithfully reproduce the refactored code. r=liuche

https://reviewboard.mozilla.org/r/56762/#review53428
Attachment #8758499 - Flags: review?(liuche) → review+
Depends on: 1277214
Duplicate of this bug: 1261494
Duplicate of this bug: 1268998
You need to log in before you can comment on or make changes to this bug.