Closed Bug 888052 Opened 7 years ago Closed 4 years ago

De-prioritize cellular networks when communicating with server

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gps, Unassigned)

References

Details

Attachments

(1 file, 6 obsolete files)

FHR currently is agnostic about the current network connection channel with regard to server communication. And, it can arguably afford to do so on desktop Firefox. But on mobile devices, it's a whole other ballgame.

Cellular data is slow and costs users money or goes towards a quota. Firefox Health Report should ideally not communicate with the server over a cellular or similar slow/metered network if it has any choice in the matter.

nalexander: I imagine we'll emulate the logic used by the Android implementation. Can you enlighten us as to what you feel a good implementation looks like?
Flags: needinfo?(nalexander)
Our default strategy is to add a "only do X on wifi" toggle. We also respect the Android background data setting, which includes quota mechanics in some versions, iirc. 

Other schemes run the risk of getting stuck in bad states (e.g. when hotel wireless sucks and I leave my phone on mobile data only for ten straight days, or Sprint users with unlimited data).
(In reply to Richard Newman [:rnewman] from comment #1)
> Our default strategy is to add a "only do X on wifi" toggle. We also respect
> the Android background data setting, which includes quota mechanics in some
> versions, iirc. 

Yes, we punt to the Android platform: you can see the code at http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/background/BackgroundService.java#44

At the end of the day, we call http://developer.android.com/reference/android/net/NetworkInfo.html#isAvailable%28%29 which respects the data roaming platform level setting.
Flags: needinfo?(nalexander)
So the question is really "how likely is it that system data controls will avoid user surprise and disappointment?". On Android, many apps add extra controls (as I alluded to). That's probably not suitable for fxos, but I don't know what its background data settings are like.
(In reply to Richard Newman [:rnewman] from comment #3)
> So the question is really "how likely is it that system data controls will
> avoid user surprise and disappointment?". On Android, many apps add extra
> controls (as I alluded to). That's probably not suitable for fxos, but I
> don't know what its background data settings are like.

From-the-hip: not likely.  For example, we have Bug 802749 filed for syncing only when wifi is available. A recent mobile add-on was floated that didn't actually load tab contents until the tab was viewed, and this was positioned as a bandwidth and memory saver.  We could certainly do better, but I think it's a problem bested "solved" by the platform itself.
Assignee: nobody → smirea
Also relevant to reducing bandwidth usage: Bug 885650 (session bloat).
See Also: → 885650
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
After some discussions via the mailing list and IRC with :rnewman, :gps and :bsmedberg it was decided that, at least for b2g, it is a better approach to have FHR submit its data just once a week instead of daily, irrespective of network connection type - which solves a lot of problems.

To facilitate that and future implementations I implemented 2 modes for the DataReportingPolicy: one for b2g and one for everything else (currently) with the ability to add more in the future.
Attachment #787284 - Flags: review?(gps)
Status: NEW → ASSIGNED
Comment on attachment 787284 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 787284 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good and near ready to land!

Would like rnewman's opinion on the backoff intervals, just for a 2nd opinion.

::: services/datareporting/policy.jsm
@@ +224,5 @@
>  /**
>   * Manages scheduling of Firefox Health Report data submission.
>   *
> + * Different schedulings (modes) will be used for different platforms.
> + * Currently supported: standard (desktop, android, etc), b2g

Android doesn't use this code base.

@@ +345,5 @@
>     *
>     * THERE ARE POTENTIAL LEGAL IMPLICATIONS OF CHANGING THIS VALUE. Check with
>     * Privacy and/or Legal before modifying.
>     */
> +  IMPLICIT_ACCEPTANCE_INTERVAL_MSEC: 8 * MILLISECONDS_PER_HOUR,

I thought you killed implicit acceptance!

@@ +418,5 @@
> +      SUBMISSION_FREQUENCY: 7 * MILLISECONDS_PER_DAY,
> +      FAILURE_BACKOFF_INTERVALS: [
> +        1 * MILLISECONDS_PER_HOUR,
> +        4 * MILLISECONDS_PER_HOUR,
> +        10 * MILLISECONDS_PER_HOUR,

I'd add more retries in 6 hour intervals for up to 6 days. This may seem counter productive. Let's think about it.

In the best case, we submit on first try and all is good.

If we don't have connection, we fail and keep failing. At this point, we already haven't reported for 7 days. So, each failed attempt means it is longer and longer before clients have reported in. I think we should make an attempt to resubmit in these scenarios. If we get lucky, hopefully the user will be online at the same time next week.

Since FHR records upload attempts, we can always mine the data from clients and prune or expand this list if it isn't working well.

@@ +432,5 @@
> +  set mode(name) {
> +    if (!(name in this.MODES)) {
> +      throw new Error('Unknown mode ' + name);
> +    }
> +    for (var key in this.MODES[name]) {

s/var/let

(don't use var in modern JS!)

::: services/datareporting/tests/xpcshell/test_policy.js
@@ +72,5 @@
> +    value: {},
> +  });
> +  policy.mode = 'test';
> +  do_check_eq(policyPrefs.get("dataSubmissionPolicyMode"), 'test');
> +  do_check_eq(policy.mode, 'test');

Nit: double quotes for string literals. It's what we do.
Attachment #787284 - Flags: review?(rnewman)
Attachment #787284 - Flags: review?(gps)
Attachment #787284 - Flags: feedback+
Comment on attachment 787284 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 787284 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/datareporting/policy.jsm
@@ +36,5 @@
> +#ifdef MOZ_WIDGET_GONK
> +const DATAREPORTING_POLICY_MODE = 'b2g';
> +#else
> +const DATAREPORTING_POLICY_MODE = 'standard';
> +#endif

Pull this value from prefs rather than a constant, and use the usual application prefs mechanism to do the switch.

@@ +224,5 @@
>  /**
>   * Manages scheduling of Firefox Health Report data submission.
>   *
> + * Different schedulings (modes) will be used for different platforms.
> + * Currently supported: standard (desktop, android, etc), b2g

Also, let's not fall into the trap of equating devices with capabilities at this level. My high-end B2G phone on wifi beats your tethered Atom netbook running desktop Fx, etc.

Let's call these "daily" and "weekly", if that's what they mean, and then different channels can opt in to different default behaviors (and users have a theoretical possibility to switch between them).

@@ +398,5 @@
> +    standard: {
> +      SUBMISSION_FREQUENCY: 1 * MILLISECONDS_PER_DAY,
> +      /**
> +       * Our backoff schedule in case of submission failure.
> +       *

Lift this comment out of the item. Keep the items small and the MODES comment big.

@@ +418,5 @@
> +      SUBMISSION_FREQUENCY: 7 * MILLISECONDS_PER_DAY,
> +      FAILURE_BACKOFF_INTERVALS: [
> +        1 * MILLISECONDS_PER_HOUR,
> +        4 * MILLISECONDS_PER_HOUR,
> +        10 * MILLISECONDS_PER_HOUR,

Long answer, sorry. And forgive me if I go off on a tangent :)

Protocols include variable backoffs for good reasons: avoiding herds is one, but avoiding coinciding with external events (and adapting to external events of various durations) is another.

We definitely want a retry/backoff interval that avoids us failing just as you get to work in the morning, and thereafter always failing in the same spot. So yes, either a much shorter interval than 24h, or a non-factor of 24, or both.

And as to whether to retry for many days: there are two issues we're trying to avoid here.

One is that you go away for the weekend, fail for a day or two, and we don't upload until next week.

The other is that you go to Namibia for three months, and we flailingly retry every six hours until you get home.

I don't think we can encode both sets of behavior sanely in one table; we need adaptive backoff (and acceleration to recover from backoff).

So I'd suggest having three of days of sparse retries for weekly (probably not six days), and filing a follow-up to adapt more accurately: some users might be best served by hopping to the "monthly" or "fortnightly" upload approach (which perhaps waits a long while, then retries frequently for a whole day or two to try to catch the brief period you're online).

And beyond that is network state awareness, which we have on Android: why do all of this try/fail/poll stuff when some of it is concretely known to the OS, and we can register an observer to be told when the user reaches wifi?

@@ +873,5 @@
>      // If the system clock were ever set to a time in the distant future,
>      // it's possible our next schedule date is far out as well. We know
>      // we shouldn't schedule for more than a day out, so we reset the next
>      // scheduled date appropriately. 3 days was chosen arbitrarily.
> +    if (nextSubmissionDate.getTime() >= nowT + 2 * MILLISECONDS_PER_DAY + this.SUBMISSION_FREQUENCY) {

Comment/code mismatch.

@@ +1144,5 @@
>      return true;
>    },
>  
> +  /**
> +   * Moves the submission date forward one instance from now

s/instance/occurrence.

::: services/datareporting/tests/xpcshell/test_policy.js
@@ +162,5 @@
> +    writable: true,
> +    configurable: true,
> +    value: {
> +      foo: 'bar',
> +      bar: 'baz',

Double quotes.

@@ +168,5 @@
> +    }
> +  });
> +
> +  for (let mode of Object.keys(policy.MODES)) {
> +    dump("Checking mode: " + mode + "\n");

Do you need to keep this dump here?
Attachment #787284 - Flags: review?(rnewman) → feedback+
Blocks: 910963
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
Implemented changes.

Regarding the backoff intervals I ended up doing 4 days' worth of checking and then rescheduling for next interval.

Also refactored the backoff test to be flexible and test everything.
Attachment #800515 - Flags: review?(rnewman)
Attachment #800515 - Flags: review?(gps)
Attachment #787284 - Attachment is obsolete: true
Comment on attachment 800515 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 800515 [details] [diff] [review]:
-----------------------------------------------------------------

You should implement a prefs observer for frequency changes.

::: b2g/app/b2g.js
@@ +750,5 @@
>  // (only applies when marionette is disabled)
>  // 0 disables the timer.
>  pref("b2g.adb.timeout-hours", 12);
> +
> +#ifdef MOZ_WIDGET_GONK

Is this necessary in b2g/app?

@@ +751,5 @@
>  // 0 disables the timer.
>  pref("b2g.adb.timeout-hours", 12);
> +
> +#ifdef MOZ_WIDGET_GONK
> +// Set the datareporting interval less frequent on mobile devices.

"to be less frequent".

::: browser/app/profile/firefox.js
@@ +1294,5 @@
>  // Necko IPC security checks only needed for app isolation for cookies/cache/etc:
>  // currently irrelevant for desktop e10s
>  pref("network.disable.ipc.security", true);
> +
> +// Perform frequent data submissions on desktop

Nit: closing period.

::: services/datareporting/policy.jsm
@@ +176,5 @@
>   * Manages scheduling of Firefox Health Report data submission.
>   *
> + * Different schedulings (modes) will be used for different platforms.
> + * Currently supported: daily (desktop, etc), weekly (initially b2g).
> + * Unless otherwise defined in DATAREPORTING_POLICY_MODE, the default mode is 'daily'.

And yet the pref fetch defaults to weekly...

@@ +304,5 @@
> +   * Different policy modes, only one of which is in effect per instance.
> +   *
> +   * This was implemented to mainly make a distinction between desktop and
> +   * mobile platforms in order to implement different data submission rates and
> +   * backoff intervals where a constant internet connectinon is not for granted.

"This is primarily intended to allow mobile and desktop platforms to make different tradeoffs in the face of different network environments."

@@ +307,5 @@
> +   * mobile platforms in order to implement different data submission rates and
> +   * backoff intervals where a constant internet connectinon is not for granted.
> +   *
> +   * Each mode re-defines attributes of DataReportingPolicy which will be overwritten
> +   * when the policy mode changes

Closing period.
Attachment #800515 - Flags: review?(rnewman)
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
Implemented requested changes.
Attachment #801753 - Flags: review?(rnewman)
Attachment #801753 - Flags: review?(gps)
Attachment #800515 - Attachment is obsolete: true
Attachment #800515 - Flags: review?(gps)
Comment on attachment 801753 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 801753 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/datareporting/policy.jsm
@@ +175,5 @@
>   *
> + * Different schedulings (modes) will be used for different platforms.
> + * Currently supported: daily (desktop, etc), weekly (initially b2g).
> + * Unless otherwise defined in DATAREPORTING_POLICY_MODE, the default mode defaults
> + * to "daily" for desktop and "weekly" for android.

* Different schedulings (modes) are used for different platforms.
* Currently supported: "daily" (the default for desktop), "weekly"
* (the default for b2g).
* These modes are selected by specifying the value of the 
* `datareporting.policy.dataSubmissionPolicyMode` pref. If no pref 
* value is set, "weekly" is used.

@@ +242,5 @@
>    this._healthReportPrefs = healthReportPrefs;
>    this._listener = listener;
>    this._timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +  // We need to initialize the mode from the prefs.
> +  this.mode = this.mode;

I would rather this be done by splitting out the mode assignment stuff from the prefs write in the setter.

Right now you do two things that are dodgy:

* You read a value from prefs, and if it's bad you throw.
* You read then set, and the set writes to prefs. That writes your default into user prefs unnecessarily, not to mention the excess write.

Neither of these make sense during construction: you should read from prefs and initialize in-memory state, and if a bad value is read it should be logged and the default used.

The setter can do different things to handle runtime changes.

@@ +365,5 @@
> +    }
> +    for (let key in this.MODES[name]) {
> +      this[key] = this.MODES[name][key];
> +    }
> +    return this._prefs.set("dataSubmissionPolicyMode", name);

Setting here always is almost certainly not what you want. You'll be making that pref user-set immediately on startup. See my earlier comment about splitting this in two.
Attachment #801753 - Flags: review?(rnewman) → review-
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
Implemented changes. Added an additional test in test_policy.js to cover the added fuctionality.
Attachment #803859 - Flags: review?(rnewman)
Attachment #803859 - Flags: review?(gps)
Attachment #801753 - Attachment is obsolete: true
Attachment #801753 - Flags: review?(gps)
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
Nit: re-added automatically trimmmer whitespaces from firefox.js and policy.jsm. All other changes are the ones specified above
Attachment #803869 - Flags: review?(rnewman)
Attachment #803869 - Flags: review?(gps)
Attachment #803859 - Attachment is obsolete: true
Attachment #803859 - Flags: review?(rnewman)
Attachment #803859 - Flags: review?(gps)
Comment on attachment 803869 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 803869 [details] [diff] [review]:
-----------------------------------------------------------------

This should get r+ once all the review points are addressed.

::: b2g/app/b2g.js
@@ +764,5 @@
>  pref("gfx.canvas.azure.backends", "skia");
>  pref("gfx.canvas.azure.accelerated", true);
> +
> +// Set the datareporting interval to be less frequent on mobile devices.
> +pref("datareporting.policy.dataSubmissionPolicyMode", "weekly");

This should be behind a #ifdef MOZ_DATA_REPORTING.

::: browser/app/profile/firefox.js
@@ +1305,5 @@
>  // currently irrelevant for desktop e10s
>  pref("network.disable.ipc.security", true);
> +
> +// Perform frequent data submissions on desktop.
> +pref("datareporting.policy.dataSubmissionPolicyMode", "daily");
\ No newline at end of file

Ditto.

::: services/datareporting/policy.jsm
@@ +29,5 @@
>  // is smaller than this, data upload will be disabled until the user is re-notified
>  // about the policy changes.
>  const DATAREPORTING_POLICY_VERSION = 1;
>  
> +const DEFAULT_DATAREPORTING_MODE = "weekly";

Why did you change the default?

@@ +256,5 @@
> +  } catch (ex) {
> +    this._log.error(ex.message);
> +    this._log.debug("Defaulting to mode: " + DEFAULT_DATAREPORTING_MODE);
> +    // We need to update prefs here so invalid mode doesn't cause future problems.
> +    this.mode = DEFAULT_DATAREPORTING_MODE;

Are you missing a call to this._setMode? If so, this likely represents a gap in test coverage!

@@ +728,5 @@
>      // it's possible our next schedule date is far out as well. We know
>      // we shouldn't schedule for more than a day out, so we reset the next
> +    // scheduled date appropriately. Two days after a full submission cycle
> +    // was chosen arbitrary.
> +    if (nextSubmissionDate.getTime() > nowT + 2 * MILLISECONDS_PER_DAY + this.SUBMISSION_FREQUENCY) {

We just went from 3 days, to 1, to 2. Let's keep it at > 1 x SUBMISSION_FREQUENCY.

::: services/datareporting/tests/xpcshell/test_policy.js
@@ +185,5 @@
> +  for (let mode of Object.keys(policy.MODES)) {
> +    dump("Checking mode: " + mode + "\n");
> +    policy.mode = mode;
> +    do_check_eq(policy.mode, mode);
> +    // Check too see that every property was set.

Nit: s/too/to.

@@ +188,5 @@
> +    do_check_eq(policy.mode, mode);
> +    // Check too see that every property was set.
> +    for (let key of Object.keys(policy.MODES[mode])) {
> +      dump("Checking for key equality: " + key + "\n");
> +      do_check_eq(policy[key], policy.MODES[mode][key]);

You could probably do without the dump()s.

@@ +197,5 @@
> +  try {
> +    policy.mode = "this-mode-does-not-exist-and-it-should-fail";
> +    do_throw("Setting an unknown mode should fail.");
> +  } catch (ex) {
> +    do_check_eq(policy.mode, "test");

Not sure why you need the do_throw and catch here.

@@ +228,5 @@
> +  prefs.set("dataSubmissionPolicyMode", "this-mode-does-not-exist-and-it-should-fail");
> +  let [policy, policyPrefs, hrPrefs, listener] = getPolicy(name, ExtendedPolicy);
> +
> +  do_check_eq(policy._setModeExceptions, 1);
> +  do_check_eq(policy.mode, "weekly");

Use the constant from the module.

@@ +465,3 @@
>  
> +      // Interval passed, uploading.
> +      now = new Date(policy.nextDataSubmissionDate.getTime());

Calculate using the policy's time + 1.
Attachment #803869 - Flags: review?(gps) → feedback+
Attached patch De-prioritize cellular networks; (obsolete) — Splinter Review
> > +const DEFAULT_DATAREPORTING_MODE = "weekly";
> 
> Why did you change the default?

Not sure if I get what you are referring to. I kept the default value, I just set it in a constant because I need to use it more than once

> > +    this.mode = DEFAULT_DATAREPORTING_MODE;
> 
> Are you missing a call to this._setMode? If so, this likely represents a gap in test coverage!

set mode() calls this._setMode() and saves prefs.

> > +  do_check_eq(policy.mode, "weekly");
> 
> Use the constant from the module.

This was actually intentional. I wanted to also check to see that the default mode is "weekly". If by any chance one would edit the DEFAULT_DATAREPORTING_MODE constant then this test should also be updated - as per the description on
policy.jsm:181

> * These modes are selected by specifying the value of the
> * `datareporting.policy.dataSubmissionPolicyMode` pref. If no pref
> * value is set, "weekly" is used.
Attachment #803994 - Flags: review?(rnewman)
Attachment #803994 - Flags: review?(gps)
Attachment #803869 - Attachment is obsolete: true
Attachment #803869 - Flags: review?(rnewman)
Comment on attachment 803994 [details] [diff] [review]
De-prioritize cellular networks;

Happy to have gps do the final review here.
Attachment #803994 - Flags: review?(rnewman)
Fixed merge conflicts with the previous 5 patches which this needs to be applied to

Green tbpl: https://tbpl.mozilla.org/?tree=Try&rev=1d6ed6cc08e3
Attachment #807554 - Flags: review?(gps)
Attachment #803994 - Attachment is obsolete: true
Attachment #803994 - Flags: review?(gps)
Comment on attachment 807554 [details] [diff] [review]
De-prioritize cellular networks;

Review of attachment 807554 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with default mode changed to daily. Remember to update the source comment later in policy.jsm!

::: services/datareporting/policy.jsm
@@ +36,5 @@
>    writable: false,
>    value: 1
>  });
>  
> +const DEFAULT_DATAREPORTING_MODE = "weekly";

I think the default reporting mode should be daily and you can remove the preference setting from firefox.js.

::: services/datareporting/tests/xpcshell/test_policy.js
@@ +490,5 @@
> +    // In the end, the submission should be rescheduled for the next interval.
> +    yield listener.lastDataRequest.onSubmissionFailureSoft();
> +    do_check_eq(policy.nextDataSubmissionDate.getTime(),
> +                policy._futureDate(policy.SUBMISSION_FREQUENCY).getTime());
> +  }

Nice rewrite of the test!
Attachment #807554 - Flags: review?(gps) → review+
Depends on: 850709
Blocks: 925587
Assignee: steven.mirea → nobody
Status: ASSIGNED → NEW
FHR is going away per bug 1209088, closing.
Telemetry will get separate uploader code for mobile, this can address mobile-specific concerns there.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.