Closed Bug 828654 Opened 7 years ago Closed 6 years ago

Android service uploader for Firefox Health Report payloads

Categories

(Firefox Health Report Graveyard :: Client: Android, defect, P1)

ARM
Android
defect

Tracking

(firefox22 unaffected, firefox23+ fixed, firefox24+ fixed)

VERIFIED FIXED
Firefox 24
Tracking Status
firefox22 --- unaffected
firefox23 + fixed
firefox24 + fixed

People

(Reporter: rnewman, Assigned: nalexander)

References

()

Details

(Whiteboard: [qa!])

Attachments

(2 files, 9 obsolete files)

84.10 KB, patch
rnewman
: review+
Details | Diff | Splinter Review
67.19 KB, text/plain
Details
This will mean:

* Building an Android Bagheera client (a la Bug 802914).
* Modifying FHR on Fennec to serialize to disk, not upload, and leave enough details for a separate timed service to perform the upload.
* Implementing an uploader that's aware of the usual: background data setting, timers, etc. The Product Announcements code (Bug 793053) is a good example.

A good follow-up would be to implement a non-Gecko serializer that can assemble the report close to submission time, without instantiating GeckoApp.
We need another bug for Fennec FHR UI (settings, data upload notification UI), and probably some additional work to make sure we get data to upload in a timely fashion, particularly on mobile where we typically have very short-lived sessions with no idle periods.

In the future, we might want to avoid at-browse-time document composition entirely, which will involve more work.
Assignee: nobody → liuche
Status: NEW → ASSIGNED
Blocks: 829887
No longer blocks: 718066
Depends on: 833625
Comment on attachment 708254 [details]
Part 0: enabling and packaging data reporting

(snafu, wrong bug...)
Attachment #708254 - Attachment is obsolete: true
Attachment #708254 - Attachment is patch: false
Depends on: 838879
Depends on: 840127
Depends on: 840128
Depends on: 840133
This is a little further down the line (see dependencies), so unassigning for now.
Assignee: liuche → nobody
Status: ASSIGNED → NEW
Depends on: 840129
No longer depends on: 798043
Depends on: 848852
Priority: -- → P3
Assignee: nobody → nalexander
Priority: P3 → P1
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Component: Client: Desktop → Client: Android
No longer depends on: 848852
Depends on: 858742, 868274
No longer depends on: 833625, 858742, 868274
Blocks: 868442
Assignee: nalexander → nobody
Comment on attachment 758968 [details] [diff] [review]
Part 0: Move uses-permission into correct manifest snippet. r=rnewman

Review of attachment 758968 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not setting bug flags on 5 patches with this ridiculous interface.  There are some small Fennec pieces that aren't present on github -- sorry that these don't stand out more.
Attachment #758968 - Flags: review?(rnewman)
Depends on: 880127
Assignee: nobody → nalexander
Status: NEW → ASSIGNED
Blocks: 870925
Attachment #758971 - Attachment is obsolete: true
Attachment #758972 - Attachment is obsolete: true
Comment on attachment 761226 [details] [diff] [review]
Bug 840127, Bug 828654 - Part 4: Implement Android Health Report submission. r=rnewman

Okay, this is everything remaining from a-s github.  For landing, I'd like to see everything squashed down to single commit in m-c, and I don't care about a-s.

Don't forget https://github.com/ncalexan/healthreport-upload-test-addon.
Attachment #761226 - Flags: review?(rnewman)
Attachment #758968 - Flags: review?(rnewman)
Attachment #758968 - Flags: review+
Attachment #758968 - Flags: approval-mozilla-aurora?
Attachment #758969 - Flags: review+
Attachment #758969 - Flags: approval-mozilla-aurora?
Attachment #758970 - Flags: review+
Comment on attachment 761224 [details] [diff] [review]
Part 3: Health Report Upload service start/stop/schedule boilerplate. r=rnewman

Review of attachment 761224 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoPreferences.java
@@ +304,5 @@
>          intent.putExtra("pref", pref);
>          intent.putExtra("branch", GeckoApp.PREFS_NAME);
>          intent.putExtra("enabled", value);
> +
> +        GeckoProfile profile = GeckoProfile.get(context);

Note for the future: this is another one of those order-of-access-to-GeckoProfile warning spots.

::: mobile/android/base/background/healthreport/HealthReportConstants.java.in
@@ +17,5 @@
>  
> +  // Not `final` so we have the option to turn this on at runtime with a magic addon.
> +  public static boolean UPLOAD_FEATURE_DISABLED = false;
> +  public static long MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000;
> +  public static long MILLISECONDS_PER_SIX_MONTHS = 180 * MILLISECONDS_PER_DAY;

Man, we have a lot of places that define MILLISECONDS_PER_DAY. Oh, Java, how your cowpaths are not paved.

::: mobile/android/base/background/healthreport/upload/HealthReportBroadcastService.java
@@ +44,5 @@
> +  protected void toggleAlarm(final Context context, String profileName, String profilePath, boolean enabled) {
> +    Logger.info(LOG_TAG, (enabled ? "R" : "Unr") + "egistering health report start broadcast receiver...");
> +
> +    // PendingIntents are compared without reference to their extras. Therefore
> +    // even though we pass the profile details to in the action, different

s/in //

@@ +46,5 @@
> +
> +    // PendingIntents are compared without reference to their extras. Therefore
> +    // even though we pass the profile details to in the action, different
> +    // profiles will share the *same* pending intent. In a multi-profile future,
> +    // this will need to be addressed.

File a bug and tie it into [multiprofile] or whatever our whiteboard tag is.

::: mobile/android/services/manifests/HealthReportAndroidManifest_services.xml.in
@@ +6,4 @@
>            -->
> +        <service
> +		        android:exported="false"
> +		        android:name="org.mozilla.gecko.background.healthreport.upload.HealthReportBroadcastService" >

Something's wrong with this indenting.
Attachment #761224 - Flags: review+
Comment on attachment 761226 [details] [diff] [review]
Bug 840127, Bug 828654 - Part 4: Implement Android Health Report submission. r=rnewman

Review of attachment 761226 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to re-review this after eating.

::: mobile/android/base/background/bagheera/BagheeraClient.java
@@ +236,1 @@
>                                            String obsoleteDocumentID,

Incorrect indenting, Batman.

::: mobile/android/base/background/healthreport/HealthReportConstants.java.in
@@ +14,5 @@
>     * Used for sanity checks.
>     */
>    public static final long EARLIEST_LAST_PING = 1367500000000L;
>  
> +  public static final long MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000;

!!!!

::: mobile/android/base/background/healthreport/HealthReportUtils.java
@@ +154,5 @@
> +    }
> +    try {
> +      return ExtendedJSONObject.parseJSONObject(s);
> +    } catch (Exception e) {
> +      return new ExtendedJSONObject();

I ain't never seen a better candidate for a li'l log message.

@@ +159,5 @@
> +    }
> +  }
> +
> +  public static void setObsoleteIds(SharedPreferences sharedPrefs, ExtendedJSONObject ids) {
> +    Logger.warn("FOO", "setObsoleteIds " + ids.toJSONString());

Ahem.

@@ +160,5 @@
> +  }
> +
> +  public static void setObsoleteIds(SharedPreferences sharedPrefs, ExtendedJSONObject ids) {
> +    Logger.warn("FOO", "setObsoleteIds " + ids.toJSONString());
> +    sharedPrefs.edit().putString(HealthReportConstants.PREF_OBSOLETE_DOCUMENT_IDS_TO_DELETION_ATTEMPTS_REMAINING, ids.toString()).commit();

Break this line.

::: mobile/android/base/background/healthreport/upload/AndroidSubmissionClient.java
@@ +46,5 @@
> +    return getSharedPreferences().getString(HealthReportConstants.PREF_DOCUMENT_SERVER_NAMESPACE, HealthReportConstants.DEFAULT_DOCUMENT_SERVER_NAMESPACE);
> +  }
> +
> +  public long getLastUploadLocalTime() {
> +    return getSharedPreferences().getLong(HealthReportConstants.PREF_LAST_UPLOAD_LOCAL_TIME, 0);

Might as well do 0L.

::: mobile/android/base/background/healthreport/upload/SubmissionPolicy.java
@@ +96,5 @@
> +        // We don't care what the order is, but let's make testing easier by
> +        // being deterministic. Deleting in random order might avoid failing too
> +        // many times in succession, but we expect only a single pending delete
> +        // in practice.
> +        obsoleteId = Collections.min(ids.keySet());

Telephony services guy says "round robin to avoid blocking for five retries on a bad record", but I accept the expediency here.

@@ +108,5 @@
> +      }
> +
> +      Editor editor = editor();
> +      editor.setLastDeleteRequested(localTime); // Write committed by delegate.
> +      client.delete(localTime, obsoleteId, new DeleteDelegate(editor));

Fuck you, Editor, for not having an isCommitted() method.
> > +    Logger.warn("FOO", "setObsoleteIds " + ids.toJSONString());
> 
> Ahem.

Good lord.  This was some debugging output that was rolled back in a-s but not in m-c.  Can you tell I was tired and unable to manage two VCSs?
Comment on attachment 761226 [details] [diff] [review]
Bug 840127, Bug 828654 - Part 4: Implement Android Health Report submission. r=rnewman

Review of attachment 761226 [details] [diff] [review]:
-----------------------------------------------------------------

Pretty happy with this, so just clear up those nits.

Thanks for pushing on this.

::: mobile/android/base/background/bagheera/BagheeraClient.java
@@ +236,1 @@
>                                            String obsoleteDocumentID,

Incorrect indenting, Batman.

::: mobile/android/base/background/healthreport/HealthReportConstants.java.in
@@ +14,5 @@
>     * Used for sanity checks.
>     */
>    public static final long EARLIEST_LAST_PING = 1367500000000L;
>  
> +  public static final long MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000;

!!!!

::: mobile/android/base/background/healthreport/HealthReportUtils.java
@@ +154,5 @@
> +    }
> +    try {
> +      return ExtendedJSONObject.parseJSONObject(s);
> +    } catch (Exception e) {
> +      return new ExtendedJSONObject();

I ain't never seen a better candidate for a li'l log message.

@@ +159,5 @@
> +    }
> +  }
> +
> +  public static void setObsoleteIds(SharedPreferences sharedPrefs, ExtendedJSONObject ids) {
> +    Logger.warn("FOO", "setObsoleteIds " + ids.toJSONString());

Ahem.

@@ +160,5 @@
> +  }
> +
> +  public static void setObsoleteIds(SharedPreferences sharedPrefs, ExtendedJSONObject ids) {
> +    Logger.warn("FOO", "setObsoleteIds " + ids.toJSONString());
> +    sharedPrefs.edit().putString(HealthReportConstants.PREF_OBSOLETE_DOCUMENT_IDS_TO_DELETION_ATTEMPTS_REMAINING, ids.toString()).commit();

Break this line.

::: mobile/android/base/background/healthreport/upload/AndroidSubmissionClient.java
@@ +46,5 @@
> +    return getSharedPreferences().getString(HealthReportConstants.PREF_DOCUMENT_SERVER_NAMESPACE, HealthReportConstants.DEFAULT_DOCUMENT_SERVER_NAMESPACE);
> +  }
> +
> +  public long getLastUploadLocalTime() {
> +    return getSharedPreferences().getLong(HealthReportConstants.PREF_LAST_UPLOAD_LOCAL_TIME, 0);

Might as well do 0L.

::: mobile/android/base/background/healthreport/upload/HealthReportBroadcastService.java
@@ +47,5 @@
> +    return getSharedPreferences().getLong(HealthReportConstants.PREF_DELETION_ATTEMPTS_PER_OBSOLETE_DOCUMENT_ID, HealthReportConstants.DEFAULT_DELETION_ATTEMPTS_PER_OBSOLETE_DOCUMENT_ID);
> +  }
> +
> +  // <code>enabled</code> is whether the user has enabled uploading health
> +  // reports; <code>serviceEnabled</code> is whether we should try to upload OR delete.

Either make this a Javadoc comment, or kill the embedded HTML.

::: mobile/android/base/background/healthreport/upload/SubmissionPolicy.java
@@ +96,5 @@
> +        // We don't care what the order is, but let's make testing easier by
> +        // being deterministic. Deleting in random order might avoid failing too
> +        // many times in succession, but we expect only a single pending delete
> +        // in practice.
> +        obsoleteId = Collections.min(ids.keySet());

Telephony services guy says "round robin to avoid blocking for five retries on a bad record", but I accept the expediency here.

@@ +108,5 @@
> +      }
> +
> +      Editor editor = editor();
> +      editor.setLastDeleteRequested(localTime); // Write committed by delegate.
> +      client.delete(localTime, obsoleteId, new DeleteDelegate(editor));

Fuck you, Editor, for not having an isCommitted() method.

@@ +176,5 @@
> +
> +    @Override
> +    public void onSoftFailure(long localTime, String id, String reason, Exception e) {
> +      int failuresToday = getCurrentDayFailureCount();
> +      Logger.warn(LOG_TAG, "Soft failure reported at " + localTime + ": " + reason + " Previously failed " + failuresToday + " today.", e);

I kinda don't want to log a stacktrace on a soft failure. Ponder this.

@@ +224,5 @@
> +      editor
> +        .setNextSubmission(next)
> +        .setLastDeleteFailed(localTime)
> +        .commit();
> +      removeObsoleteId(id);

Let's remove the obsolete ID first.

@@ +240,5 @@
> +      editor
> +        .setNextSubmission(next)
> +        .setLastDeleteSucceeded(localTime)
> +        .commit();
> +      removeObsoleteId(id);

Likewise.

@@ +270,5 @@
> +  public long getMaximumFailuresPerDay() {
> +    return getSharedPreferences().getLong(HealthReportConstants.PREF_MAXIMUM_FAILURES_PER_DAY, HealthReportConstants.DEFAULT_MAXIMUM_FAILURES_PER_DAY);
> +  }
> +
> +  // Authoritative.

What exactly do you mean by that?

@@ +439,5 @@
> +    } catch (ClassCastException e) {
> +      Logger.info(LOG_TAG, "Got exception decrementing obsolete ids counter.", e);
> +    }
> +
> +    Logger.warn("FOO", "decrementObsoleteIdAttempts " + ids.toJSONString());

Foo, eh?
Attachment #761226 - Flags: review?(rnewman) → review+
Nick, if you get to this before me, please tackle nits and fold down to a single patch, then upload it and obsolete these parts.

QA overview:

We'll get a final build of this tomorrow.

FHR should upload once per day, 24 hours after first install. The document uploaded should be freshly generated each time. If an upload fails for a transient reason -- e.g., a network failure -- it should retry in one hour, for a maximum of two retries, and then resume 22 hours after the second retry. If an upload fails for a significant reason (server rejection, problem generating document), it should wait until the next day.

The hourly check should resume after reboot, after SD card reinsertion, and on launching Fennec.

After a successful upload, disabling FHR in Fennec > Settings > Data Choices > FHR should cause a deletion request to be issued.

See Comment 14 for a testing add-on that should reduce the intervals to a bearable duration.
Whiteboard: [qa+]
(In reply to Richard Newman [:rnewman] from comment #20)
> Nick, if you get to this before me, please tackle nits and fold down to a
> single patch, then upload it and obsolete these parts.
> 
> QA overview:
> 
> We'll get a final build of this tomorrow.
> 
> FHR should upload once per day, 24 hours after first install. The document
> uploaded should be freshly generated each time. If an upload fails for a
> transient reason -- e.g., a network failure -- it should retry in one hour,
> for a maximum of two retries, and then resume 22 hours after the second
> retry. If an upload fails for a significant reason (server rejection,
> problem generating document), it should wait until the next day.

This is correct, except we "resume 24 hours after the second retry".  I believe this is the same as desktop.  It is possible that the second retry is 23 hours after the first; doing the blanket "24 hours after" saves fiddling with a counter.
Attachment #758968 - Attachment is obsolete: true
Attachment #758969 - Attachment is obsolete: true
Attachment #758970 - Attachment is obsolete: true
Attachment #761224 - Attachment is obsolete: true
Attachment #761226 - Attachment is obsolete: true
Attachment #758968 - Flags: approval-mozilla-aurora?
Attachment #758969 - Flags: approval-mozilla-aurora?
Attachment #761513 - Flags: review?(rnewman)
Depends on: 811358
(In reply to Nick Alexander :nalexander from comment #22)
> Created attachment 761513 [details] [diff] [review]
> Android service for submitting Firefox Health Report payloads. r=rnewman

Don't land this quite yet, I think some tabs slipped back in.
Fixed tabs that slipped back in.
Attachment #761513 - Attachment is obsolete: true
Attachment #761513 - Flags: review?(rnewman)
Attachment #761630 - Flags: review?(rnewman)
Comment on attachment 761630 [details] [diff] [review]
Android service for submitting Firefox Health Report payloads. r=rnewman

Review of attachment 761630 [details] [diff] [review]:
-----------------------------------------------------------------

Will bake this on m-c before actually doing the uplift, but flagging nonetheless.
Attachment #761630 - Flags: review?(rnewman)
Attachment #761630 - Flags: review+
Attachment #761630 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ae3e1cede0a

See Comment 20 for QA.
Target Milestone: --- → Firefox 24
https://hg.mozilla.org/mozilla-central/rev/4ae3e1cede0a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
See Also: → 882839
We'll approve on Monday barring any critical issues.
Waiting on QA.
Flags: needinfo?(twalker)
Besides bug 883915 on 2.2, this looks good.  I am getting successful uploads around the expected testing interval.  Soft failure retries at expected interval and on  hitting hard failure it delays next retry correctly.
Depends on: 883915
Flags: needinfo?(twalker)
(In reply to Tracy Walker [:tracy] from comment #30)
> Besides bug 883915 on 2.2, this looks good.  I am getting successful uploads
> around the expected testing interval.  Soft failure retries at expected
> interval and on  hitting hard failure it delays next retry correctly.

Can you try this again on 2.2, and post as much log as you can (all the way through hard failure)?  Checking the code, we're hitting the same path we hit on Sync, so I expect this to Just Work.
interestingly, I don't see the protocol issue on rerun:

06-17 13:04:39.029: I/GeckoBrowserApp(404): menu item clicked
06-17 13:04:39.029: D/GeckoSetPrefs(404): Set prefs to https://fhr.data.mozilla.com, 60000
06-17 13:04:39.139: D/GeckoSetPrefs(404): Committed.
06-17 13:04:39.139: D/GeckoPreferences(404): Broadcast: org.mozilla.fennec.HEALTHREPORT_UPLOAD_PREF, android.not_a_preference.healthreport.uploadEnabled, GeckoApp, true
06-17 13:04:39.159: D/GeckoSetPrefs(404): Broadcast pref.
06-17 13:04:39.179: I/GeckoHealth(404): fennec :: HealthReportBroadcastService :: Registering health report start broadcast receiver.
06-17 13:04:39.259: I/GeckoHealth(404): fennec :: BackgroundService :: Setting inexact repeating alarm for interval 60000
06-17 13:04:39.319: I/GeckoHealth(404): fennec :: SubmissionPolicy :: Need to wait 120000 before first upload.
06-17 13:06:39.319: I/GeckoHealth(404): fennec :: GeckoHealthGen :: Generating FHR document from 1355940399322; last ping 1367500000000
06-17 13:06:39.329: I/GeckoHealth(404): fennec :: GeckoProfileInfo :: Restoring ProfileInformationCache from file.
06-17 13:06:40.269: W/GeckoHealth(404): fennec :: SubmissionPolicy :: Soft failure reported at 1371492399322: Got exception during upload. Previously failed 0 today.
06-17 13:06:40.289: I/GeckoHealth(404): fennec :: SubmissionPolicy :: Retrying upload at 1371492459322.
06-17 13:08:39.319: I/GeckoHealth(404): fennec :: GeckoHealthGen :: Generating FHR document from 1355940519315; last ping 1367500000000
06-17 13:08:39.319: I/GeckoHealth(404): fennec :: GeckoProfileInfo :: Restoring ProfileInformationCache from file.
06-17 13:08:39.979: W/GeckoHealth(404): fennec :: SubmissionPolicy :: Soft failure reported at 1371492519315: Got exception during upload. Previously failed 1 today.
06-17 13:08:39.989: I/GeckoHealth(404): fennec :: SubmissionPolicy :: Retrying upload at 1371492579315.
06-17 13:10:39.379: I/GeckoHealth(404): fennec :: GeckoHealthGen :: Generating FHR document from 1355940639385; last ping 1367500000000
06-17 13:10:39.379: I/GeckoHealth(404): fennec :: GeckoProfileInfo :: Restoring ProfileInformationCache from file.
06-17 13:10:40.019: W/GeckoHealth(404): fennec :: SubmissionPolicy :: Soft failure reported at 1371492639385: Got exception during upload. Previously failed 2 today.
06-17 13:10:40.029: W/GeckoHealth(404): fennec :: SubmissionPolicy :: Hard failure reported at 1371492639385: Reached the limit of daily upload attempts. Next upload at 1371492879385.
(In reply to Tracy Walker [:tracy] from comment #32)
> interestingly, I don't see the protocol issue on rerun:

But we're not successfully uploading.  I don't have a 2.2 device, so I'll get you a build with more detailed logging shortly.
from the time the addon was enabled through hard failure and device sleep
Depends on: 884008
Is this worth uplifting in current state? comment 33 sounds like things may not be working as expected.
(In reply to Alex Keybl [:akeybl] from comment #36)
> Is this worth uplifting in current state? comment 33 sounds like things may
> not be working as expected.

Sorry, haven't updated the ticket recently.  Comment 33 has been resolved by Bug 884008.  I believe :tracy will sign off on this today.
Yes, the issue I saw earlier has been corrected by the server side fix.  This looks good to me.
Status: RESOLVED → VERIFIED
Whiteboard: [qa+] → [qa!]
Attachment #761630 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Duplicate of this bug: 840127
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.