Define and implement an API to expose GCM registration details to Fennec

RESOLVED FIXED in Firefox 47

Status

()

defect
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: nalexander, Assigned: nalexander)

Tracking

(Blocks 2 bugs, {dev-doc-complete})

Trunk
Firefox 47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44 affected, firefox47 fixed)

Details

Attachments

(7 attachments, 4 obsolete attachments)

58 bytes, text/x-review-board-request
rnewman
: review+
Details
58 bytes, text/x-review-board-request
rnewman
: review+
Details
58 bytes, text/x-review-board-request
rnewman
: review+
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
rnewman
: review+
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
rnewman
: review+
Details
After Bug 1207708 has Fennec building against GCM, we'll need to register as a GCM endpoint early in the Fennec lifecycle.  Further consumers, including any GCM / Mozilla Push bridge, will need to get the GCM registration details and act on them.

This ticket tracks defining and implementing an API for such consumers to use.
Blocks: 1179015
(In reply to Nick Alexander :nalexander from comment #2)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad0d8cc82abd

edwong: the Android API 11 build of the above patches should basically work.  It's configured to use an ec2 box over http://; you need dom.push.debug=true (set by default) to use it.  Adjust dom.push.loglevel (set to debug by default) to see things.  Use |adb shell setprop log.tag.GeckoPush DEBUG| to see Java-side debug messaging.  The serverURL above is custom, so using the -dev server won't really work.  Sorry, jrconlin and I are working on it.

With this, I can register for push, send empty payloads, and send encrypted payloads.  (This is tricky, since all the tools expect https and we're http for now.  I've done this manually to verify.)  It's not very well tested, but tell me some obvious issues.

Oh, just realized I removed the debug notification interface.  I'll add that back next week.

Thanks for all your help testing and debugging.
Flags: needinfo?(edwong)
Do you need testing on the build in comment #5, I'll assume so.
Flags: needinfo?(edwong)
Depends on: 1243855
Attachment #8723794 - Flags: review?(kcambridge)
Attachment #8723791 - Flags: review?(rnewman) → review+
Comment on attachment 8723791 [details]
MozReview Request: Bug 1207714 - Pre: Expose synchronous executor to product code. r=rnewman

https://reviewboard.mozilla.org/r/36725/#review33749
Attachment #8723789 - Flags: review?(rnewman) → review+
Comment on attachment 8723789 [details]
MozReview Request: Bug 1207714 - Post: Add rudimentary architecture documentation. r?rnewman

https://reviewboard.mozilla.org/r/36721/#review33751

Excellent docs.
Try build is at:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=53edd3699740

I used the following test pages:

* https://jrconlin.github.io/WebPushDataTestPage/
* https://serviceworke.rs/

In addition, this should work but I haven't tested it:

* https://tests.peter.sh/push-generator/

and this has been recently fixed, but again I haven't tested it:

* https://goroost.com/try-web-push
Known issues:

* we don't wake Gecko yet.
* tapping a notification starts Fennec, but doesn't navigate to the page correctly.  (This isn't Push specific.)
* I haven't finalized how to ask for permissions on Android 23, and how to ask the user to install Google Play Services if it's needed.  This might need UX love.
* there's a permissions bump to navigate here.

Sub-tickets to follow.
Comment on attachment 8723793 [details]
MozReview Request: Bug 1207714 - Part 3: Implement push manager. r?rnewman

https://reviewboard.mozilla.org/r/36729/#review33755

::: mobile/android/base/java/org/mozilla/gecko/push/PushClient.java:42
(Diff revision 1)
> +            if (result instanceof AutopushClientException) {

This approach is only correct if only one of the delegate methods is called. Otherwise, it's possible for a failure result to be silently replaced by a real result. I'm pretty sure that's not desirable, so perhaps check for `null` in the `handle*` methods?

Or make this private and/or document the assumptions.

::: mobile/android/base/java/org/mozilla/gecko/push/PushClient.java:65
(Diff revision 1)
> +        // XXX comments about form;

Mmm?

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:72
(Diff revision 1)
> +            return new HashMap<>();

I wonder if this is worth defining a read-only `EmptyMap` class, subclassing `AbstractMap`.

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:103
(Diff revision 1)
> +        Log.i(LOG_TAG, "Subscribing to channel for service: " + service + "; for profile named: " + profileName);

Leaking the service name?

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:142
(Diff revision 1)
> +        final PushSubscription subscription = registration.removeSubscription(chid);

I worry that we do this -- removing the local state -- before we successfully remove the remote state.

Doing it the other way around has its own issues, of course.

Is there a third way (e.g., persisting the registrations for channels that we need to unsubscribe from)?

Or can you add a comment about why I shouldn't be worried (e.g., without the local subscription we'll ignoring incoming messages, and after 60 days the server will expire the channel)?

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:150
(Diff revision 1)
> +            Log.e(LOG_TAG, "Subscription does not exist: " + chid);

did not

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:188
(Diff revision 1)
> +                    Log.i(LOG_TAG, "Push configuration autopushEndpoint changed! Was: " + registration.autopushEndpoint + "; now: " + endpoint);

I think your `debug` flag is inverted.

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:269
(Diff revision 1)
> +        } else if (registration.uaid.timestamp + TIME_BETWEEN_AUTOPUSH_UAID_REGISTRATION_IN_MILLIS < now) { // XXX check this constant.

Early return not else.

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:271
(Diff revision 1)
> +                // Logger.pii(LOG_TAG, "Stale uaid; re-registering with autopush endpoint: " + registration.autopushEndpoint);

We'll log nothing in debug mode…

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:279
(Diff revision 1)
> +            final long nextNow = System.currentTimeMillis();

:( :( :(

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:329
(Diff revision 1)
> +                    } else if (registration.uaid.timestamp + TIME_BETWEEN_AUTOPUSH_UAID_REGISTRATION_IN_MILLIS < now) {

I think we want telemetry (server-side or client-side) for how many registrations a particular user has, and how fresh they are.

I doubt we want to make a whole bunch of these requests at once just because a user had their tablet off while they went on vacation. If we have statistics to inform the decision, we might consider batch APIs or some kind of fuzzing here.

::: mobile/android/base/java/org/mozilla/gecko/push/PushManager.java:348
(Diff revision 1)
> +                        Log.w(LOG_TAG, "Startup: cannot advance registration for profileName: " + profileName + "; got permanent autopush error.  Removing registration entirely.", e);

Backoff handling needs to go in here somewhere.

If this is a server-initiated failure that might have a backoff, we should break out of the loop altogether. And if it's a transient network error that will apply to other registrations, we should give up early, too.

::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/push/TestPushManager.java:140
(Diff revision 1)
> + 

Nit: WS

::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/push/TestPushManager.java:258
(Diff revision 1)
> +        // Rather tautological.

Hey, it's test code, right?! :)
Attachment #8723793 - Flags: review?(rnewman) → review+
Comment on attachment 8723792 [details]
MozReview Request: Bug 1207714 - Part 2: Add storage for App-wide push state. r?rnewman

https://reviewboard.mozilla.org/r/36727/#review34059

::: mobile/android/base/java/org/mozilla/gecko/push/Fetched.java:44
(Diff revision 1)
> +        }

`putNull` in the other case?
Attachment #8723792 - Flags: review+
Attachment #8723794 - Flags: review?(rnewman) → review+
Comment on attachment 8723794 [details]
MozReview Request: Bug 1207714 - Implement push service; integrate against Gecko. r?rnewman,kitcambridge

https://reviewboard.mozilla.org/r/36731/#review34063

Lots of TODOs. Don't lose track of them.

::: dom/push/PushService.jsm:25
(Diff revision 1)
> +XPCOMUtils.defineLazyModuleGetter(this, "PushServiceHttp2",

Don't even define these if the widget toolkit is Android.

::: dom/push/PushService.jsm:31
(Diff revision 1)
> +const CONNECTION_PROTOCOLS = AppConstants.MOZ_WIDGET_TOOLKIT != 'android'

I'd rather reverse this conditional.

::: dom/push/PushServiceAndroidGCM.jsm:135
(Diff revision 1)
> +          rs: rs

Nit: trailing comma.

::: dom/push/PushServiceAndroidGCM.jsm:138
(Diff revision 1)
> +        message = base64UrlDecode(data.message);

There's no HMAC?!

::: dom/push/PushServiceAndroidGCM.jsm:224
(Diff revision 1)
> +      console.debug("Ignoring unrecognized request action:", action);

Throw instead?

::: dom/push/PushServiceAndroidGCM.jsm:248
(Diff revision 1)
> +            p256dhPrivateKey: exportedKeys[1]

Trailing comma.

::: dom/push/PushServiceAndroidGCM.jsm:249
(Diff revision 1)
> +          })

Trailing semicolon.

::: dom/push/PushServiceAndroidGCM.jsm:259
(Diff revision 1)
> +  }

Trailing comma.

::: dom/push/PushServiceAndroidGCM.jsm:284
(Diff revision 1)
> +  register.channelID = this.channelID;

Oh god JS `this` makes me cringe.

::: mobile/android/base/java/org/mozilla/gecko/gcm/GcmTokenClient.java:29
(Diff revision 1)
> +    private final HashMap<String, String> tokenCache = new HashMap<>();

Thread safety?

::: mobile/android/base/java/org/mozilla/gecko/push/PushService.java:89
(Diff revision 1)
> +            Log.e(LOG_TAG, "Got exception during startup; ignoring.", e);

Sigh.

::: mobile/android/base/java/org/mozilla/gecko/push/PushService.java:101
(Diff revision 1)
> +            Log.e(LOG_TAG, "Got exception during startup; ignoring.", e);

during refresh.

::: mobile/android/base/java/org/mozilla/gecko/push/PushService.java:205
(Diff revision 1)
> +            Log.e(LOG_TAG, "callback must not be null in " + event);

These can be Log.e(LOG_TAG, " ", event) now, right?

::: mobile/android/base/java/org/mozilla/gecko/push/PushService.java:220
(Diff revision 1)
> +            } else if ("PushServiceAndroidGCM:DumpRegistration".equals(event)) {

Prefer early return for all of these.
Happy for this to iterate in tree, but make sure to walk your "XXX" and "TODO" comments. And land it without the GeckoApp integration (perhaps also without the jsm?) so you can get this out of patches without having to worry if it's complete and correct.
Assignee: nobody → nalexander
Status: NEW → ASSIGNED
(In reply to Nick Alexander :nalexander from comment #19)
> Known issues:
> 
> * we don't wake Gecko yet.

Filed Bug 1252650.

> * tapping a notification starts Fennec, but doesn't navigate to the page
> correctly.  (This isn't Push specific.)

This is Bug 1229835.

> * I haven't finalized how to ask for permissions on Android 23, and how to
> ask the user to install Google Play Services if it's needed.  This might
> need UX love.

So, the permission appears not to be an issue on Android 23 -- the new permission appears to not have a "permission group", so it doesn't require explicit approval from the user.  That's nice!  We'll still want to to test this.

On day 1, I think we'll just fail if Google Play Services isn't available.  We can try to be more clever later.

> * there's a permissions bump to navigate here.

The bump is tracked in Bug 1221678.
https://reviewboard.mozilla.org/r/36729/#review33755

I'm commenting on these here, and I've committed them as a separate patch since I've reworked the underlying code.

> This approach is only correct if only one of the delegate methods is called. Otherwise, it's possible for a failure result to be silently replaced by a real result. I'm pretty sure that's not desirable, so perhaps check for `null` in the `handle*` methods?
> 
> Or make this private and/or document the assumptions.

I've documented the assumptions, and guarded the setters to set at most once.

> Mmm?

Commented through-out.  This refered to the odd executor / callback / imperative blending, which I've noted.

> I wonder if this is worth defining a read-only `EmptyMap` class, subclassing `AbstractMap`.

I'll use https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#EMPTY_MAP

> Leaking the service name?

The service name is something like "webpush" or "sync" or "fxa"; I'm not concerned about this.  It's not the WebPush API consuming service name (origin).

> I worry that we do this -- removing the local state -- before we successfully remove the remote state.
> 
> Doing it the other way around has its own issues, of course.
> 
> Is there a third way (e.g., persisting the registrations for channels that we need to unsubscribe from)?
> 
> Or can you add a comment about why I shouldn't be worried (e.g., without the local subscription we'll ignoring incoming messages, and after 60 days the server will expire the channel)?

This is the only sensible way to do it -- upstream may actually be gone entirely -- and I will add a comment explaining the latter.

> did not

Fixed.

> I think your `debug` flag is inverted.

Gah.

> We'll log nothing in debug mode…

Updated.

> :( :( :(

This is all based on local clocks, and I see no way around that.  In the future we could try to get skew from something else (push messages?)...

> I think we want telemetry (server-side or client-side) for how many registrations a particular user has, and how fresh they are.
> 
> I doubt we want to make a whole bunch of these requests at once just because a user had their tablet off while they went on vacation. If we have statistics to inform the decision, we might consider batch APIs or some kind of fuzzing here.

Most users will have exactly one registration for their default profile.  The service can provide the registration freshness and subscription count easily.  This almost always looks like one request per month / week :)

> Backoff handling needs to go in here somewhere.
> 
> If this is a server-initiated failure that might have a backoff, we should break out of the loop altogether. And if it's a transient network error that will apply to other registrations, we should give up early, too.

There's no real need for backoff here.  This is all happening in response to life-cycle events (App startup) or in response to web sites asking for Push subscriptions.  There's no real need to throttle, at least not up front.
Attachment #8723791 - Attachment description: MozReview Request: Bug 1207714 - Pre: Expose synchronous executor to product code. r?rnewman → MozReview Request: Bug 1207714 - Pre: Expose synchronous executor to product code. r=rnewman
Comment on attachment 8723791 [details]
MozReview Request: Bug 1207714 - Pre: Expose synchronous executor to product code. r=rnewman

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36725/diff/1-2/
Comment on attachment 8723792 [details]
MozReview Request: Bug 1207714 - Part 2: Add storage for App-wide push state. r?rnewman

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36727/diff/1-2/
Attachment #8723792 - Attachment description: MozReview Request: Bug 1207714 - State. → MozReview Request: Bug 1207714 - Part 2: Add storage for App-wide push state. r?rnewman
Comment on attachment 8723793 [details]
MozReview Request: Bug 1207714 - Part 3: Implement push manager. r?rnewman

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36729/diff/1-2/
Attachment #8723793 - Attachment description: MozReview Request: Bug 1207714 - Implement push manager. r?rnewman → MozReview Request: Bug 1207714 - Part 3: Implement push manager. r?rnewman
Attachment #8723789 - Attachment is obsolete: true
Attachment #8723790 - Attachment is obsolete: true
Attachment #8723794 - Attachment is obsolete: true
Attachment #8723795 - Attachment is obsolete: true
rnewman: OK, I've taken the Android pieces here, cleaned them further, and addressed your review comments (mostly in a separate patch, which I'll fold).

This maintains purely Android GCM registration in the Application start-up.  We'll pursue the Gecko integration in a separate ticket.  I think this is ready to land but I'd like a last once over, since so much has moved around.  Thanks!
Blocks: 1253106
(In reply to Nick Alexander :nalexander from comment #27)

> There's no real need for backoff here.  This is all happening in response to
> life-cycle events (App startup) or in response to web sites asking for Push
> subscriptions.  There's no real need to throttle, at least not up front.

Yeah, that's kinda my fear -- if this goes live for our entire install base, and we retry on each launch if the last one failed, then that's (launches per day * 7) times the expected load compared to weekly refresh.

For 1M users launching 3x per day, we'd go from 1M requests/week to 21M requests/week. Multiply that by our ADI in millions.

Backoff isn't for clients -- it's for the service.
Blocks: 1253113
(In reply to Richard Newman [:rnewman] from comment #36)
> (In reply to Nick Alexander :nalexander from comment #27)
> 
> > There's no real need for backoff here.  This is all happening in response to
> > life-cycle events (App startup) or in response to web sites asking for Push
> > subscriptions.  There's no real need to throttle, at least not up front.
> 
> Yeah, that's kinda my fear -- if this goes live for our entire install base,
> and we retry on each launch if the last one failed, then that's (launches
> per day * 7) times the expected load compared to weekly refresh.
> 
> For 1M users launching 3x per day, we'd go from 1M requests/week to 21M
> requests/week. Multiply that by our ADI in millions.
> 
> Backoff isn't for clients -- it's for the service.

Yeah, I suppose you're right.  I considered this, decided it wasn't necessary, and now agree it is: Bug 1253113.
Comment on attachment 8725958 [details]
MozReview Request: Bug 1207714 - Part 1: Register no-op GCM message listeners. r?rnewman

https://reviewboard.mozilla.org/r/37711/#review34549
Attachment #8725958 - Flags: review?(rnewman) → review+
Comment on attachment 8725961 [details]
MozReview Request: Bug 1207714 - Part 4: Add singleton PushService. r?rnewman

https://reviewboard.mozilla.org/r/37715/#review34551

Good enough for me.

::: mobile/android/base/java/org/mozilla/gecko/GeckoApplication.java:170
(Diff revision 1)
> +            // TODO: only run in main process.

ThreadUtils.assertIsOnMainThread()
Attachment #8725961 - Flags: review?(rnewman) → review+
Depends on: 1253876
You need to log in before you can comment on or make changes to this bug.