Closed Bug 822712 (simple-push-b2g) Opened 11 years ago Closed 11 years ago

Simple Push notifications for B2G

Categories

(Firefox OS Graveyard :: General, defect)

defect
Not set
normal

Tracking

(blocking-b2g:-, firefox22 fixed, b2g18+ fixed)

RESOLVED FIXED
blocking-b2g -
Tracking Status
firefox22 --- fixed
b2g18 + fixed

People

(Reporter: nsm, Assigned: nsm)

References

Details

(Keywords: dev-doc-needed)

Attachments

(3 files, 27 obsolete files)

2.16 KB, patch
sicking
: review+
Details | Diff | Splinter Review
12.49 KB, patch
nsm
: review+
Details | Diff | Splinter Review
55.60 KB, patch
nsm
: review+
Details | Diff | Splinter Review
This is an implementation of push notifications using a simple version-updates-only protocol.
While currently only for B2G, our eventual aim is to make this a WebAPI and get it on FF desktop.

The protocol is based on Google's Thialfi [0] notification system. The short version:

3 entities: User Agents (B2G), Push Server's (mozilla) and App Server (3rd party). When an
app running on a UA requires notifications for a particular 'channel' (i.e. new email, upcoming calendar events, IM message), it requests the UA to register a channel ID on its behalf.

The UA asks the Push Server for a channel ID which is now associated with the particular UA. UA notifies web app of channelID, which contacts the app server with this ID. When the app server wants to notify the app of a change, it pings the Push Server using a (channelID, <latest version ID>) pair. Data SHOULD NOT be transmitted, only versions. The Push Server and UA will coordinate to wakeup/notify the app using the System Messages API.

There is an in progress spec [1] and a prototype client side implementation [2].


[0] http://research.google.com/pubs/pub37474.html
[1] https://wiki.mozilla.org/WebAPI/SimplePush
[2] http://hg.mozilla.org/users/nsm.nikhil_gmail.com/push-client/
Should point out that this is not meant for release v1
You could argue that.  763198 is about implementing http://www.w3.org/TR/push-api/.  This is about implementing something simpler and safer.
From what I understand, the main difference between 763198 and this one is that this one is meant to be a ping mechanism, not transport one. And in SimplePush several apps can listem to the same channel. Is this correct? 

So does it make it the subset of 763198 (push notification api)? What is the rationale behind having both?
(In reply to Kamil Leszczuk from comment #4)
> From what I understand, the main difference between 763198 and this one is
> that this one is meant to be a ping mechanism, not transport one. And in
> SimplePush several apps can listem to the same channel. Is this correct? 

Unless an app can 'guess' the channelID for another app in a very large keyspace
it won't be able to listen to that channel. Ideally of course we want the user
agent to sign off individual apps in some way so that even guessing won't allow
this to happen. Good point this. The idea is that one app can have multiple channels,
but one channel can belong to only one app.
I like the idea. It cover most use cases and keeps it simple. 

What is the relation of SimplePush to Push Notification API: will both coexist in B2G? If so, will they be related anyhow? 

Push Notification API is supposed to land on v2. Which date/release are you targeting with SimplePush?
The W3C Push Notification will not be implemented by Mozilla.  We have commented on the webapp mailing list why this is (mostly around ease-of-us, safety, and scalability concerns).  Simple Push is our proposal which we will take to the W3C WebApps WG when we have demonstrated to ourselves that the system is solid.

We plan on shipping Simple Push for v.2 of B2G, and sometime in 2013 for all platforms.
Flags: sec-review?(ptheriault)
Depends on: 839757
IMPORTANT NOTE: This patch is not supposed to land as is. It requires some discussion about system message behaviour in the browser and some permissions issues. Attached only because Push core depends on it by way of some variable name changes.

Main change:
manifest has been renamed to identifier and in the case of the browser the windowId is used instead of the manifest. Again, not meant for landing
Attachment #724214 - Flags: feedback?(justin.lebar+bug)
Protocol documented at https://wiki.mozilla.org/WebAPI/SimplePush/Protocol
JS API is at https://wiki.mozilla.org/WebAPI/SimplePush

Adds permissions "push", and system messages "push" and "push-register"
Introduces navigator.pushNotification (Push.js)

PushService(.js) runs in the parent process on B2G, maintains the local database, talks to the server and delivers notifications to apps via Push.js
Attachment #724216 - Flags: feedback?(justin.lebar+bug)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
If informed by the PushServer (via WebSocket Close status 4774) that the server is capable of waking up the phone over UDP, this patch will shutdown the WebSocket and start a UDP socket on port 2442. UDP server socket support has recently landed in m-c (Bug 839757).

When the UDP server socket receives data, the websocket is started up and the push server can relay notifications.
Attachment #724217 - Flags: feedback?(justin.lebar+bug)
Attached patch Patch 4: Registration tests (obsolete) — Splinter Review
Registration only tests for Push. It uses the mochitest WebSocket server and a mock push implementation. Tests are incomplete.

Launch using ./mach mochitest-plain dom/push/tests
Attachment #724219 - Flags: feedback?(justin.lebar+bug)
Attachment #724214 - Flags: feedback?(justin.lebar+bug) → review?(fabrice)
Comment on attachment 724216 [details] [diff] [review]
Patch 2: The core push implementation

jonas, please review the idl.
Attachment #724216 - Flags: superreview?(jonas)
Attachment #724216 - Flags: review?(doug.turner)
Attachment #724216 - Flags: feedback?(justin.lebar+bug)
Attachment #724217 - Flags: feedback?(justin.lebar+bug) → review?(doug.turner)
Attachment #724219 - Flags: feedback?(justin.lebar+bug) → review?(doug.turner)
Comment on attachment 724214 [details] [diff] [review]
Patch 1: Allow system messages to work with webpages

Fabrice:

This needs more discussion. Specifically windowIds may be assigned to different things across browser invocations, so the system message handlers should definitely not be saved anywhere. Right now this only delivers messages which do not require any permissions. Ideally push will require permissions on b2g, in that case this patch will not work on desktop.

For push, we would also like bug 800431 to be fixed in system messages itself, rather than in alarms and push. I was thinking:
Have a variant of sendMessage() which doesn't require a pageURI, and sends the message to pages declared in the manifest messages section.
Updated to use clear() instead of deleteDatabase()
Attachment #724216 - Attachment is obsolete: true
Attachment #724216 - Flags: superreview?(jonas)
Attachment #724216 - Flags: review?(doug.turner)
Attachment #724639 - Flags: superreview?(jonas)
Attachment #724639 - Flags: review?
Comment on attachment 724639 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 724639 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good.  One more patch please.

::: b2g/installer/package-manifest.in
@@ +263,5 @@
>  @BINPATH@/components/necko_socket.xpt
>  @BINPATH@/components/necko_strconv.xpt
>  @BINPATH@/components/necko_viewsource.xpt
>  @BINPATH@/components/necko_wifi.xpt
> +@BINPATH@/components/necko_websocket.xpt

This change is for a different bug, right?

::: browser/base/content/browser.js
@@ +1805,5 @@
>  #ifdef MOZ_SERVICES_SYNC
>      // initialize the sync UI
>      gSyncUI.init();
>  #endif
> +

please remove.

::: browser/installer/removed-files.in
@@ +1210,5 @@
>    components/dom_html.xpt
>    components/dom_json.xpt
>    components/dom_loadsave.xpt
>    components/dom_offline.xpt
> +  components/dom_push.xpt

Don't forget these other files:

/b2g/installer/removed-files.in
/mobile/android/installer/removed-files.in

::: configure.in
@@ +4264,5 @@
>  MOZ_USE_NATIVE_POPUP_WINDOWS=
>  MOZ_ANDROID_HISTORY=
>  MOZ_WEBSMS_BACKEND=
>  ACCESSIBILITY=1
> +MOZ_SYS_MSG=1

Please remove this to the other patch.

::: dom/interfaces/push/Makefile.in
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +

too many new lines

@@ +15,5 @@
> +GRE_MODULE     = 1
> +
> +XPIDLSRCS = \
> +  nsIDOMPush.idl \
> +  nsIPushService.idl \

let's go ahead and remove this interface for now.  We can always add it back if we have new functionality that we need to expose via xpcom.

::: dom/interfaces/push/nsIDOMPush.idl
@@ +4,5 @@
> +
> +#include "domstubs.idl"
> +interface nsIDOMDOMRequest;
> +
> +[scriptable,uuid(c7ad4f42-faae-4e8b-9879-780a72349945)]

Looks fine, and what sicking and I discussed.  We want to move this to webIDL as soon as possible, but we are implementing in JS and that isn't supported yet.

::: dom/messages/SystemMessagePermissionsChecker.jsm
@@ +76,5 @@
>    },
> +  "push": {
> +  },
> +  "push-register": {
> +  },

Since we are protecting the register() call, i do not think we need to have permission checks for these system messages.

::: dom/push/src/Push.js
@@ +4,5 @@
> +
> +"use strict";
> +
> +function debug(s) {
> +  dump("-*- Push.js: " + s + "\n");

reduce noise here.

@@ +55,5 @@
> +      if (perm != Ci.nsIPermissionManager.ALLOW_ACTION)
> +        return null;
> +    }
> +    else {
> +      // FIXME: at some point this will have a doorhanger permission

we probably want to throw so that this object isn't created -- until we understand what the system event story is on desktop.

@@ +116,5 @@
> +    var req = this.createRequest();
> +    this._cpmm.sendAsyncMessage("Push:Register",
> +                      { pageURL: this._pageURL.spec, identifier: this._identifier, requestID: this.getRequestId(req) });
> +
> +    /*if (this._app) {

This block should be removed (and put into another bug).  We should really fix this in system messages.

::: dom/push/src/PushService.js
@@ +4,5 @@
> +
> +"use strict";
> +
> +function debug(s) {
> +  dump("-*- PushService.js: " + s + "\n");

Let's be less noisy.

@@ +13,5 @@
> +const Cu = Components.utils;
> +
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +Cu.import("resource://gre/modules/Services.jsm");
> +Cu.import("resource://gre/modules/FileUtils.jsm");

is this import needed?

@@ +23,5 @@
> +
> +const kPUSHDB_DB_NAME = "push";
> +const kPUSHDB_DB_VERSION = 1; // Change this if the IndexedDB database format changes
> +const kPUSHDB_STORE_NAME = "push";
> +const kCONNECTION_RETRY_BASE_INTERVAL = 5*1000; // start value of 5 seconds based on HTTP/1.1

Consider making this a preference.

@@ +27,5 @@
> +const kCONNECTION_RETRY_BASE_INTERVAL = 5*1000; // start value of 5 seconds based on HTTP/1.1
> +
> +this.PushDB = function PushDB(aGlobal) {
> +  debug("PushDB()");
> +  this._global = aGlobal;

Can we remove _global?

@@ +41,5 @@
> +  init: function(aGlobal) {
> +    this.initDBHelper(kPUSHDB_DB_NAME, kPUSHDB_DB_VERSION, [kPUSHDB_STORE_NAME], aGlobal);
> +  },
> +
> +  upgradeSchema: function(aTransaction, aDb, aOldVersion, aNewVersion) {

Please get bent or kyle to review the usage of indexedDB

@@ +66,5 @@
> +    this.newTxn(
> +      "readwrite", 
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        debug("Going to add " + JSON.stringify(aChannelRecord));

Let's remove any debug() that isn't really needed that does things that eat cycles.  I am worried that we are stringify()'ing things that we really don't need to because no one is ever going to want to see this debug() stmt

@@ +76,5 @@
> +      aSuccessCb, 
> +      aErrorCb
> +    );
> +  },
> +

kill all trailing whitespace.

@@ +173,5 @@
> +        aStore.openCursor().onsuccess = function(event) {
> +          var cursor = event.target.result;
> +          if (cursor) {
> +            aTxn.result.push(cursor.key);
> +            cursor.continue();

This does go back to the event loop every time, right?

@@ +224,5 @@
> +      for (var elem in Components.results)
> +        if (Components.results[elem] == statusCode)
> +          debug(" - " + elem);
> +      tmp.socketError(statusCode);
> +    }

Can you do something like:

if (statusCode != NS_OK) {
  this.socketError(statusCode)
}
_resetWS();


Also, remove the debugging forloop

@@ +251,5 @@
> +      debug("messageType not a string " + reply.messageType);
> +      return;
> +    }
> +
> +    var handler = "_handle" + reply.messageType[0].toUpperCase() + reply.messageType.slice(1).toLowerCase() + "Reply";

good place for a comment about what this does.

@@ +271,5 @@
> +      // requests since the server should've responded to them before the
> +      // connection was closed. 
> +      debug("Pending requests to server on successful WebSocket close");
> +      // FIXME
> +    }

Let't remove all of this debug stuff, and reconnect if we have pending requests?

@@ +291,5 @@
> +PushService.prototype = {
> +  classID : Components.ID("{0ACE8D15-9B15-41F4-992F-C88820421DBF}"),
> +
> +  QueryInterface : XPCOMUtils.generateQI([Ci.nsIPushService,
> +                                          Ci.nsIWebSocketListener,

You do not need nsIPushService or nsIWebSocketListener defined here.

@@ +322,5 @@
> +        break;
> +      case "timer-callback":
> +        if (aSubject == this._requestTimeoutTimer) {
> +          for (var channelID in this._pendingRequests) {
> +            if (Date.now() - this._pendingRequests[channelID].ctime > 10000) {

Sounds like a magic number.  Can you defined this somewhere above.  Maybe a pref even?

@@ +332,5 @@
> +                  this._requestQueue.splice(i, 1);
> +            }
> +          }
> +        }
> +

else, or just return from this method.

@@ +406,5 @@
> +        ppmm.addMessageListener(msgName, this);
> +      }.bind(this));
> +
> +    this._requestTimeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +    this._retryTimeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);

Do you need to create these right here.  Just do it when you call InitWhatever

@@ +471,5 @@
> +    var uri = Services.io.newURI(serverURL, null, null);
> +    try {
> +      if (uri.scheme === "ws") {
> +        debug("!!!!!!!! USING UNSECURE PROTOCOL. YOU MUSE USE wss:// TO BE SAFE");
> +        this._ws = Cc["@mozilla.org/network/protocol;1?name=ws"].createInstance(Ci.nsIWebSocketChannel);

So, the only time this is valid is if we are testing (maybe a limitation in mochitests?).  In every other case, we should disallow with fire.

@@ +500,5 @@
> +      if (!this._ws)
> +        throw "ws was still undefined";
> +    }
> +
> +    this._retryTimeoutTimer.cancel();

check for null

@@ +520,5 @@
> +      this._currentState = STATE_WAITING_FOR_HELLO;
> +    }
> +
> +    this._db.getAllChannelIDs(sendHelloMessage.bind(this),
> +                              sendHelloMessage.bind(this))

semicolon;

@@ +551,5 @@
> +          // apps that have no prior registrations
> +          // but are in the pending queue, won't get
> +          // a push-register, which is correct.
> +          this._notifyAllAppsRegister();
> +            //.then(finishHandshake());

remove comment

@@ +631,5 @@
> +      }
> +
> +      if (typeof version === "number") {
> +        // FIXME(nikhil): this relies on app update notification being infallible!
> +        // eventually fix this

File follow up bug.  My guess is that there will be situations that the application can not wake and then you'll lose this update, right?

@@ +641,5 @@
> +
> +  _onWSError: function(e) {
> +  },
> +
> +  // FIXME(nikhil): batch acks for efficiency reasons.

File followup.  This probably isn't that big of a deal right now.

@@ +700,5 @@
> +      return;
> +    }
> +
> +    [action, data] = this._requestQueue.shift();
> +    data.messageType = action; // FIXME(nikhil): avoid modifying data

I don't know why this matters.

@@ +701,5 @@
> +    }
> +
> +    [action, data] = this._requestQueue.shift();
> +    data.messageType = action; // FIXME(nikhil): avoid modifying data
> +    try {

if (!this._ws) {
  this._resetWS();
}

@@ +752,5 @@
> +
> +  _notifyAllAppsRegister: function() {
> +    debug("notifyAllAppsRegister()");
> +    let messenger = Cc["@mozilla.org/system-message-internal;1"].getService(Ci.nsISystemMessagesInternal);
> +    messenger.broadcastMessage('push-register', {});

Please create a follow up to investigate a smarter way of doing this.

@@ +800,5 @@
> +    let json = aMessage.data;
> +    this[aMessage.name.slice("Push:".length).toLowerCase()](json, mm);
> +  },
> +
> +  register: function(aPageRecord, aMessageManager) {

What is aPageRecord?

@@ +804,5 @@
> +  register: function(aPageRecord, aMessageManager) {
> +    debug("register()");
> +
> +    let uuidGenerator = Cc["@mozilla.org/uuid-generator;1"].getService(Ci.nsIUUIDGenerator);
> +    var channelID = uuidGenerator.generateUUID().toString().slice(1).slice(0, -1);

add a comment about what kind of string you're wanting. {...} -> ...

@@ +844,5 @@
> +      pushEndpoint: data.pushEndpoint,
> +      pageURL: aPageRecord.pageURL,
> +      identifier: aPageRecord.identifier,
> +      ctime: Date.now(),
> +      mtime: Date.now(),

do we need mtime and ctime?

@@ +869,5 @@
> +      case 409: // CONFLICT CHID already exists
> +        if (typeof aPageRecord._attempts !== "number")
> +          aPageRecord._attempts = 0;
> +
> +        if (aPageRecord._attempts < 3) {

Magic number.  Please define it above.

::: dom/push/src/push-prefs.js
@@ +3,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/*
> +Push server address has to be != localhost, because on real device or emulator,
> +localhost points to the device itself, where there is no push server. */

Remove this comment.

@@ +4,5 @@
> +
> +/*
> +Push server address has to be != localhost, because on real device or emulator,
> +localhost points to the device itself, where there is no push server. */
> +pref("services.push.serverURL", "ws://pushserver:8080"); // TODO: fix ws to wss

Lets find out the real fqdn for this.  Also, lets use wss now.

@@ +5,5 @@
> +/*
> +Push server address has to be != localhost, because on real device or emulator,
> +localhost points to the device itself, where there is no push server. */
> +pref("services.push.serverURL", "ws://pushserver:8080"); // TODO: fix ws to wss
> +//pref("services.push.heartbeatInterval", 1200000); // 20 minutes

Get rid of the comment.

@@ +6,5 @@
> +Push server address has to be != localhost, because on real device or emulator,
> +localhost points to the device itself, where there is no push server. */
> +pref("services.push.serverURL", "ws://pushserver:8080"); // TODO: fix ws to wss
> +//pref("services.push.heartbeatInterval", 1200000); // 20 minutes
> +pref("services.push.heartbeatInterval", 5000); // 5 seconds

Remove please.

::: mobile/android/installer/package-manifest.in
@@ +117,5 @@
>  @BINPATH@/components/dom_events.xpt
>  @BINPATH@/components/dom_file.xpt
>  @BINPATH@/components/dom_geolocation.xpt
>  @BINPATH@/components/dom_media.xpt
> +@BINPATH@/components/dom_messages.xpt

Separate bug?

@@ +367,5 @@
> +@BINPATH@/components/PushService.manifest
> +
> +@BINPATH@/components/SystemMessageInternal.js
> +@BINPATH@/components/SystemMessageManager.js
> +@BINPATH@/components/SystemMessageManager.manifest

Same thing?
Attachment #724639 - Flags: review?
Attachment #724639 - Flags: review-
Comment on attachment 724217 [details] [diff] [review]
Patch 3: UDP wakeup support

Review of attachment 724217 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with those changes.

::: dom/push/src/PushService.js
@@ +8,5 @@
>    dump("-*- PushService.js: " + s + "\n");
> +
> +  Cc["@mozilla.org/consoleservice;1"].
> +    getService(Ci.nsIConsoleService).
> +    logStringMessage("-*- PushService.js: " + s + "\n");

Remove

@@ +220,5 @@
>      // keep a ref, since this will be nulled in resetWS
>      var tmp = this._pushService;
>      this._pushService._resetWS();
>  
> +    if (statusCode == Components.results.NS_OK || statusCode == Components.results.NS_BASE_STREAM_CLOSED) {

Lets add a comment as to why we needed to check for this error.

@@ +1027,5 @@
> +      debug("UDP support disabled");
> +      return;
> +    }
> +
> +    let udpSocket = Components.Constructor("@mozilla.org/network/server-socket-udp;1", "nsIUDPServerSocket", "init");

See if you can switch this over to using Ci directly.

@@ +1042,5 @@
> +      this._beginWSSetup();
> +    } catch(e) {
> +      debug("Attempting to setup ws failed: " + e);
> +    }
> +    

white space.

@@ +1046,5 @@
> +    
> +  },
> +
> +  onStopListening: function(aServ, aStatus) {
> +  },

Add a follow up bug here -- if there is an error opening up the udp listener, it might be due to the port being inuse.  If this is the case, we want to increment the port number and try asyncOpen again.

::: dom/push/src/push-prefs.js
@@ +9,5 @@
>  //pref("services.push.heartbeatInterval", 1200000); // 20 minutes
>  pref("services.push.heartbeatInterval", 5000); // 5 seconds
>  pref("services.push.userAgentID", "");
>  pref("services.push.lastSync", "0"); // UNIX timestamp
> +pref("services.push.udp", false);

Default to true here.

@@ +11,5 @@
>  pref("services.push.userAgentID", "");
>  pref("services.push.lastSync", "0"); // UNIX timestamp
> +pref("services.push.udp", false);
> +pref("services.push.udp.port", 2442);
> +pref("services.push.udp.force-listen", true);

Remove force-listen
Attachment #724217 - Flags: review?(doug.turner) → review+
Comment on attachment 724219 [details] [diff] [review]
Patch 4: Registration tests

lgtm
Attachment #724219 - Flags: review?(doug.turner) → review+
Comment on attachment 724219 [details] [diff] [review]
Patch 4: Registration tests

Review of attachment 724219 [details] [diff] [review]:
-----------------------------------------------------------------

oh.. and these should be run as app tests. :)
Comment on attachment 724639 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 724639 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g/installer/package-manifest.in
@@ +263,5 @@
>  @BINPATH@/components/necko_socket.xpt
>  @BINPATH@/components/necko_strconv.xpt
>  @BINPATH@/components/necko_viewsource.xpt
>  @BINPATH@/components/necko_wifi.xpt
> +@BINPATH@/components/necko_websocket.xpt

No, without the websocket cannot be constructed on the device. We get something like 
I/Gecko   (  107): -*- PushService.js: Exception [Exception... "Component returned failure code: 0x80570018 (NS_ERROR_XPC_BAD_IID) [nsIJSCID.createInstance]"  nsresult: "0x80570018 (NS_ERROR_XPC_BAD_IID)"  location: "JS frame :: jar:file:///system/b2g/omni.ja!/components/PushService.js :: PushService.prototype._beginWSSetup :: line 367"  data: no]
E/GeckoConsole(  107): -*- PushService.js: Exception [Exception... "Component returned failure code: 0x80570018 (NS_ERROR_XPC_BAD_IID) [nsIJSCID.createInstance]"  nsresult: "0x80570018 (NS_ERROR_XPC_BAD_IID)"  location: "JS frame :: jar:file:///system/b2g/omni.ja!/components/PushService.js :: PushService.prototype._beginWSSetup :: line 367"  data: no]
E/GeckoConsole(  107): [JavaScript Error: "this._ws is null" {file: "jar:file:///system/b2g/omni.ja!/components/PushService.js" line: 378}]

Or do you mean to open a new bug just to add this?
just create a new bug that i can slap a r+ on.
Comment on attachment 724639 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 724639 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/src/PushService.js
@@ +25,5 @@
> +const kPUSHDB_DB_VERSION = 1; // Change this if the IndexedDB database format changes
> +const kPUSHDB_STORE_NAME = "push";
> +const kCONNECTION_RETRY_BASE_INTERVAL = 5*1000; // start value of 5 seconds based on HTTP/1.1
> +
> +this.PushDB = function PushDB(aGlobal) {

Kyle said, "it is hard to know if this is a singleton or not.  This should be a singleton"

@@ +42,5 @@
> +    this.initDBHelper(kPUSHDB_DB_NAME, kPUSHDB_DB_VERSION, [kPUSHDB_STORE_NAME], aGlobal);
> +  },
> +
> +  upgradeSchema: function(aTransaction, aDb, aOldVersion, aNewVersion) {
> +    debug("PushDB.upgradeSchema()")

Kyle said, "we want to check the old version as well.  consider the down grade case.  maybe it is as simple as just ingoring"

@@ +48,5 @@
> +    let objectStore = aDb.createObjectStore(kPUSHDB_STORE_NAME, { keyPath: "channelID" });
> +
> +    objectStore.createIndex("pushEndpoint", "pushEndpoint", { unique: true });
> +    objectStore.createIndex("identifier", "identifier", { unique: false });
> +    objectStore.createIndex("pageURL", "pageURL", { unique: false });

Kyle said, "comment on exactly what each of these are, and why they should or should not be unique"

@@ +59,5 @@
> +   *        Callback function to invoke with result ID.
> +   * @param aErrorCb [optional]
> +   *        Callback function to invoke when there was an error.
> +   */
> +  add: function(aChannelRecord, aSuccessCb, aErrorCb) {

Kyle said, "indexedDB has both add() and put().  They are different things.  Here we have a method named add, but it is calling put on the underlying thing.  don't confuse things.  Please fix"

@@ +85,5 @@
> +   *        Callback function to invoke with result.
> +   * @param aErrorCb [optional]
> +   *        Callback function to invoke when there was an error.
> +   */
> +  remove: function(aChannelID, aSuccessCb, aErrorCb) {

Kyle said, "remove should probably be delete() for consistency with indexedDB"

@@ +146,5 @@
> +      "readonly",
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        aTxn.result = [];
> +        var index = aStore.index("identifier");

kyle asks, "can a .mozGetAll() do the right thing here?  If it can, it will be way faster"

@@ +169,5 @@
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        aTxn.result = [];
> +
> +        aStore.openCursor().onsuccess = function(event) {

kyle asks, "can a .mozGetAll() do the right thing here, too?"
Attached patch Patch 1: Push IDL (obsolete) — Splinter Review
Obsoleted Patch 2 to avoid conflict, new one coming up
Attachment #724639 - Attachment is obsolete: true
Attachment #724639 - Flags: superreview?(jonas)
Attachment #725188 - Flags: review?(jonas)
Attachment #724214 - Attachment is obsolete: true
Attachment #724214 - Flags: review?(fabrice)
Attached patch Patch 4: Registration tests (obsolete) — Splinter Review
Fix path in test_unregister.
Attachment #724219 - Attachment is obsolete: true
Fixes as per review, and some other minor changes
Attachment #725655 - Flags: review?(doug.turner)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
Attachment #724217 - Attachment is obsolete: true
Attachment #725658 - Flags: review?
Attachment #725658 - Flags: review? → review?(doug.turner)
Attached patch Patch 4: Registration tests (obsolete) — Splinter Review
Updated just to be sure the latest file is here
Attachment #725189 - Attachment is obsolete: true
Attachment #725660 - Flags: review?(doug.turner)
Comment on attachment 725658 [details] [diff] [review]
Patch 3: UDP wakeup support

Review of attachment 725658 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/src/PushService.js
@@ +408,5 @@
>  
>      this._retryTimeout = this._prefs.get("retryBaseInterval");
>      this._requestTimeout = this._prefs.get("requestTimeout");
>  
> +    this._port = this._prefs.get("udp.port", 2442);

do we need the default here if it is in the prefs?

@@ +996,5 @@
> +      return;
> +    }
> +
> +    this._udpServer = Cc["@mozilla.org/network/server-socket-udp;1"].createInstance(Ci.nsIUDPServerSocket);
> +    this._udpServer.init(this._port, false);

what if this fails?
Attachment #725658 - Flags: review?(doug.turner) → review+
Comment on attachment 725655 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 725655 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g/app/b2g.js
@@ +393,5 @@
>  // WebAlarms
>  pref("dom.mozAlarms.enabled", true);
>  
> +// SimplePush
> +pref("services.push.serverURL", "wss://pushserver:9082");

we want to get a fqdm from mmayo/jr.

::: b2g/installer/package-manifest.in
@@ -263,5 @@
>  @BINPATH@/components/necko_strconv.xpt
>  @BINPATH@/components/necko_viewsource.xpt
>  @BINPATH@/components/necko_wifi.xpt
>  @BINPATH@/components/necko_wyciwyg.xpt
> -@BINPATH@/components/necko_websocket.xpt

is this a removal?

::: browser/base/content/browser.js
@@ +1805,5 @@
>  #ifdef MOZ_SERVICES_SYNC
>      // initialize the sync UI
>      gSyncUI.init();
>  #endif
> +

remove ws

::: dom/messages/SystemMessagePermissionsChecker.jsm
@@ +76,5 @@
>    },
> +  "push": {
> +  },
> +  "push-register": {
> +  },

remove
(In reply to Doug Turner (:dougt) from comment #28)
> Comment on attachment 725655 [details] [diff] [review]

> ::: dom/messages/SystemMessagePermissionsChecker.jsm
> @@ +76,5 @@
> >    },
> > +  "push": {
> > +  },
> > +  "push-register": {
> > +  },
> 
> remove

It's correct : this means that no permission is needed to receive these system messages.
ah. thanks.  So, you need to have an entry even if there is no permission.
@dougt
what would be the recovery path if the UDP socket instantiation failed. We can connect to WebSocket again, but the server would just disconnect a while later with a request to start UDP, when this'll fail again (unless the instantiation failed because of OOM, in which case it might succeed the second time.). We are very likely to get stuck in a loop. Should the client just pretend that it doesn't support UDP wakeup?

What is the likelihood of this instantiation failing
(In reply to Nikhil Marathe from comment #31)
> @dougt
> what would be the recovery path if the UDP socket instantiation failed. We
> can connect to WebSocket again, but the server would just disconnect a while
> later with a request to start UDP, when this'll fail again (unless the
> instantiation failed because of OOM, in which case it might succeed the
> second time.). We are very likely to get stuck in a loop. Should the client
> just pretend that it doesn't support UDP wakeup?
> 
> What is the likelihood of this instantiation failing

I'd say that if you detect that you are going to enter a loop, just send "null" on IP/Port/MCC/MNC, and server won't know how to wake up the device with UDP. Once it's possible to open a UDP port, you can do the registration again.
(In reply to Guillermo López (:willyaranda) from comment #32)
> (In reply to Nikhil Marathe from comment #31)
> > @dougt
> > what would be the recovery path if the UDP socket instantiation failed. We
> > can connect to WebSocket again, but the server would just disconnect a while
> > later with a request to start UDP, when this'll fail again (unless the
> > instantiation failed because of OOM, in which case it might succeed the
> > second time.). We are very likely to get stuck in a loop. Should the client
> > just pretend that it doesn't support UDP wakeup?
> > 
> > What is the likelihood of this instantiation failing
> 
> I'd say that if you detect that you are going to enter a loop, just send
> "null" on IP/Port/MCC/MNC, and server won't know how to wake up the device
> with UDP. Once it's possible to open a UDP port, you can do the registration
> again.

Yes, but this will lead to another timer which simply keeps trying to open UDP if it fails the first time, then has to do a new handshake with the server when it does succeed.
In addition, if, while the timer is retrying this, the phone switches to a network where UDP isn't required, it'll just cancel the timer. This introduces 3-4 new cases to watch out for in testing and get r+ed. Which is why I'd like to know the likelihood of failure, since we are on a deadline.
Attachment #725655 - Attachment is obsolete: true
Attachment #725655 - Flags: review?(doug.turner)
Attachment #726448 - Flags: review?(justin.lebar+bug)
Attachment #726448 - Flags: review?(jst)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
Attachment #725658 - Attachment is obsolete: true
Attachment #726449 - Flags: review?(justin.lebar+bug)
Attachment #726449 - Flags: review?(jst)
@jst, @jlebar
Patch in bug 852365 should be applied for push to work in b2g
I'm not sure what the double-r? means here.  Do you want me and jst to review different parts of these patches?  Would you be satisfied with an r+ from either of us?  Or do you want an r+ from both of us?
Ideally, both.  But either will work.

I am much more interesting in getting jlebar's feedback on deltas from thialfi that will hurt us, and jst's feedback on dom integration stuff.
Sorry for obsoleting a r? patch, but there was a small and important fix, which only changes SystemMessagesPermissionsChecker (2 line change).

Full commit message explains the reason
http://hg.mozilla.org/users/nsm.nikhil_gmail.com/push-ws1/rev/1dcc79a39380
Attachment #726448 - Attachment is obsolete: true
Attachment #726448 - Flags: review?(justin.lebar+bug)
Attachment #726448 - Flags: review?(jst)
Attachment #727003 - Flags: review?(justin.lebar+bug)
Attachment #727003 - Flags: review?(jst)
Comment on attachment 725188 [details] [diff] [review]
Patch 1: Push IDL

>+++ b/dom/interfaces/push/nsIDOMPush.idl
>+[scriptable,uuid(c7ad4f42-faae-4e8b-9879-780a72349945)]
>+interface nsIDOMPush : nsISupports
>+{
>+  nsIDOMDOMRequest register();
>+  nsIDOMDOMRequest unregister(in ACString token);
>+  nsIDOMDOMRequest registrations();
>+};

Please add comments to this interface.

It may be too late to nit on the interface name, but it doesn't make a lot of
sense to me to call the methods "register" and "unregister" when you can
"register" multiple times.  "Registration" is a one-time sort of thing.

Better names might be

  newToken()
  deleteToken()
  getTokens()

or something like that.
Attached patch Patch 1: Push IDL (obsolete) — Splinter Review
Updated IDL to rename nsIDOMPush to nsIDOMPushManager, navigator.pushNotification to navigator.push, as per discussion with sicking, and Eduardo from TEF
Attachment #725188 - Attachment is obsolete: true
Attachment #725188 - Flags: review?(jonas)
Attachment #727930 - Flags: review?(jonas)
Changes:
Push.js has been updated to return a non-QIable object from init as suggested by jst.
Fixed a bug in sendMessage call where system messages expects a nsIURI
Fixed a bug in converting indexeddb records to hello message format
Attachment #727003 - Attachment is obsolete: true
Attachment #727003 - Flags: review?(justin.lebar+bug)
Attachment #727003 - Flags: review?(jst)
Attachment #727937 - Flags: review?(justin.lebar+bug)
Attachment #727937 - Flags: review?(jst)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
Better handling of socket close errors - Since any server side close causes a TCP level socket error (NS_BASE_STREAM_CLOSED) we need to differentiate between a clean and unclean shutdown.
Attachment #727939 - Flags: review?(justin.lebar+bug)
Attachment #727939 - Flags: review?(jst)
Comment on attachment 726449 [details] [diff] [review]
Patch 3: UDP wakeup support

Forgot to mark as obsolete
Attachment #726449 - Attachment is obsolete: true
Attachment #726449 - Flags: review?(justin.lebar+bug)
Attachment #726449 - Flags: review?(jst)
[NO IMPLEMENTATION CHANGES]
Simplifications to DOMRequest.result

Having subfields to DOMRequest.result doesn't make sense when all of them have only
one subfield.

unregister also passes the pushEndpoint that was unregistered through to the Web API
for so that clients don't need to track it themselves. Also keeps register/unregister
symmetrical.
Attachment #727937 - Attachment is obsolete: true
Attachment #728377 - Attachment is obsolete: true
Attachment #727937 - Flags: review?(justin.lebar+bug)
Attachment #727937 - Flags: review?(jst)
Attachment #728377 - Flags: review?(justin.lebar+bug)
Attachment #728382 - Flags: review?(justin.lebar+bug)
Attachment #728382 - Flags: review?(jst)
A few cosmetic things as I look through this:

* Is style still to wrap lines at 80 characters in JS?  I thought it was, but maybe I missed the memo.

* The ragged edges in the block comments are pretty distracting to me (and also don't match our style).  Can you please ask your editor to compact the blocks to 80 chars and use an empty line to separate paragraphs?

* Sentences begin with a capital letter and end with punctuation, please.  Ok to punctuate sentence fragments like sentences.
Comment on attachment 727930 [details] [diff] [review]
Patch 1: Push IDL

r- as a reminder to add comments here.
Attachment #727930 - Flags: review-
Comment on attachment 727939 [details] [diff] [review]
Patch 3: UDP wakeup support

I'd like someone familiar with the UDP sockets implementation to sign off on this as well.
Attachment #727939 - Flags: review?(honzab.moz)
Comment on attachment 728382 [details] [diff] [review]
Patch 2: The core push implementation

I'm not familiar enough with IDB to feel comfortable reviewing that part.  khuey, would you mind reviewing the PushDB class, or kicking the can off to someone else?
Attachment #728382 - Flags: review?(khuey)
Comment on attachment 728382 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 728382 [details] [diff] [review]:
-----------------------------------------------------------------

The IDB usage looks good.  Just nits.

::: dom/push/src/PushService.js
@@ +70,5 @@
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        debug("Going to put " + aChannelRecord.channelID);
> +        aStore.put(aChannelRecord).onsuccess = function setTxnResult(aEvent) {
> +          aTxn.result = aEvent.target.result;

Remove this.

@@ +109,5 @@
> +    this.newTxn(
> +      "readonly",
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        aTxn.result = undefined;

and this

@@ +113,5 @@
> +        aTxn.result = undefined;
> +
> +        var index = aStore.index("pushEndpoint");
> +        index.get(aPushEndpoint).onsuccess = function setTxnResult(aEvent) {
> +          aTxn.result = aEvent.target.result;

and this

@@ +129,5 @@
> +    this.newTxn(
> +      "readonly",
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        aTxn.result = undefined;

and this

@@ +132,5 @@
> +      function txnCb(aTxn, aStore) {
> +        aTxn.result = undefined;
> +
> +        aStore.get(aChannelID).onsuccess = function setTxnResult(aEvent) {
> +          aTxn.result = aEvent.target.result;

and this

@@ +151,5 @@
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        var index = aStore.index("manifestURL");
> +        index.mozGetAll().onsuccess = function(event) {
> +          aTxn.result = event.target.result;

and this.

@@ +167,5 @@
> +      "readonly",
> +      kPUSHDB_STORE_NAME,
> +      function txnCb(aTxn, aStore) {
> +        aStore.mozGetAll().onsuccess = function(event) {
> +          aTxn.result = event.target.result;

and this.
Attachment #728382 - Flags: review?(khuey) → review+
Attached patch Patch 1: Push IDL (obsolete) — Splinter Review
Add comments
Attachment #727930 - Attachment is obsolete: true
Attachment #727930 - Flags: review?(jonas)
Attachment #729199 - Flags: review?(justin.lebar+bug)
Attachment #729199 - Flags: review?(jonas)
Formatting and IDB fixes
Attachment #728382 - Attachment is obsolete: true
Attachment #728382 - Flags: review?(justin.lebar+bug)
Attachment #728382 - Flags: review?(jst)
Attachment #729200 - Flags: review?(justin.lebar+bug)
Attachment #729200 - Flags: review?(jst)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
Formatting fixes
Attachment #727939 - Attachment is obsolete: true
Attachment #727939 - Flags: review?(justin.lebar+bug)
Attachment #727939 - Flags: review?(jst)
Attachment #727939 - Flags: review?(honzab.moz)
Attachment #729201 - Flags: review?(justin.lebar+bug)
Attachment #729201 - Flags: review?(jst)
Attachment #729200 - Flags: review?(jst) → review+
Comment on attachment 729200 [details] [diff] [review]
Patch 2: The core push implementation

I've found a number of bugs in the error handling here that need to be addressed.  I'm still working through the review, though.

Please don't land this yet; I don't think it's correct.
Comment on attachment 729199 [details] [diff] [review]
Patch 1: Push IDL

Review of attachment 729199 [details] [diff] [review]:
-----------------------------------------------------------------

I think it would be good to allow passing some sort of closure/context register(). This closure/context would then be passed back when there's an incoming notification. Otherwise callers have to do a database lookup for each incoming notification to determine what type of notification it is.

The closure/context would never be sent to the server. It's simply something that we internally associate with a particular endpoint-url.
Comment on attachment 729201 [details] [diff] [review]
Patch 3: UDP wakeup support

r=jst with a followup bug filed on what to do if the UDP port is already in use (as dougt pointed out earlier).
Attachment #729201 - Flags: review?(jst) → review+
Comment on attachment 729199 [details] [diff] [review]
Patch 1: Push IDL

Review of attachment 729199 [details] [diff] [review]:
-----------------------------------------------------------------

applications are probably already doing this in their own schema.  i really don't want to have to be duplicating and serializing this data.
@Jonas: A majority of the apps will only really be using one endpoint. I'm not sure if the added complexity is worth it. Maybe I'm missing something, is there another webAPI that does something like this so I can have a look at what you mean?
Comment on attachment 729200 [details] [diff] [review]
Patch 2: The core push implementation

Sorry this review took me so long, Nikhil.

This patch looks pretty sane to me, but there are enough edge cases that I'm
worried about that I'd like to take another look.

I also have a lot of style / naming nits.  Part of the reason the review took
me so long was that I had to spend a lot of time understanding how the pieces
fit together.  I hope some of the suggestions I made here will make this code
easier to digest.

Another high-level comment is, it's not clear to me what's the rule here for
throwing an exception versus returning silently.

>diff --git a/b2g/app/b2g.js b/b2g/app/b2g.js
>--- a/b2g/app/b2g.js
>+++ b/b2g/app/b2g.js

>@@ -383,16 +383,22 @@ pref("dom.sms.enabled", true);
>+// SimplePush
>+pref("services.push.serverURL", "wss://pushserver:9082");

Perhaps it would be saner to specify the empty string than a bogus server URL?
(Unless this URL is actually meaningful?)

>diff --git a/dom/push/src/Push.js b/dom/push/src/Push.js
>new file mode 100644
>--- /dev/null
>+++ b/dom/push/src/Push.js

>+Push.prototype = {

Please add a brief comment (one or two sentences is fine) explaining what this
class is and which process it lives in.

>+  init: function(aWindow) {
>+    debug("init()");
>+
>+    let principal = aWindow.document.nodePrincipal;
>+
>+    this._pageURL = principal.URI;

If a page changes its URL using pushState or replaceState, I think web authors
will expect the pageURL sent to the server to reflect this.  So maybe we
shouldn't do this (although using the principal as you do here greatly
simplifies the matter for data: URIs).

What is pageURL used for?

>diff --git a/dom/push/src/PushService.js b/dom/push/src/PushService.js
>new file mode 100644
>--- /dev/null
>+++ b/dom/push/src/PushService.js

>+this.PushWebSocketListener.prototype = {

The reason this isn't folded into the PushService class is because you want to
be able to stop the websocket from sending messages to the push service by
clearing PushWebSocketListener._pushService?

If that's right, then it seems to me that this would be clearer if we made
PushWebSocketListener a thinner proxy to PushService.  For example, we could
say that listener method X calls pushService method webSocketX with minimal
pre-processing.

Right now it's not clear how e.g. beginPushHandshake gets called, but if it was
called webSocketOnStart, that would probably help.

>+  onStart: function(context) {
>+    if (!this._pushService) return;

Is this style standard in Mozilla JS?  We certainly don't allow it in C++ (in
most modules, anyway).

>+    debug("onStart()");
>+    this._pushService._beginPushHandshake();
>+  },
>+
>+  onStop: function(context, statusCode) {
>+    if (!this._pushService) return;

>+    debug("onStop(): " + statusCode);
>+
>+    if (statusCode != Components.results.NS_OK) {
>+      debug("Socket error " + statusCode);
>+      this._pushService.socketError(statusCode);
>+    }
>+
>+    this._pushService._resetWS();

Similarly, instead of calling a private method on pushService, it would seem
cleaner to me if we moved all the logic here into e.g.
pushService.webSocketOnStop().

>+  onMessageAvailable: function(context, message) {
>     [...]
>+    this._pushService[handler](reply);

This is kind of scary.  I'd feel better if we had a whitelist of allowed
functions, but at the very least we need a big comment saying that any _handle
function can be invoked by the push server.

>+  onServerClose: function(context, aStatusCode, aReason) {
>+    if (!this._pushService) return;
>+    debug("onServerClose() " + aStatusCode + " " + aReason);
>+    // 1000 is the normal close
>+    if (aStatusCode == 1000 && Object.getOwnPropertyNames(this._pushService._pendingRequests).length > 0) {
>+      // this should never happen. A successful close cannot have pending
>+      // requests since the server should've responded to them before the
>+      // connection was closed.
>+      this._resetWS();
>+      this._beginWSSetup();

this._pushService, I presume?  Also, maybe we should add a debug message here
if it should never happen?

Is it possible to add a testcase which exercises this branch?

I don't understand why we _beginWSSetup() here.  The server just did something
bad, but it clearly wanted to close the connection.  Why re-connect?

>+function PushService()
>+{
>+  debug("PushService Constructor . ");
>+}
>+
>+const STATE_START = 0;
>+const STATE_WAITING_FOR_OPEN = 1;
>+const STATE_WAITING_FOR_HELLO = 2;
>+const STATE_WAITING_FOR_STATE_ACK = 3;
>+const STATE_READY = 4;

State machines are a tricky thing.  Can you please comment here which methods
we can call in which states, and which methods transition us between which
states?  Bonus points if you can make the state and method names line up.

>+PushService.prototype = {

One thing that's pretty unclear as one reads this code is, which functions can
be implicitly invoked by others (either the mm or the push server)?  For
example, it's not obvious to me that the "registrations()" function is what's
called by Push:Registrations.

Can you please add documentation to improve this?

>+  observe: function observe(aSubject, aTopic, aData) {
>       [...]
>+      case "nsPref:changed":
>+        if (aData == "services.push.serverURL") {
>+          debug("services.push.serverURL changed! Stopping websocket. new value " + this._prefs.get("serverURL"));
>+          this._resetWS();
>+        }
>+        break;
>+      case "timer-callback":
>+        if (aSubject == this._requestTimeoutTimer) {
>+          for (var channelID in this._pendingRequests) {

Assert here that we have at least one pendingRequest?  Or otherwise cancel the
timer if the pending list is empty?  We don't want to get into a situation
where this timer fires forever.

>+  set _UAID(newID) {
>+    if (typeof(newID) !== "string")
>+      throw "Got invalid (non-string UAID " + newID;

Close paren.

>+  init: function() {
>     [...]
>+    this._db.getAllChannelIDs(function(channelIDs) {
>+      if (channelIDs.length > 0) {
>+        debug("Found registered channelIDs. Starting WebSocket");
>+        this._beginWSSetup();
>+      }
>+    }.bind(this), function(error) {
>+      debug("db error " + error);
>+    });

Sorry to keep nitting on whitespace.  This would be a lot clearer to me if it
was indented as

>     this._db.getAllChannelIDs(
>       function(channelIDs) {
>         if (channelIDs.length > 0) {
>           debug("Found registered channelIDs. Starting WebSocket");
>           this._beginWSSetup();
>         }
>       }.bind(this),
>       function(error) {
>         debug("db error " + error);
>       }
>    );

>+    this._prefs.observe("serverURL", this);

I think it'd be clearer to call _resetWS() directly, if that's all you're
trying to do with this.

>+  _resetWS: function() {
>+    debug("resetWS()");
>+    this._currentState = STATE_START;
>+    if (this._wsListener)
>+      this._wsListener._pushService = undefined;

I think we canonically use null for this sort of thing, unless there's some
reason to prefer undefined.

>+    try {
>+        this._ws.close(0, null);
>+    } catch (e) {}
>+    this._ws = undefined;

ibid.

_resteWS seems to close the websocket, but does not start it up again.  Perhaps
shutDownWS() or closeWS() would be a better name.  But then I don't see why we
call it from init(), when presumably _ws is already cleared.

>+  shutdown: function() {
>+    debug("shutdown()");
>+    this._db.close();
>+    this._db = null;
>+
>+    // all pending requests (ideally none) are dropped at this point we
>+    // shouldn't have any applications performing registration/unregistration
>+    // or receiving notifications

Run-on sentence.

>+  // aStatusCode is an NS error from Components.results
>+  socketError: function(aStatusCode) {
>+    debug("socketError(): Retry in " + this._retryTimeout + " Try number " + this._retryFailCount);
>+    if (!this._retryTimeoutTimer)
>+      this._retryTimeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
>+
>+    this._retryTimeoutTimer.init(this, this._retryTimeout, Ci.nsITimer.TYPE_ONE_SHOT);
>+    this._retryFailCount++;
>+    this._retryTimeout = this._prefs.get("retryBaseInterval") * Math.pow(2, this._retryFailCount);

Why save this._retryTimeout?  It's only used right here, and we can calculate
it based on _retryFailCount.

>+  },

>+  _beginWSSetup: function() {
>+    debug("beginWSSetup()");
>+    if (this._currentState != STATE_START) {
>+      debug("_beginWSSetup: Not in start state! Current state " + this._currentState);
>+      return;
>+    }
>+
>+    var serverURL = this._prefs.get("serverURL");
>+    if (!serverURL) {
>+      debug("No services.push.serverURL found!");
>+      throw NS_ERROR_MALFORMED_URI;
>+    }

This suggests to me even more strongly that we shouldn't have a default push
server URI specified in the prefs.

>+    var uri = Services.io.newURI(serverURL, null, null);

I think newURI can throw, fyi.  I don't know if you care about catching that
exception here.

>+    try {
>+      if (uri.scheme === "wss") {
>+        this._ws = Cc["@mozilla.org/network/protocol;1?name=wss"].createInstance(Ci.nsIWebSocketChannel);
>+      }
>+      else if (uri.scheme === "ws") {
>+        throw "Push over an insecure connection (ws://) is not allowed!";
>+      }
>+      else {
>+        throw "Unknown scheme!";
>+      }
>+    } catch (e) {
>+      debug("Exception when creating WebSocketChannel " + e);
>+      this._resetWS();
>+      return;
>+    }

This use of a try/catch is confusing because it's not clear whether you expect
the createInstance to throw, or whether you're using try/catch to handle the
simple control flow around uri.scheme.  A better way to write this would be
something like:

     if (uri.scheme === "wss") {
       // createInstance, protected by try/catch if necessary.  I don't think
       // it's necessary?
     }
     else {
       // You could give a different debug message for ws:// versus other
       // schemes if you want.
       debug("Can't create WebSocketChannel with scheme: " + uri.scheme);
       this._resetWS();
       return;
     }

>+    debug("serverURL: " + uri.spec);
>+    this._wsListener = new PushWebSocketListener(this);
>+    this._ws.protocol = "push-notification";
>+    this._ws.asyncOpen(uri, serverURL, this._wsListener, null);
>+    this._currentState = STATE_WAITING_FOR_OPEN;
>+  },
>+
>+  _beginPushHandshake: function() {
>+    debug("beginPushHandshake()");
>+    if (this._currentState != STATE_WAITING_FOR_OPEN) {
>+      debug("NOT in STATE_WAITING_FOR_OPEN. Current state " + this._currentState + ". Skipping");
>+      if (!this._ws)
>+        throw "ws was still undefined";
>+    }

Don't you mean to have a |return| here like in the other state machine checks?

Why are we throwing an exception if !this._ws?  Does throwing an exception from
PushWebSocketListener.onStart have some particular meaning?

>+  _handleHelloReply: function(reply) {
>+    debug("handleHelloReply()");
>+    if (this._currentState != STATE_WAITING_FOR_HELLO) {
>+      debug("Unexpected state (expected STATE_WAITING_FOR_HELLO) " + this._currentState);
>+      this._resetWS();
>+      return;
>+    }
>+
>+    if (typeof reply.uaid !== "string") {
>+      debug("No UAID received or non string UAID received");
>+      this._resetWS();
>+      return;
>+    }
>+
>+    function finishHandshake() {
>+      this._UAID = reply.uaid;

It seems like we shouldn't allow reply.uaid == '', since that has a special
meaning.

>+      this._currentState = STATE_READY;
>+      this._processNextInQueue();
>+    }
>+
>+    if (this._UAID && this._UAID != reply.uaid) {

What if !this._UAID && !reply.uaid?  Also, what if this._UAID && reply.uaid?
Do we place any limits on the UAID sent by the server?  (E.g. what if it sends
a 10mb string?)

>+      debug("got new UAID: all re-register");
>+      this._dropRegistrations()
>+        .then(function() {
>+          // apps that have no prior registrations
>+          // but are in the pending queue, won't get
>+          // a push-register, which is correct.

Nit: s/, won't/ won't/

>+          this._notifyAllAppsRegister();
>+          finishHandshake.bind(this)();
>+        }.bind(this), function(error) {
>+          debug("Error deleting all registrations. SHOULD NEVER HAPPEN!");
>+          this._resetWS();
>+          return;
>+        }.bind(this));

Same whitespace nit as before; this is hard to parse.  But yay promises.

>+  _handleNotificationReply: function(reply) {
>+    debug("handleNotificationReply()");
>+    var update;

Please move this into the loop.

>+
>+    if (typeof reply.updates !== 'object') {
>+      debug("No 'updates' field in response. Type = " + typeof reply.updates);
>+      return;
>+    }
>+
>+    debug("Reply updates: " + reply.updates.length);
>+    for (var i = 0; i < reply.updates.length; i++) {
>+      update = reply.updates[i];
>+      debug("Update: " + update.channelID + ": " + update.version);
>+      if (typeof update.channelID !== "string") {
>+        debug("Invalid update literal at index " + i);
>+        continue;
>+      }
>+
>+      if (!update.version) {
>+        debug("update.version does not exists");

s/exists/exist

But surely version 0 should be acceptable, right?  If so, we need to change
this conditional.

>+        continue;
>+      }
>+
>+      var version = update.version;
>+
>+      if (typeof version === "string") {
>+        version = parseInt(version, 10);
>+      }
>+
>+      if (typeof version === "number") {
>+        // FIXME(nsm): this relies on app update notification being infallible!
>+        // eventually fix this
>+        this._receivedUpdate(update.channelID, version);
>+        this._sendAck(update.channelID, version);
>+      }

If the version isn't a number, we don't ack it, but we also don't send an error
to the push server?  Then won't the push server keep pinging us?

>+  /*
>+   * Must be used only by request/response style calls
>+   * over the websocket.
>+   */
>+  _request: function(action, data) {

Can we call this _sendRequest, to match _sendAck and _send?

>+    debug("request() " + action);
>+    if (typeof data.channelID !== "string") {
>+      throw "No string channelID";
>+    }
>+    var deferred = Promise.defer();
>+
>+    if (Object.getOwnPropertyNames(this._pendingRequests).length == 0) {
>+      // start the timer since we now have atleast one request

s/atleast/at least

>+      if (!this._requestTimeoutTimer)
>+        this._requestTimeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
>+      this._requestTimeoutTimer.init(this, this._requestTimeout, Ci.nsITimer.TYPE_REPEATING_SLACK);
>+    }
>+
>+    this._pendingRequests[data.channelID] = { deferred: deferred, ctime: Date.now() };
>+
>+    this._send(action, data);
>+    return deferred.promise;
>+  },
>+
>+  _send: function(action, data) {
>+    debug("send()");
>+    this._requestQueue.push([action, data]);
>+    debug("Queued " + action);
>+    this._processNextInQueue();
>+  },
>+
>+  _processNextInQueue: function() {

Can we say _processNextRequestInQueue?  It's marginally longer, but a lot more clear.

>+  receiveMessage: function(aMessage) {
>+    debug("receiveMessage(): " + aMessage.name);
>+
>+    if (["Push:Register", "Push:Unregister", "Push:Registrations"].indexOf(aMessage.name) == -1) {

This array is |messages| in Init().  Perhaps we shouldn't duplicate it.

>+      debug("Invalid");

Nit: Maybe add a bit more context here?  Or you could add the current function
name to the output of debug() with a bit of hackery.

>+      return null;

Plain |return|?

>+    }
>+
>+    let mm = aMessage.target.QueryInterface(Ci.nsIMessageSender);
>+    let json = aMessage.data;
>+    this[aMessage.name.slice("Push:".length).toLowerCase()](json, mm);
>+  },
>+
>+  // aPageRecord is a object sent by navigator.pushNotification, identifying
>+  // the sending page and other fields
>+  register: function(aPageRecord, aMessageManager) {
>+    debug("register()");
>+
>+    let uuidGenerator = Cc["@mozilla.org/uuid-generator;1"].getService(Ci.nsIUUIDGenerator);

Looking over  the implementation of generateUUID, it does not appear to use a
secure RNG on all platforms (certainly not Android/B2G).

It's problematic if an attacker can guess our CHIDs, right?  Particularly if
it's easy to guess a push endpoint given a CHID, but even if not.

>+    // generateUUID() gives a UUID surrounded by {...}
>+    // slice them off
>+    var channelID = uuidGenerator.generateUUID().toString().slice(1).slice(0, -1);
>+    this._request("register", {channelID: channelID})
>+      .then(this.onRegisterSuccess.bind(this, aPageRecord, channelID),
>+            this.onRegisterError.bind(this, aPageRecord, aMessageManager))
>+      .then(function(message) {
>+        aMessageManager.sendAsyncMessage("PushService:Register:OK", message);
>+      }, this.onRegisterErrorNotifyApp.bind(this, aPageRecord, aMessageManager));

I don't see why onRegisterErrorNotifyApp should be its own function when it
does essentialy the same thing as the anonymous function that calls
Register:OK.

>+  },
>+
>+  onRegisterSuccess: function(aPageRecord, generatedChannelID, data) {

This is a private function, so prefix with _, here and for the other onX
functions?

Also, "onX" functions look like event handlers.  Maybe _handleRegisterSuccess()
would be clearer.  Except we can't call it _handle, because then push servers
could invoke it!

>+    debug("onRegisterSuccess()");
>+    var deferred = Promise.defer();
>+    var message = { requestID: aPageRecord.requestID };
>+
>+    if (typeof data.channelID !== "string") {
>+      debug("Invalid channelID " + message);
>+      message["error"] = "Invalid channelID received";
>+      throw message;

Who listens to this exception and the others in this function?

>+    this._updatePushRecord(record)
>+      .then(function() {
>+        message["pushEndpoint"] = data.pushEndpoint;
>+        deferred.resolve(message);
>+      }, function(error) {
>+        // unable to save, we'll just let server GC
>+        // don't bother to unregister

Why not bother to unregister?

>+  onRegisterError: function(aPageRecord, aMessageManager, reply) {
>+    debug("onRegisterError()");
>+    switch (reply.status) {
>+      case 409: // CONFLICT CHID already exists

Perhaps this should be a constant.

What is the use-case for channel ID conflict retrying?  That is, under what
circumstances do you expect to get collisions on the channel IDs?  I ask
because implementing this correctly on the push server is non-trivial, and
because if I can guess another client's channel ID, I can probably guess its
push endpoint, which is bad.

>+        if (typeof aPageRecord._attempts !== "number")
>+          aPageRecord._attempts = 0;
>+
>+        if (aPageRecord._attempts < kCONFLICT_RETRY_ATTEMPTS) {
>+          aPageRecord._attempts++;
>+          // since register is async, its ok to launch it in a callback
>+          debug("CONFLICT: trying again");
>+          this.register(aPageRecord, aMessageManager);
>+          return;
>+        }
>+        throw { requestID: aPageRecord.requestID, error: "conflict" };
>+      default:
>+        debug("General failure " + reply.status);
>+        throw { requestID: aPageRecord.requestID, error: reply.error };

Who handles this exception?

>+  unregister: function(aPageRecord, aMessageManager) {
>+    debug("unregister()");
>+
>+    var fail = function(error) {
>+      debug("unregister() fail() error " + error);
>+      var message = {requestID: aPageRecord.requestID, error: error};
>+      aMessageManager.sendAsyncMessage("PushService:Unregister:KO", message);
>+    }
>+
>+    this._db.getByPushEndpoint(aPageRecord.pushEndpoint, function(record) {

This lets an app bypass our securty model.

In our model, the parent can't trust anything it hears from a child process.
In particular, it can't trust that the child is not sending us someone else's
push endpoint.

To meet our nominal security goals, an app should only be able to mess with its
own stuff.

Anyway, I don't think this is a big deal here; we can fix it, or not.  It's
unlikely that a malicious app will get another app's push endpoint, and messing
with someone else's push notifications is probably not the worst thing you
could do.  But it's something to keep in mind.

>+  onRegistrationsSuccess: function(aPageRecord, aMessageManager, aRegistrations) {

Private function: _, please, or move into registrations().

>+    var registrations = [];
>+    aRegistrations.forEach(function(pushRecord) {

Nit: The loop variable should relate to the list name (or vice versa).

>+      registrations.push({
>+          __exposedProps__: { pushEndpoint: 'r', version: 'r' },
>+          pushEndpoint: pushRecord.pushEndpoint,
>+          version: pushRecord.version
>+      });
>+    });
>+    aMessageManager.sendAsyncMessage("PushService:Registrations:OK", { requestID: aPageRecord.requestID, registrations: registrations });
>+  },
>+
>+  onRegistrationsError: function(aPageRecord, aMessageManager) {

Private function.

>diff --git a/mobile/android/installer/package-manifest.in b/mobile/android/installer/package-manifest.in
>--- a/mobile/android/installer/package-manifest.in
>+++ b/mobile/android/installer/package-manifest.in

> ; [Default Preferences]
> ; All the pref files must be part of base to prevent migration bugs
> @BINPATH@/@PREF_DIR@/mobile.js
> @BINPATH@/@PREF_DIR@/mobile-branding.js
> @BINPATH@/@PREF_DIR@/channel-prefs.js
>+@BINPATH@/@PREF_DIR@/push-prefs.js

I don't know what this does, but that's probably my ignorance.  Where does
push-prefs.js come from?
Attachment #729200 - Flags: review?(justin.lebar+bug)
Comment on attachment 725660 [details] [diff] [review]
Patch 4: Registration tests

Removing tests patch for now. With browser support disabled, the mochitests do not pass as is on B2G because the pywebsocket server isn't started by the test harness.
There is a way to get them running on b2g desktop, but it will need some modifications.
Attachment #725660 - Attachment is obsolete: true
Attachment #725660 - Flags: review?(doug.turner)
(In reply to Justin Lebar [:jlebar] from comment #60)
> Comment on attachment 729200 [details] [diff] [review]
> Patch 2: The core push implementation
> 
> Sorry this review took me so long, Nikhil.
> 
> This patch looks pretty sane to me, but there are enough edge cases that I'm
> worried about that I'd like to take another look.
> 
> I also have a lot of style / naming nits.  Part of the reason the review took
> me so long was that I had to spend a lot of time understanding how the pieces
> fit together.  I hope some of the suggestions I made here will make this code
> easier to digest.

Addressing some of your points here and others in the refreshed patch.

> 
> Another high-level comment is, it's not clear to me what's the rule here for
> throwing an exception versus returning silently.

Functions which are the success cases of a promise may throw to make the promise fail.
I'll clean up other instances of throw.

> 
> >+  init: function(aWindow) {
> >+    debug("init()");
> >+
> >+    let principal = aWindow.document.nodePrincipal;
> >+
> >+    this._pageURL = principal.URI;
> 
> If a page changes its URL using pushState or replaceState, I think web
> authors
> will expect the pageURL sent to the server to reflect this.  So maybe we
> shouldn't do this (although using the principal as you do here greatly
> simplifies the matter for data: URIs).
> 
> What is pageURL used for?

The pageURL is only used by the system messages API and never sent anywhere.
If this page registered for system messages, then this page is the one that
should have the mozSetMessageHandler. I'm not sure how this works with pushState,
maybe Fabrice can comment.

> 
> >diff --git a/dom/push/src/PushService.js b/dom/push/src/PushService.js
> >new file mode 100644
> >--- /dev/null
> >+++ b/dom/push/src/PushService.js
> 
> >+this.PushWebSocketListener.prototype = {
> 
> The reason this isn't folded into the PushService class is because you want
> to
> be able to stop the websocket from sending messages to the push service by
> clearing PushWebSocketListener._pushService?

Yes
> 
> If that's right, then it seems to me that this would be clearer if we made
> PushWebSocketListener a thinner proxy to PushService.  For example, we could
> say that listener method X calls pushService method webSocketX with minimal
> pre-processing.
> 
> Right now it's not clear how e.g. beginPushHandshake gets called, but if it
> was
> called webSocketOnStart, that would probably help.

Sounds good.
> 
> >+  onServerClose: function(context, aStatusCode, aReason) {
> >+    if (!this._pushService) return;
> >+    debug("onServerClose() " + aStatusCode + " " + aReason);
> >+    // 1000 is the normal close
> >+    if (aStatusCode == 1000 && Object.getOwnPropertyNames(this._pushService._pendingRequests).length > 0) {
> >+      // this should never happen. A successful close cannot have pending
> >+      // requests since the server should've responded to them before the
> >+      // connection was closed.
> >+      this._resetWS();
> >+      this._beginWSSetup();
> 
> this._pushService, I presume?  Also, maybe we should add a debug message here
> if it should never happen?
> 
> Is it possible to add a testcase which exercises this branch?
Will tweak the test server to explicitly do this and see what happens.

> 
> I don't understand why we _beginWSSetup() here.  The server just did
> something
> bad, but it clearly wanted to close the connection.  Why re-connect?

Good point, but what would the recovery path be then. If the server is fixed
by the system administrator, I think we should resume normal operations again.
Should I explicitly delay the reconnection?

> State machines are a tricky thing.  Can you please comment here which methods
> we can call in which states, and which methods transition us between which
> states?  Bonus points if you can make the state and method names line up.

Could you explain what you mean by making the line up?
> _resteWS seems to close the websocket, but does not start it up again. 
> Perhaps
> shutDownWS() or closeWS() would be a better name.  But then I don't see why
> we
> call it from init(), when presumably _ws is already cleared.

resetWS() is more of "set things to a blank slate so we can start from scratch later".
Called from init() only to avoid duplicating the reset logic in init (setting currentState and
clearing the requests)

> >+      this._currentState = STATE_READY;
> >+      this._processNextInQueue();
> >+    }
> >+
> >+    if (this._UAID && this._UAID != reply.uaid) {
> 
> What if !this._UAID && !reply.uaid?  Also, what if this._UAID && reply.uaid?
> Do we place any limits on the UAID sent by the server?  (E.g. what if it
> sends
> a 10mb string?)

If it sent a 10mb string, I would also be concerned about the performance issues
in JSON.parse. I'm not sure how to solve that problem, but a length should be enforceable on
the uaid.

> 
> If the version isn't a number, we don't ack it, but we also don't send an
> error
> to the push server?  Then won't the push server keep pinging us?

Yes, but the protocol only has positive acks, so there is no way to tell
the server it made an error. We just have to keep ignoring it.
> 
> >+    }
> >+
> >+    let mm = aMessage.target.QueryInterface(Ci.nsIMessageSender);
> >+    let json = aMessage.data;
> >+    this[aMessage.name.slice("Push:".length).toLowerCase()](json, mm);
> >+  },
> >+
> >+  // aPageRecord is a object sent by navigator.pushNotification, identifying
> >+  // the sending page and other fields
> >+  register: function(aPageRecord, aMessageManager) {
> >+    debug("register()");
> >+
> >+    let uuidGenerator = Cc["@mozilla.org/uuid-generator;1"].getService(Ci.nsIUUIDGenerator);
> 
> Looking over  the implementation of generateUUID, it does not appear to use a
> secure RNG on all platforms (certainly not Android/B2G).
> 
> It's problematic if an attacker can guess our CHIDs, right?  Particularly if
> it's easy to guess a push endpoint given a CHID, but even if not.

The attacker could cause some griefing and lower the battery life of the phone,
significantly. I wonder if B2G has a secure RNG for crypto and if that could be used for this.
Follow up bug?

> 
> >+    // generateUUID() gives a UUID surrounded by {...}
> >+    // slice them off
> >+    var channelID = uuidGenerator.generateUUID().toString().slice(1).slice(0, -1);
> >+    this._request("register", {channelID: channelID})
> >+      .then(this.onRegisterSuccess.bind(this, aPageRecord, channelID),
> >+            this.onRegisterError.bind(this, aPageRecord, aMessageManager))
> >+      .then(function(message) {
> >+        aMessageManager.sendAsyncMessage("PushService:Register:OK", message);
> >+      }, this.onRegisterErrorNotifyApp.bind(this, aPageRecord, aMessageManager));
> 
> I don't see why onRegisterErrorNotifyApp should be its own function when it
> does essentialy the same thing as the anonymous function that calls
> Register:OK.

Legacy code :/, will fix.
> 
> >+    debug("onRegisterSuccess()");
> >+    var deferred = Promise.defer();
> >+    var message = { requestID: aPageRecord.requestID };
> >+
> >+    if (typeof data.channelID !== "string") {
> >+      debug("Invalid channelID " + message);
> >+      message["error"] = "Invalid channelID received";
> >+      throw message;
> 
> Who listens to this exception and the others in this function?

Promises

> 
> >+    this._updatePushRecord(record)
> >+      .then(function() {
> >+        message["pushEndpoint"] = data.pushEndpoint;
> >+        deferred.resolve(message);
> >+      }, function(error) {
> >+        // unable to save, we'll just let server GC
> >+        // don't bother to unregister
> 
> Why not bother to unregister?

Unregistration was much more involved before, again, will fix.

> What is the use-case for channel ID conflict retrying?  That is, under what
> circumstances do you expect to get collisions on the channel IDs?  I ask
> because implementing this correctly on the push server is non-trivial, and
> because if I can guess another client's channel ID, I can probably guess its
> push endpoint, which is bad.

The very very rare circumstance that a UUID collision occurs. I don't understand how
implementing this on the server is non-trivial, is it simply because of the scale?
If this isn't required, we'll discuss a drop with TEF in Madrid, for now I'll keep it in.

> 
> >+        if (typeof aPageRecord._attempts !== "number")
> >+          aPageRecord._attempts = 0;
> >+
> >+        if (aPageRecord._attempts < kCONFLICT_RETRY_ATTEMPTS) {
> >+          aPageRecord._attempts++;
> >+          // since register is async, its ok to launch it in a callback
> >+          debug("CONFLICT: trying again");
> >+          this.register(aPageRecord, aMessageManager);
> >+          return;
> >+        }
> >+        throw { requestID: aPageRecord.requestID, error: "conflict" };
> >+      default:
> >+        debug("General failure " + reply.status);
> >+        throw { requestID: aPageRecord.requestID, error: reply.error };
> 
> Who handles this exception?

Again, promises, will comment.

> 
> >+  unregister: function(aPageRecord, aMessageManager) {
> >+    debug("unregister()");
> >+
> >+    var fail = function(error) {
> >+      debug("unregister() fail() error " + error);
> >+      var message = {requestID: aPageRecord.requestID, error: error};
> >+      aMessageManager.sendAsyncMessage("PushService:Unregister:KO", message);
> >+    }
> >+
> >+    this._db.getByPushEndpoint(aPageRecord.pushEndpoint, function(record) {
> 
> This lets an app bypass our securty model.
> 
> In our model, the parent can't trust anything it hears from a child process.
> In particular, it can't trust that the child is not sending us someone else's
> push endpoint.
> 
> To meet our nominal security goals, an app should only be able to mess with
> its
> own stuff.
Good observation! I can add a check in unregister to compare the manifestURL of
the app against the manifestURL associated with the record, rather than blindly
deleting.

> >diff --git a/mobile/android/installer/package-manifest.in b/mobile/android/installer/package-manifest.in
> >--- a/mobile/android/installer/package-manifest.in
> >+++ b/mobile/android/installer/package-manifest.in
> 
> > ; [Default Preferences]
> > ; All the pref files must be part of base to prevent migration bugs
> > @BINPATH@/@PREF_DIR@/mobile.js
> > @BINPATH@/@PREF_DIR@/mobile-branding.js
> > @BINPATH@/@PREF_DIR@/channel-prefs.js
> >+@BINPATH@/@PREF_DIR@/push-prefs.js
> 
> I don't know what this does, but that's probably my ignorance.  Where does
> push-prefs.js come from?

sigh, legacy code. Thanks for pointing it out.
> The pageURL is only used by the system messages API and never sent anywhere.
> If this page registered for system messages, then this page is the one that
> should have the mozSetMessageHandler. I'm not sure how this works with
> pushState, maybe Fabrice can comment.

Okay, let's make sure we figure this bit out.

>> I don't understand why we _beginWSSetup() here.  The server just did
>> something bad, but it clearly wanted to close the connection.  Why
>> re-connect?
>>
> Good point, but what would the recovery path be then. If the server is fixed
> by the system administrator, I think we should resume normal operations
> again.  Should I explicitly delay the reconnection?

Suppose the server misbehaves and closes the connection when it shouldn't, but
we don't have any pending requests.  What's the recovery path for this?

>> State machines are a tricky thing.  Can you please comment here which methods
>> we can call in which states, and which methods transition us between which
>> states?  Bonus points if you can make the state and method names line up.
>
> Could you explain what you mean by making the line up?

I just mean it would be helpful if it was obvious which states corresponded to
which methods.  So for example, if we have a few states WAITING_FOR_X, maybe
we'd have a few methods receivedX().

> If it sent a 10mb string, I would also be concerned about the performance
> issues in JSON.parse.

Absolutely, but at least we don't have to keep this string around indefinitely
in the prefs file.

> The attacker could cause some griefing and lower the battery life of the
> phone, significantly. I wonder if B2G has a secure RNG for crypto and if that
> could be used for this.  Follow up bug?

Sure, so long as it blocks tef.

>> What is the use-case for channel ID conflict retrying?
>>
> The very very rare circumstance that a UUID collision occurs.

Assuming the CHIDs are generated with full entropy, this should be about as
likely as getting a random MD5 collision.  It's just as likely as two people
choosing the same AES-128 keys for SSL and being able to read each other's
secure communications.

You are /much/ more likely to get a random bit error on the device and execute
arbitrary code than you are to get a uuid collision and execute this code.  So
having this in the protocol seems pretty paranoid.

Now, we've established that the CHIDs are /not/ generated with full entropy.
But then, we can't say that UUID collisions are very rare, so we have no way to
know how many times to retry!  And anyway at that point retrying is probably a
bad strategy.

If we want to force UAs to choose truly random CHIDs, ISTM that we should get
rid of this bit in the protocol.  If we don't want to do that, then maybe we
should make the push server choose the CHID.

> I don't understand how implementing this on the server is non-trivial, is it
> simply because of the scale?

Yes.  To do it correctly there needs to be a set of machines with strong
locking semantics that contain all known CHIDs in memory.

>> If this isn't required, we'll discuss a drop with TEF in Madrid, for now
>> I'll keep it in.

We can unilaterally drop it and assume that any CHID-collision messages we get
from the server are bogus.
(In reply to Justin Lebar [:jlebar] from comment #63)
> 
> >> I don't understand why we _beginWSSetup() here.  The server just did
> >> something bad, but it clearly wanted to close the connection.  Why
> >> re-connect?
> >>
> > Good point, but what would the recovery path be then. If the server is fixed
> > by the system administrator, I think we should resume normal operations
> > again.  Should I explicitly delay the reconnection?
> 
> Suppose the server misbehaves and closes the connection when it shouldn't,
> but
> we don't have any pending requests.  What's the recovery path for this?

Hmmm, now that I think about this, its possible for a server to cleanly close
the connection at any time. It may do this if its running low on resources and discards
a few clients based on some statistical model saying no notifications are expected for those
clients immediately. The client should always try to reconnect, because the client's goal is
to get notifications as soon as possible. So I'd remove the check and always reconnect on a
successful close. Does that sound ok?

> 
> >> State machines are a tricky thing.  Can you please comment here which methods
> >> we can call in which states, and which methods transition us between which
> >> states?  Bonus points if you can make the state and method names line up.
> >
> > Could you explain what you mean by making the line up?
> 
> I just mean it would be helpful if it was obvious which states corresponded
> to
> which methods.  So for example, if we have a few states WAITING_FOR_X, maybe
> we'd have a few methods receivedX().

Ok
> 
> > If it sent a 10mb string, I would also be concerned about the performance
> > issues in JSON.parse.
> 
> Absolutely, but at least we don't have to keep this string around
> indefinitely
> in the prefs file.

Constrain to 36 characters? (length of UUID4 with '-')
> 
> > The attacker could cause some griefing and lower the battery life of the
> > phone, significantly. I wonder if B2G has a secure RNG for crypto and if that
> > could be used for this.  Follow up bug?
> 
> Sure, so long as it blocks tef.

Ok
(In reply to Nikhil Marathe from comment #62)
> 
> The pageURL is only used by the system messages API and never sent anywhere.
> If this page registered for system messages, then this page is the one that
> should have the mozSetMessageHandler. I'm not sure how this works with
> pushState,
> maybe Fabrice can comment.

The exact behavior depends on the "embedding" that actually opens the apps or navigates them. In b2g, gaia does that when it receives a system message related mozChromeEvent. The relevant code is at https://mxr.mozilla.org/gaia/source/apps/system/js/window_manager.js#1397 - If the app receiving the system message already started by has another page loaded, we change the src attribute of the app iframe to "redirect" it to the page dealing with the system message.
Changes as suggested. This has been tested on the success case, but not tested completely yet.

Will update the UDP patch soon
Attachment #729200 - Attachment is obsolete: true
Attachment #729636 - Flags: review?(justin.lebar+bug)
I realize I'm a bit late to the game here, but I'm wondering if this UDP-based protocol has provision for UDP "keep-alive" packets, to keep any intermediate NAT-mappings alive.  (e.g. my home router is configured to discard UDP NAT mappings that have been idle for >1800s.)  I've looked through the linked docs above and don't see anything mentioned.

Some VPN implementations that allow IPsec-over-NAT-UDP do this by sending a 1-byte (0xFF) UDP packet over the link periodically, which is then discarded by the recipient.
The UDP bits are meant to be used only when the phone is connected to a carrier's cell network.
I haven't poked around cell networks in a while, but in the past carriers have used NAT as well, which (I assume) would be subject to the same problem.
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
Updated to apply against Patch 2, with more comments.
Attachment #729201 - Attachment is obsolete: true
Attachment #729201 - Flags: review?(justin.lebar+bug)
Attachment #729797 - Flags: review?(justin.lebar+bug)
(In reply to Mike Habicher [:mikeh] from comment #69)
> I haven't poked around cell networks in a while, but in the past carriers
> have used NAT as well, which (I assume) would be subject to the same problem.

The assumption is that the server we're communicating with will live inside the cell carrier's NAT.  We assume that precisely because we don't want to try to figure out NAT busting.
(In reply to Mike Habicher [:mikeh] from comment #69)
> I haven't poked around cell networks in a while, but in the past carriers
> have used NAT as well, which (I assume) would be subject to the same problem.
To expand a little on what Justin said, this is something intended for carrier's use. So the UDP originator will/should be past any possible NAT. It will/should have direct visibility to the phone in other words.
(In reply to Antonio Manuel Amaya Calvo from comment #72)
> (In reply to Mike Habicher [:mikeh] from comment #69)
> > I haven't poked around cell networks in a while, but in the past carriers
> > have used NAT as well, which (I assume) would be subject to the same problem.
> To expand a little on what Justin said, this is something intended for
> carrier's use. So the UDP originator will/should be past any possible NAT.
> It will/should have direct visibility to the phone in other words.

To help clarify this point, I suggest you read this article:

  http://www.lacofa.es/index.php/general/notification-server?lang=en

in which we explain why and how :)
Comment on attachment 729199 [details] [diff] [review]
Patch 1: Push IDL

Review of attachment 729199 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/interfaces/push/Makefile.in
@@ +8,5 @@
> +VPATH          = @srcdir@
> +
> +include $(DEPTH)/config/autoconf.mk
> +
> +GRE_MODULE     = 1

GRE_MODULE does nothing and was removed in bug 852534. This renders this Makefile.in empty since it contains only boilerplate. Makefile.in are not required to exist any more. The important file is moz.build. Please remove this file.
Attached patch Patch 1: Push IDL (obsolete) — Splinter Review
Sorry, navigator.push.registrations() comment was inconsistent with what it actually returns.
Attachment #729199 - Attachment is obsolete: true
Attachment #729199 - Flags: review?(justin.lebar+bug)
Attachment #729199 - Flags: review?(jonas)
Attachment #730194 - Flags: review?(justin.lebar+bug)
Attachment #730194 - Flags: review?(jonas)
Comment on attachment 729636 [details] [diff] [review]
Patch 2: The core push implementation

This is a lot easier for me to understand; thanks.  These are mostly nits, but
at least one isn't, so I'm holding off on r+ until we figure it out.

"PC" means punctuation and/or capitalization.

>+ * A proxy between the PushService and the WebSocket, the listener is used so
>+ * that the PushService can silence messages from the WebSocket by setting
>+ * PushWebSocketListener._pushService to null.

Thanks for the comment.  But nit: Comma splice.

>+function PushService()
>+{
>+  debug("PushService Constructor . ");

s/ . /./?

>+  observe: function observe(aSubject, aTopic, aData) {
>+    switch (aTopic) {
>+      case "app-startup":
>+        let appInfo = Cc["@mozilla.org/xre/app-info;1"];
>+        let xre = Ci.nsIXULRuntime;
>+        if (!appInfo ||
>+            appInfo.getService(xre).processType != xre.PROCESS_TYPE_DEFAULT) {
>+          return;
>+        }

Hmm.  Is this if statement just for paranoia, or does this code actually get
run in the child process?

We went through a lot of work to minimize the amount of JS that's parsed and
run on child process startup.  It seems like we'd want to avoid parsing this
big file, too.  I don't think we can regress the startup numbers and then fix
it in a follow-up.

This is what's holding r+.

>+  _currentState: STATE_SHUTDOWN,

Nit: "Shutdown" is a noun, and frequently it's used (much to my chagrin) as a
verb.  But I think the adjective is still "shut down" or "shut-down".  That is,
STATE_SHUTDOWN looks like we're shutting down now, but what you mean is
STATE_SHUT_DOWN.

>+  init: function() {
>+    debug("init()");
>+    this._db = new PushDB(this);
>+    this._db.init(this);

Can we move PushDB.init into its constructor, or does it need to be outside for
some reason?

>+    this._db.getAllChannelIDs(function(channelIDs) {
>+      if (channelIDs.length > 0) {
>+        debug("Found registered channelIDs. Starting WebSocket");
>+        this._beginWSSetup();
>+      }
>+    }.bind(this),
>+    function(error) {
>+      debug("db error " + error);
>+    });

I think you may have missed this spacing nit from the earlier review.  If otoh
you feel strongly that this is the right indentation, I'm not going to fight
over it.

>+    // this is only really used for testing, where different tests require
>+    // connecting to slightly different URLs

PC

>+  shutdown: function() {

Please add a comment indicating how this and the other whitelisted handlers are
invoked.

>+    debug("shutdown()");
>+    this._db.close();
>+    this._db = null;
>+
>+    // All pending requests (ideally none) are dropped at this point. We
>+    // shouldn't have any applications performing registration/unregistration
>+    // or receiving notifications

PC

>+    this._shutdownWS();
>+
>+    debug("shutdown complete!");
>+  },
>+
>+  // aStatusCode is an NS error from Components.results
>+  _socketError: function(aStatusCode) {
>+    debug("socketError()");
>+    
>+    var retryTimeout = this._prefs.get("retryBaseInterval") *
>+                        Math.pow(2, this._retryFailCount);
>+    this._retryFailCount++;

Do you want to put a cap on this?  Otherwise it could become quite large,
potentially disabling push notifications on a device for days.

>+    debug("Retry in " + retryTimeout + " Try number " + this._retryFailCount);
>+
>+    if (!this._retryTimeoutTimer) {
>+      this._retryTimeoutTimer = Cc["@mozilla.org/timer;1"]
>+                                  .createInstance(Ci.nsITimer);
>+    }
>+
>+    this._retryTimeoutTimer.init(this,
>+                                 retryTimeout,
>+                                 Ci.nsITimer.TYPE_ONE_SHOT);

Is it possible for the timer to be running already here?

>+    if (uri.scheme === "wss") {
>+      this._ws = Cc["@mozilla.org/network/protocol;1?name=wss"]
>+                   .createInstance(Ci.nsIWebSocketChannel);
>+    }
>+    else if (uri.scheme === "ws") {
>+      debug("Push over an insecure connection (ws://) is not allowed!");
>+      return;
>+    }
>+    else {
>+      debug("Unsupported websocket scheme " + uri.scheme);
>+    }

return;

>+  _handleHelloReply: function(reply) {
>     [...]
>+    // a UUID4 formatted with '-' is 36 bytes long

PC

>+    if (reply.uaid.length > 36) {
>+      debug("UAID received from server was too long (should be a UUID4): " +
>+            reply.uaid);
>+      this._shutdownWS();
>+      return;
>+    }

If we're mandating UUID4, presumably we should check that it's properly
formatted as well?

>+    function finishHandshake() {
>+      this._UAID = reply.uaid;
>+      this._currentState = STATE_READY;
>+      this._processNextRequestInQueue();
>+    }
>+
>+    // by this point we've got a UAID from the server
>+    // that we are ready to accept.

PC

>+    // If we already had a valid UAID before, we've to
>+    // ask apps to re-register

Nit: The "we've" contraction used in this way (contracting "have" in "have to
X") is non-standard.

>+    if (this._UAID && this._UAID != reply.uaid) {
>+      debug("got new UAID: all re-register");
>+      this._dropRegistrations()
>+        .then(function() {
>+          // apps that have no prior registrations but are in the pending
>+          // queue won't get a push-register, which is correct.

PC

>+  _handleNotificationReply: function(reply) {
>+    debug("handleNotificationReply()");
>+    if (typeof reply.updates !== 'object') {
>+      debug("No 'updates' field in response. Type = " + typeof reply.updates);
>+      return;
>+    }
>+
>+    debug("Reply updates: " + reply.updates.length);
>+    for (var i = 0; i < reply.updates.length; i++) {
>+      var update = reply.updates[i];
>+      debug("Update: " + update.channelID + ": " + update.version);
>+      if (typeof update.channelID !== "string") {
>+        debug("Invalid update literal at index " + i);
>+        continue;
>+      }
>+
>+      if (update.version === undefined) {
>+        debug("update.version does not exist");
>+        continue;
>+      }
>+
>+      var version = update.version;
>+
>+      if (typeof version === "string") {
>+        version = parseInt(version, 10);
>+      }
>+
>+      if (typeof version === "number") {

I don't think this is right.

  > x = "abc";
  > y = parseInt(x, 10);
  > typeof y === "number"
  true

You probably want to test that version is a non-negative integer.

>+  _send: function(action, data) {
>+    debug("send()");
>+    this._sendRequestQueue.push([action, data]);
>+    debug("Queued " + action);
>+    this._processNextRequestInQueue();
>+  },
>+
>+  _processNextRequestInQueue: function() {
>+    debug("_processNextRequestInQueue()");
>+
>+    if (this._sendRequestQueue.length == 0) {
>+      debug("Request queue empty");
>+      return;
>+    }
>+
>+    if (this._currentState != STATE_READY) {
>+      if (!this._ws) {
>+        // this will end up calling processNextRequestInQueue()

PC

>+        this._beginWSSetup();
>+      }
>+      else {
>+        // we have a socket open
>+        // so we are just waiting for hello to finish
>+        // which will call processNextRequestInQueue()

PC

>+      }
>+      return;
>+    }
>+
>+    [action, data] = this._sendRequestQueue.shift();
>+    data.messageType = action;
>+    if (!this._ws) {
>+      // if our websocket is not ready and our state is STATE_READY
>+      // we may as well give up all assumptions about the world
>+      // and start from scratch again.
>+      // discard the message itself, let the timeout notify error to the app

PC.  Also please wrap paragraphs together into a block and separate
paragraphs with an empty line.

>+      debug("This should never happen!");
>+      this._shutdownWS();
>+    }
>+
>+    this._ws.sendMsg(JSON.stringify(data));
>+    // process next one as soon as possible

PC

>+  /**
>+   * Called on message from the child process
>+   * aPageRecord is an object sent by navigator.push, identifying
>+   * the sending page and other fields

PC and wrapping, please.

>+  register: function(aPageRecord, aMessageManager) {
>+    debug("register()");
>+
>+    let uuidGenerator = Cc["@mozilla.org/uuid-generator;1"]
>+                          .getService(Ci.nsIUUIDGenerator);
>+    // generateUUID() gives a UUID surrounded by {...}
>+    // slice them off

PC

>+    this._sendRequest("register", {channelID: channelID})
>+      .then(this._onRegisterSuccess.bind(this, aPageRecord, channelID),
>+            this._onRegisterError.bind(this, aPageRecord, aMessageManager))
>+      .then(function(message) {
>+        aMessageManager.sendAsyncMessage("PushService:Register:OK", message);
>+      },
>+      function() {
>+        aMessageManager.sendAsyncMessage("PushService:Register:KO", message);
>+      });

I still think this would be clearer as 

>       .then(
>          function(message) {
>           aMessageManager.sendAsyncMessage("PushService:Register:OK", message);
>          },
>          function() {
>           aMessageManager.sendAsyncMessage("PushService:Register:KO", message);
>          }
>       );

but whatever you prefer.

>+  /**
>+   * Exceptions thrown in _onRegisterSuccess are caught by the promise obtained
>+   * from _sendRequest, triggering the promise to be rejected instead.

Nit: s/triggering/causing/ would be better.  We don't usually trigger X to be
Y'ed; we cause X to be Y'ed, or we trigger X to Y.  (That is, it's a difference
of active vs. passive voice.)

>+    this._updatePushRecord(record)
>+      .then(function() {
>+        message["pushEndpoint"] = data.pushEndpoint;
>+        deferred.resolve(message);
>+      }, function(error) {
>+        // unable to save

PC

>+        this._sendRequest("unregister", {channelID: record.channelID});
>+        message["error"] = error;
>+        deferred.reject(message);
>+      });
>+
>+    return deferred.promise;
>+  },
>+
>+  /**
>+   * Exceptions thrown in _onRegisterError are caught by the promise obtained
>+   * from _sendRequest, triggering the promise to be rejected instead.

s/triggering/causing

>+   */
>+  _onRegisterError: function(aPageRecord, aMessageManager, reply) {
>+    debug("_onRegisterError()");
>+    switch (reply.status) {
>+      case 409: // CONFLICT CHID already exists
>+        if (typeof aPageRecord._attempts !== "number")
>+          aPageRecord._attempts = 0;
>+
>+        if (aPageRecord._attempts < kCONFLICT_RETRY_ATTEMPTS) {
>+          aPageRecord._attempts++;
>+          // since register is async, its ok to launch it in a callback

PC

>+          debug("CONFLICT: trying again");
>+          this.register(aPageRecord, aMessageManager);
>+          return;
>+        }
>+        throw { requestID: aPageRecord.requestID, error: "conflict" };
>+      default:
>+        debug("General failure " + reply.status);
>+        throw { requestID: aPageRecord.requestID, error: reply.error };
>+    }
>+  },
>+
>+  /**
>+   * Called on message from the child process

PC

>+   *
>+   * Why is the record being deleted from the local database before the server
>+   * is told?
>+   * Unregistration is for the benefit of the app and the AppServer

Empty line after "told?" if you want it to be a new paragraph.

>+  unregister: function(aPageRecord, aMessageManager) {

>+    this._db.getByPushEndpoint(aPageRecord.pushEndpoint, function(record) {
>+      // non-owner tried to unregister, say success, but don't do anything

PC

>+      if (record.manifestURL !== aPageRecord.manifestURL) {
>+        aMessageManager.sendAsyncMessage("PushService:Unregister:OK", {
>+          requestID: aPageRecord.requestID,
>+          pushEndpoint: aPageRecord.pushEndpoint
>+        });
>+        return;
>+      }
>+
>+      this._db.delete(record.channelID, function() {
>+        // let's be nice to the server and try to inform it, but we don't care
>+        // about the reply

PC

>+  _onRegistrationsSuccess: function(aPageRecord,
>+                                    aMessageManager,
>+                                    pushRecords) {

>+    if (this._retryTimeoutTimer)
>+      this._retryTimeoutTimer.cancel();
>+    // since we've had a successful connection
>+    // reset the retry fail count

PC

>+    function sendHelloMessage(ids) {
>+      // on success, ids is an array, on error its not

PC

>+  _wsOnMessageAvailable: function(context, message) {
>
>+    // A whitelist of protocol handlers. Add to these if new messages are added
>+    // in the protocol

PC

>+    var handlers = ["Hello", "Register", "Unregister", "Notification"];
>+
>+    // build up the handler name to call from messageType.
>+    // e.g. messageType == "register" -> _handleRegisterReply

PC

>+    if (typeof this[handler] !== "function") {
>+      debug("Handler whitelisted by not implemented! " + handler);

s/by/but/

>+  _wsOnServerClose: function(context, aStatusCode, aReason) {
>+    debug("wsOnServerClose() " + aStatusCode + " " + aReason);
>+    // 1000 is the normal close
>+    if (aStatusCode == 1000 && Object.keys(this._pendingRequests).length > 0) {
>+      // this should never happen. A successful close cannot have pending
>+      // requests since the server should've responded to them before the
>+      // connection was closed.

PC
Attachment #729636 - Flags: review?(justin.lebar+bug)
Comment on attachment 730194 [details] [diff] [review]
Patch 1: Push IDL

r=me, but please wrap the long line here.  I'd still like Jonas's review.

Did you consider the naming nits in comment 40?  API names do matter.
Attachment #730194 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #77)
> Comment on attachment 730194 [details] [diff] [review]
> Patch 1: Push IDL
> 
> r=me, but please wrap the long line here.  I'd still like Jonas's review.
> 
> Did you consider the naming nits in comment 40?  API names do matter.

Yes, but registrations is appropriate too right? I register for multiple courses, register for multiple events, register for multiple pushEndpoints :)
> I register for multiple courses, register for multiple events, register for multiple 
> pushEndpoints :)

Sure, and if the semantics were

  pushEndpoint.register()

that would make perfect sense to me.  But instead it's push.register(), which I contend looks more like you're registering for push than that you're registering for a new push endpoint.

Indeed, it wouldn't make sense to call |pushEndpoint.register()| or |course.register()| twice, so similarly it doesn't make much sense to me that we can call |push.register()| twice.
(In reply to Justin Lebar [:jlebar] from comment #76)
> Comment on attachment 729636 [details] [diff] [review]
> Patch 2: The core push implementation
> 
> >+  observe: function observe(aSubject, aTopic, aData) {
> >+    switch (aTopic) {
> >+      case "app-startup":
> >+        let appInfo = Cc["@mozilla.org/xre/app-info;1"];
> >+        let xre = Ci.nsIXULRuntime;
> >+        if (!appInfo ||
> >+            appInfo.getService(xre).processType != xre.PROCESS_TYPE_DEFAULT) {
> >+          return;
> >+        }
> 
> Hmm.  Is this if statement just for paranoia, or does this code actually get
> run in the child process?
> 
> We went through a lot of work to minimize the amount of JS that's parsed and
> run on child process startup.  It seems like we'd want to avoid parsing this
> big file, too.  I don't think we can regress the startup numbers and then fix
> it in a follow-up.
> 
> This is what's holding r+.

This was indeed running in the child, so dougt wrote this piece to prevent that. I'll try it again and back this out if it isn't required.

> 
> >+    this._shutdownWS();
> >+
> >+    debug("shutdown complete!");
> >+  },
> >+
> >+  // aStatusCode is an NS error from Components.results
> >+  _socketError: function(aStatusCode) {
> >+    debug("socketError()");
> >+    
> >+    var retryTimeout = this._prefs.get("retryBaseInterval") *
> >+                        Math.pow(2, this._retryFailCount);
> >+    this._retryFailCount++;
> 
> Do you want to put a cap on this?  Otherwise it could become quite large,
> potentially disabling push notifications on a device for days.

Yes, will do that.

> 
> >+    debug("Retry in " + retryTimeout + " Try number " + this._retryFailCount);
> >+
> >+    if (!this._retryTimeoutTimer) {
> >+      this._retryTimeoutTimer = Cc["@mozilla.org/timer;1"]
> >+                                  .createInstance(Ci.nsITimer);
> >+    }
> >+
> >+    this._retryTimeoutTimer.init(this,
> >+                                 retryTimeout,
> >+                                 Ci.nsITimer.TYPE_ONE_SHOT);
> 
> Is it possible for the timer to be running already here?

Yes, only if their are subsequent network level failures which led to wsOnStart not being called between two calls to socketError(). AFAIK re-initing a timer without calling cancel is safe[1]. And since there was an error, we need to increase the back-off time. So I think this is fine.

[1]: http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#326
> This was indeed running in the child, so dougt wrote this piece to prevent
> that. I'll try it again and back this out if it isn't required.

Okay, but to be clear, if it /is/ required, then we need to do something about this to either

a) prevent this code from being run in the child at all (e.g. moving most of it into a JSM), or

b) demonstrate no effect on process startup time and memory usage.

I suspect (a) will be simpler.
Comment on attachment 729797 [details] [diff] [review]
Patch 3: UDP wakeup support

I don't mind if we address UDP retry in this patch or another one, but if we're
going to wait, please file a tef+ bug before landing.

r=me with that bug filed or fixed here.  This patch looks great; these are just
nits.

>diff --git a/dom/push/src/PushService.js b/dom/push/src/PushService.js
>--- a/dom/push/src/PushService.js
>+++ b/dom/push/src/PushService.js

>+const kUDP_WAKEUP_WS_STATUS_CODE = 4774;  // WebSocket Close status code sent
>+                                          // by server to signal that it can
>+                                          // wake client up using UDP.

If this status code is a constant, the other ones should be as well, I think.

>+        // check to see if we need to do anything.

PC

>@@ -363,30 +389,44 @@ PushService.prototype = {
>    * _sendRequest will automatically call beginWSSetup(), which will cancel the
>    * timer. In addition since the state will have changed, even if a pending
>    * timer event comes in (because the timer fired the event before it was
>    * cancelled), so the connection won't be reset.
>    */
>   _retryTimeoutTimer: null,
>   _retryFailCount: 0,
> 
>+  /**
>+   * According to the WS spec, servers should immediately close the underlying
>+   * TCP connection after they close a WebSocket. This causes wsOnStop to be
>+   * called with error NS_BASE_STREAM_CLOSED. Since the client has to keep the
>+   * WebSocket up, it should try to reconnect. But if the server closes the
>+   * WebSocket because it will wake up the client via UDP, then the client
>+   * shouldn't re-establish the connection. This keeps track of the WebSocket
>+   * Close (which happens in wsOnServerClose) and is checked in wsOnStop

This is a good comment, aside from the lack of a trailing period.  :)  But the
last sentence doesn't make sense to me.  "This keeps track of the WS Close."
Can you try rephrasing that?

>+  _willBeWokenUp: false,

We can be woken up over a WS or over UDP.  This variable has to do specifically
with UDP, so can we say that?

>+    this._port = this._prefs.get("udp.port");

WS uses ports too; can we call this udpPort, udpServerPort, or something like
that?

>@@ -1013,35 +1058,51 @@ PushService.prototype = {
> 
>     var data = {
>       messageType: "hello",
>     }
> 
>     if (this._UAID)
>       data["uaid"] = this._UAID;
> 
>+    if (this._ip) {
>+      data["interface"] = {
>+        ip: this._ip,
>+        port: this._port
>+      };

An address:port is not a network "interface".  The best we were able to come up
with on #developers is "hostport", which I'd never heard of but is apparently a
thing.

Maybe "wakeup_hostport" would be a good name.

>   /**
>    * This statusCode is not the websocket protocol status code, but the TCP
>    * connection close status code.
>+   *
>+   * If the client does not explicitly call ws.close() then statusCode
>+   * is always NS_BASE_STREAM_CLOSED, even on successful close.

"We" are "the client" here, right?  The former would be clearer to me.

Nit: s/on successful/on a successful/

>@@ -1079,22 +1140,103 @@ PushService.prototype = {
>+  /**
>+   * This does not deal with the successful close (aStatusCode == 1000,
>+   * according to spec), because a 'successful' close will still translate to
>+   * NS_BASE_STREAM_CLOSED in _wsOnStop() (see comment in that function), but
>+   * with _willBeWokenUp set to false. This means client will re-establish
>+   * connection

PC, s/means client/means the client/.  Also, while I grant that we only
actually have one "connection" here, it would be helpful if you'd specify
specifically that we're re-establishing the WS connection.

I'm not sure what you mean by "not deal with".  Do you mean that
_wsOnServerClose should never be called with aStatuscode = 1000?  If so, please
say that, and add an assertion of some sort.

>   _wsOnServerClose: function(context, aStatusCode, aReason) {
>     debug("wsOnServerClose() " + aStatusCode + " " + aReason);
>-    // 1000 is the normal close
>-    if (aStatusCode == 1000 && Object.keys(this._pendingRequests).length > 0) {
>-      // this should never happen. A successful close cannot have pending
>-      // requests since the server should've responded to them before the
>-      // connection was closed.
>-      this._shutdownWS();
>+
>+    // switch over to UDP

PC

>+  _listenForWakeup: function() {

listenForUDPWakeup, please.

>+    debug("listenForWakeup()");
>+
>+    if (this._udpServer) {
>+      debug("UDP Server already running");
>+      return;
>+    }
>+
>+    if (!this._ip) {
>+      debug("No IP");
>+      return;
>+    }
>+
>+    if (!this._prefs.get("udp")) {

Nit: I'd probably call this udpWakeupEnabled or something.

>+  /**
>+   * Called by UDP Server Socket. As soon as a ping is recieved via UDP,
>+   * reconnect WebSocket and get actual data.

s/WebSocket/the WebSocket/ or s/WebSocket/our WebSocket/
s/actual data/the actual data/

>+  onStopListening: function(aServ, aStatus) {

Could you please document who calls this, and perhaps explain why we don't take
any action upon receiving this notification?

>+  /**
>+   * Get network information to decide if the client is capable of being woken
>+   * up by UDP (which currently just means having an mcc and mnc along with an
>+   * IP.

Parenthesis.  Also it means being connected to a mobile network and not to
WiFi, as I understand the network manager.

Can we get rid of these member variables and just return them when needed?
It's not clear to me why we have to cache them, and using member variables
in place of return values is on the path to the Dark Side.

>+  _setupNetworkState: function() {
>+    debug("setupNetworkState()");
>+
>+    try {
>+      var networkManager = Cc["@mozilla.org/network/manager;1"]
>+                             .getService(Ci.nsINetworkManager);
>+      if (networkManager.active &&
>+          networkManager.active.type ==
>+                        Ci.nsINetworkInterface.NETWORK_TYPE_MOBILE) {
>+        debug("Running on mobile data");
>+        var mcp = Cc["@mozilla.org/ril/content-helper;1"]
>+                    .getService(Ci.nsIMobileConnectionProvider);
>+        this._mcc = mcp.iccInfo.mcc;
>+        this._mnc = mcp.iccInfo.mnc;
>+        this._ip  = networkManager.active.ip;
>+        return;
>+      }

Do these functions actually throw exceptions?  If so, which?  I see that maybe
iccInfo could be null, but that's all I could see.  If that's all, perhaps we
should just add a null-check.
Attachment #729797 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #82)
> Comment on attachment 729797 [details] [diff] [review]
> Patch 3: UDP wakeup support
> 
> I don't mind if we address UDP retry in this patch or another one, but if
> we're
> going to wait, please file a tef+ bug before landing.
> 
> r=me with that bug filed or fixed here.  This patch looks great; these are
> just
> nits.

Let's keep it in this patch.
> 
> >   /**
> >    * This statusCode is not the websocket protocol status code, but the TCP
> >    * connection close status code.
> >+   *
> >+   * If the client does not explicitly call ws.close() then statusCode
> >+   * is always NS_BASE_STREAM_CLOSED, even on successful close.
> 
> "We" are "the client" here, right?  The former would be clearer to me.
> 
> Nit: s/on successful/on a successful/
> 
> >@@ -1079,22 +1140,103 @@ PushService.prototype = {
> >+  /**
> >+   * This does not deal with the successful close (aStatusCode == 1000,
> >+   * according to spec), because a 'successful' close will still translate to
> >+   * NS_BASE_STREAM_CLOSED in _wsOnStop() (see comment in that function), but
> >+   * with _willBeWokenUp set to false. This means client will re-establish
> >+   * connection
> 
> PC, s/means client/means the client/.  Also, while I grant that we only
> actually have one "connection" here, it would be helpful if you'd specify
> specifically that we're re-establishing the WS connection.
> 
> I'm not sure what you mean by "not deal with".  Do you mean that
> _wsOnServerClose should never be called with aStatuscode = 1000?  If so,
> please
> say that, and add an assertion of some sort.

I mean 1000 is not something we care about, or will affect our behaviour. No assertions, its not special.
> 
> >+  _setupNetworkState: function() {
> >+    debug("setupNetworkState()");
> >+
> >+    try {
> >+      var networkManager = Cc["@mozilla.org/network/manager;1"]
> >+                             .getService(Ci.nsINetworkManager);
> >+      if (networkManager.active &&
> >+          networkManager.active.type ==
> >+                        Ci.nsINetworkInterface.NETWORK_TYPE_MOBILE) {
> >+        debug("Running on mobile data");
> >+        var mcp = Cc["@mozilla.org/ril/content-helper;1"]
> >+                    .getService(Ci.nsIMobileConnectionProvider);
> >+        this._mcc = mcp.iccInfo.mcc;
> >+        this._mnc = mcp.iccInfo.mnc;
> >+        this._ip  = networkManager.active.ip;
> >+        return;
> >+      }
> 
> Do these functions actually throw exceptions?  If so, which?  I see that
> maybe
> iccInfo could be null, but that's all I could see.  If that's all, perhaps we
> should just add a null-check.

Legacy code from when this worked on FF Desktop, but network manager did not exist.
Reformatted comments
Attachment #730194 - Attachment is obsolete: true
Attachment #730194 - Flags: review?(jonas)
Attachment #730426 - Flags: review?(jonas)
Seems like the child process check isn't needed after all.
shutdown is private, prefixed with _.
changed UAID max length to 128 since it's a better magic number than 36. Dropped any mention of requiring a UUID.

I think I've got all the other nits fixed.
Attachment #729636 - Attachment is obsolete: true
Attachment #730428 - Flags: review?(justin.lebar+bug)
Attached patch Patch 3: UDP wakeup support (obsolete) — Splinter Review
wakeup_hostport :)
setNetworkState -> getNetworkState
eliminate some globals
Attachment #729797 - Attachment is obsolete: true
Attachment #730429 - Flags: review?(justin.lebar+bug)
> Let's keep [udp port retry] in this patch.

Did you change your mind on this one?
>+    // By this point we've got a UAID from the server that we are ready to
>+    // accept.

I would rather not go through all my review comments to make sure that you did them; I'd prefer to trust that you got it right.  Would you mind doing a once-over, seeing as at least one review comment has not been addressed with this latest patch?
Attachment #730428 - Flags: review?(justin.lebar+bug)
Attachment #730429 - Flags: review?(justin.lebar+bug)
Oh, sorry.  I misremembered which "we've" I nit'ed on.  Nevermind me.
Comment on attachment 730428 [details] [diff] [review]
Patch 2: The core push implementation

I didn't go through to verify that you did what you said, per above.  :)
Attachment #730428 - Flags: review+
Comment on attachment 730429 [details] [diff] [review]
Patch 3: UDP wakeup support

I'd like to re-review if we in fact add UDP port-in-use retry to this.
Attachment #730429 - Flags: review+
Comment on attachment 730426 [details] [diff] [review]
Patch 1: Push IDL

Jonas, I'd like you to review this because

a) I'm not a super-reviewer, and
b) nsm and I disagree about the method names here.  I'm happy to defer to your judgement.  See comment 77 - 79.
(In reply to Justin Lebar [:jlebar] from comment #87)
> > Let's keep [udp port retry] in this patch.
> 
> Did you change your mind on this one?

I'm sorry, I didn't even see the 'retry' bit before. I thought that you suggested that the whole UDP bit be kept in a separate bug. Sorry!

Port retry should be in another bug.
Everything discussed in the earlier reviews has been fixed.

As we discussed on IRC, arc4random() is secure with enough entropy in the system.
For dropping 409 - bug 855727
For empty serverURL - bug 855725
Attachment #730428 - Attachment is obsolete: true
Attachment #730731 - Flags: review?(justin.lebar+bug)
For port retry, bug 851303 is already filed.

Fixed nits.
Attachment #730429 - Attachment is obsolete: true
Attachment #730734 - Flags: review?(justin.lebar+bug)
Is there something in particular you'd like me to look at?  r+ with comments (from me, at least, but I think this is pretty standard) means that you don't need to ask for review again, unless you want additional feedback.  I was happy with the last few versions!
To be clear, I'm very happy to look at whatever changes you'd like me to look at.
Just put an r? so you could make it r+ so it is ready to land.

/me being a newbie
Your r+ should be sufficient for checkin-needed.  (In fact, you probably don't even need to do that.)  Whoever does the checkin should either trust you or check that the patches you r+'ed are similar to the obsolete patches I r+'ed.
Comment on attachment 730731 [details] [diff] [review]
Patch 2: The core push implementation

r+ Based on r+ on previous patch and comments by jlebar.
Attachment #730731 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 730734 [details] [diff] [review]
Patch 3: UDP wakeup support

r+ Based on r+ on previous patch and comments by jlebar.
Attachment #730734 - Flags: review?(justin.lebar+bug) → review+
Depends on: 849364
Comment on attachment 730731 [details] [diff] [review]
Patch 2: The core push implementation

Review of attachment 730731 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/src/PushService.js
@@ +428,5 @@
> +
> +    // It is easier to express the max interval as a pref in milliseconds,
> +    // rather than have it as a number and make people do the calculation of
> +    // retryBaseInterval * 2^maxRetryFailCount.
> +    retryTimeout = Math.min(retryTimeout, this._prefs.get("maxRetryInterval"));

So the websocket protocol has it's own exponential backoff algorithm for reconnecting from a failed connection, which we implement:

  http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-17#section-7.2.3
  
  http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp?rev=8ee3745570f7#126

Our current implementation uses both a different initial/max delay than yours (200-400ms initial delay, max delay 1 minute, while your code uses 5 seconds/20 minutes).  It also increments by 1.5 instead of 2.  We could easily enough provide knobs on nsIWebsocketChannel to allow code to tweak the backoff algorithm as wanted.  I'm guessing that should be followup work since we want to land ASAP.  We'll effectively be running two separate backoff algorithms at the same time for now, but I don't think that's a problem (the algorithm here always waits longer to reconnect than our internal one, so you won't notice any interference). Followup bug?

@@ +482,5 @@
> +    }
> +
> +    debug("serverURL: " + uri.spec);
> +    this._wsListener = new PushWebSocketListener(this);
> +    this._ws.protocol = "push-notification";

Now that bug 849364 has landed we should also do this here:

   this._ws.pingInterval = ....; 

(and .pingTimeout if the default 10 seconds for the server to reply to a ping is not right for you).

Not sure what ping interval you have in mind (note that its in milliseconds).   I suggest you make the interval as long as possible to conserve power/bandwidth.
Comment on attachment 730731 [details] [diff] [review]
Patch 2: The core push implementation

Note that this patch is bitrotted on inbound, at least.  Please unbitrot and add the .pingInterval mentioned in last comment.
Comment on attachment 730426 [details] [diff] [review]
Patch 1: Push IDL

Review of attachment 730426 [details] [diff] [review]:
-----------------------------------------------------------------

Make unregistered set the. result to true/false depending on if an endpoint with that name was found.
Attachment #730426 - Flags: review?(jonas) → review+
Fix bitrot in package manifest files for browser and android.

Not including changes for websocket ping interval in this bug. Let's land this!
Attachment #730731 - Attachment is obsolete: true
Attachment #730971 - Flags: review+
Cool! Asking for leo+ now that it's ready to land. We would really love to get this on 1.1.
blocking-b2g: --- → leo?
OK, removing dependency on websocket pings.  File followup?
No longer depends on: 849364
If endpoint with that name was not found, shouldn't it call DOMRequest.onerror instead? The true/false dichtomy is captured by onsuccess/onerror. Putting the endpoint as the result is useful so that applications can remove it from their local set if they are juggling multiple endpoints and associating them with various things.
blocking-b2g: leo? → ---
Depends on: 849364
Flags: needinfo?(jonas)
blocking-b2g: --- → leo?
FWIW, I think this should be OOS for 1.1 for the following reasons:

1. Push notifications was not part of the v1.1 P1 user stories
2. Relatively new concept, so this will probably need a good amount of bake time

This is definitely a good candidate to push into 1.2, however.
> Make unregistered set the. result to true/false depending on if an endpoint with that name was found.

The result should include the pushEndpoint that was unregistered.  we will add support for this in a follow up.  (It isn't clear what the application can do if the unregister was unsuccessful.)
https://hg.mozilla.org/mozilla-central/rev/e93a4da26856
https://hg.mozilla.org/mozilla-central/rev/2aaf82b852e7
https://hg.mozilla.org/mozilla-central/rev/bfced2ecc0cf
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: sec-review?(ptheriault)
Flags: needinfo?(jonas)
Resolution: --- → FIXED
(In reply to Jason Smith [:jsmith] from comment #109)
> FWIW, I think this should be OOS for 1.1 for the following reasons:
> 
> 1. Push notifications was not part of the v1.1 P1 user stories
> 2. Relatively new concept, so this will probably need a good amount of bake
> time
> 
> This is definitely a good candidate to push into 1.2, however.

With respect to 1, Push has been in the scope since 1.0 and was not added because of lack of time. The agreement was that Push will include in 1.1 if it was finished before 1.1 Code Complete (which seems to be the case).

With respect to 2, I think Doug can provide more info about the risk of doing so.
Flags: needinfo?(doug.turner)
Requesting tracking-b2g18 as per [1]. Even weren't this to be considered blocking, it should be uplifted as per c#113.

Doug, can you request approval-mozilla-b2g18?

[1] https://wiki.mozilla.org/Release_Management/B2G_Landing#All_other_landings_for_v1.x_.28updated_3.2F27.29
tracking-b2g18: --- → ?
Tracking and we'll to look to the uplift nomination request for risk as well please state whether this can be easily preffed off in product or how challenging the backout would be if this sat in v1-train baking for a while (once apps start to use it how much control do we have over taking it out if there are too many regressions?).

If this lands we should also get some outreach to the b2g testdrivers asking them to update and poke around at this (give them clear STR).

Leaving this nominated for leo? so that we can keep checking in on the status here and evaluate whether the risks are worth the rewards.
There is no use for 'baking' this on master/m-c.  There will not be any users there.  We need to uplift this now to maximize testing.

We have been banging on it and haven't noticed any serious issues.

Uplift please.
Flags: needinfo?(doug.turner)
(In reply to Doug Turner (:dougt) from comment #116)
> There is no use for 'baking' this on master/m-c.  There will not be any
> users there.  We need to uplift this now to maximize testing.

Certainly true there's no testing coverage we'll get on m-c right now. But per product stories at least our QA team was given, this wasn't planned scope for 1.1.

If the plan ends up uplifting this, I need to understand:

1. End-user consumable impact - in scope push notification user stories vs. out of scope user stories
2. What risk does this pose to the rest of the phone at large
3. What other dependencies is needed to be implemented as part of landing this that provides end-user consumption
4. Does this patch include a killswitch preference to turn off the feature if we run out of time?

Note - FWIW, I understand operators are pushing hard for this feature, but on the reverse side, even selling to get input file support into b2g as unplanned b2g 1.1 feature was a tough sell until we made sure risk was under control. input file is a well understood concept implemented already on different platforms, but this is a *first* for b2g. We need to be careful about risk and scope here.

> 
> We have been banging on it and haven't noticed any serious issues.
> 
> Uplift please.

I've received a different story on this then Daniel has, so I want product input here to determine if this goes into 1.1 or not.
Flags: needinfo?(ffos-product)
hey Jason.

So, the feature is basically pref'ed off now.  You have to set a preference to enable Gecko to talk to a Push Service.  So, unless someone prefs on this feature, you're not going to see any impact.

So, the risk is near zero.
Flags: needinfo?(ffos-product)
> please state whether this can be easily preffed off in product

This feature can be trivially disabled.
Gotcha. Okay, thanks for addressing my QA concerns on this. I've got no issues then with uplifting this from a QA perspective.
I don't see any reason to not take this patch now on b2g18. I dare anyone to tell us when 1.1 will actually be out of the door, so I don't want features to be blocked for pseudo-deadlines.
blocking-b2g: leo? → leo+
b2g18:
f9263269716f Doug Turner – Bug 822712 - Start UDP socket on boot if required. r=dougt, jlebar, jst, a=blocking-b2g:leo+
e6f40aabf5a5 Nikhil Marathe – Bug 822712 - SimplePush WebSocket implementation, r=dougt, jlebar, jst. a=blocking-b2g:leo+
83eb6c9fcde4 Doug Turner – Bug 822712 - Simple Push notifications for b2g - Interfaces sr=jonas, a=blocking-b2g:leo+
(In reply to Fabrice Desré [:fabrice] from comment #121)
> I don't see any reason to not take this patch now on b2g18. I dare anyone to
> tell us when 1.1 will actually be out of the door, so I don't want features
> to be blocked for pseudo-deadlines.

leo+ implies that we'll block the release to make sure this feature lands. That isn't really the case - we'll disable/backout if we've still got critical regressions by the time we branch. I'll approve individual patches instead.
blocking-b2g: leo+ → -
Attachment #730426 - Flags: approval-mozilla-b2g18+
Attachment #730734 - Flags: approval-mozilla-b2g18+
Attachment #730971 - Flags: approval-mozilla-b2g18+
We shouldn't use error to signal "couldn't remove the URL beause it doesn't exist". The result of the URL not existing and it being deleted is essentially the same, we won't be receiving notifications from that URL any more.

Errors should only be used to indicate "unable to remove url" or "not sure if we successfully removed the URL or not".
sicking, i disagree.  let's discuss out of this bug... we can file a followup if it turns out you win.
backed out on b2g18:
   6c95f9a21e9b, f9263269716f, e6f40aabf5a5, 83eb6c9fcde4

Going to reland this weekend when try results are in.
Blocks: 747907
Alias: simple-push-b2g
Blocks: 888003
Blocks: 888058
Blocks: 893163
Depends on: 894879
Blocks: 894879
No longer depends on: 894879
No longer blocks: 888003
What would be needed to enable this API on FF desktop? Is there a bug for that?
(In reply to Marco Castelluccio [:marco] from comment #129)
> What would be needed to enable this API on FF desktop? Is there a bug for that?

Push for Firefox desktop is bug 857464.
Blocks: 857464
You need to log in before you can comment on or make changes to this bug.