Closed Bug 763198 Opened 12 years ago Closed 11 years ago

B2G push notification

Categories

(Firefox OS Graveyard :: General, defect, P1)

All
Gonk (Firefox OS)
defect

Tracking

(blocking-kilimanjaro:+, blocking-basecamp:-)

RESOLVED WONTFIX
blocking-kilimanjaro +
blocking-basecamp -

People

(Reporter: khu, Assigned: sinker)

References

(Depends on 2 open bugs)

Details

(Keywords: dev-doc-needed, feature, Whiteboard: [soft][tech-p1])

Attachments

(5 files, 8 obsolete files)

35.80 KB, patch
Details | Diff | Splinter Review
12.69 KB, patch
Details | Diff | Splinter Review
45.20 KB, patch
Details | Diff | Splinter Review
18.21 KB, patch
Details | Diff | Splinter Review
135.62 KB, application/pdf
Details
This bug is used to track push notifications.
Assignee: nobody → tlee
blocking-basecamp: --- → ?
The latest implementation of push notifications for Gecko is at http://bitbucket.org/thinker/mozilla-central.  There will be no patch until the API and the protocol are worked out.
Blocks: 747907
blocking-kilimanjaro: --- → ?
Attaching a work-in-progress patch to the bug wouldn't hurt.
This is a WIP patch based on the API on https://wiki.mozilla.org/WebAPI/PushAPI. I made it before I saw this bug since I'm working on the server/client stuff in bug 754062.

There's no tests yet, but I wanted to get some early feedback first. Some problems I'm having:

* the QueryInterface property is exposed to the web
* navigator.mozPush doesn't show up as a completion in the web console until I use it at least once.

This patch only provides the DOM bindings. My plan is to implement the client API under services/notifications, WIP here: https://github.com/jbalogh/services-central/tree/notifications
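For reference, here is roughly how I expect content to use these bindings. This is a sketch only, following the wiki proposal; the method name and the postToAppServer() helper are illustrative, nothing here is final:

  var req = navigator.mozPush.requestURL();   // returns a DOMRequest
  req.onsuccess = function () {
    // req.result would be the per-app push URL; the app hands it to its own server.
    postToAppServer(req.result);              // hypothetical helper, not part of the API
  };
  req.onerror = function () {
    console.error("push registration failed");
  };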
Attachment #633276 - Flags: feedback?(mounir)
(In reply to Jeff Balogh (:jbalogh) from comment #3)
> Created attachment 633276 [details] [diff] [review]
> mozPush implementation in javascript
> 
> This is a WIP patch based on the API on
> https://wiki.mozilla.org/WebAPI/PushAPI. I made it before I saw this bug
> since I'm working on the server/client stuff in bug 754062.
> 
> There's no tests yet, but I wanted to get some early feedback first. Some
> problems I'm having:
> 
> * the QueryInterface property is exposed to the web
> * navigator.mozPush doesn't show up as a completion in the web console until
> I use it at least once.

Jeff, you may like to try the push-notification branch in the repository mentioned in comment 1. There are working test cases in netwerk/test/unit/ and dom/tests/mochitests/.  The implementation is based on the API proposed before, which is very close to the current one you mentioned.  I will move it to the latest API next week; I guess it will only take a few lines of rewriting.

Since the details of the underlying protocol are still not settled, the implementation will be updated from time to time.
Comment on attachment 633276 [details] [diff] [review]
mozPush implementation in javascript

Review of attachment 633276 [details] [diff] [review]:
-----------------------------------------------------------------

There is not much I can say about this patch for the moment. I'm not fully convinced about the API, but that discussion has to happen in dev-webapi. I can't really say much about the JS implementation. The only comment I can make right now is that I would rather see that code living in dom/push/ instead of dom/src/notification.
Attachment #633276 - Flags: feedback?(mounir)
blocking-basecamp: ? → +
blocking-kilimanjaro: ? → +
Whiteboard: [soft]
Depends on: 767375
Attached patch WIP v2 (obsolete) — Splinter Review
This patch follows the API at https://wiki.mozilla.org/WebAPI/PushAPI with a minor change to allow apps to pass a token.
Attachment #636526 - Attachment is obsolete: true
Comment on attachment 648208 [details] [diff] [review]
WIP v2

Review of attachment 648208 [details] [diff] [review]:
-----------------------------------------------------------------

Quick comments from my POV

::: dom/interfaces/notification/nsIDOMNavigatorPushNotification.idl
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsISupports.idl"
> +#include "nsIDOMDOMRequest.idl"

I needed to change this to:

-#include "nsIDOMDOMRequest.idl"
+interface nsIDOMDOMRequest;

to compile (MacOS 10.8).

::: dom/src/notification/PushNotification.cpp
@@ +21,5 @@
> +#include "nsIScriptGlobalObject.h"
> +#include "nsIScriptContext.h"
> +// Get the origin of a nsIDOMWindow.
> +#include "nsIScriptObjectPrincipal.h"
> +#include "nsIPrincipal.h"

And adding

+#include "nsIDOMDOMRequest.h"

::: dom/tests/mochitest/chrome/file_push_server_wsh.py
@@ +11,5 @@
> +    def registerUA(msg_obj):
> +        if msg_obj["data"]["uatoken"] == UATOKEN:
> +            ret_obj = {"status": "REGISTERED"}
> +        else:
> +            ret_obj = {"status": "ERROR"}

This could send more specific errors. The message will be {"status": "ERROR. Your UAtoken is not valid for this server"}

::: netwerk/push-notification/nsIPushNotification.idl
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsISupports.idl"
> +#include "nsIDOMDOMRequest.idl"

I needed to delete this to compile.

::: netwerk/push-notification/nsPushNotification.js
@@ +590,5 @@
> +  // nsIPushNotification
> +
> +  nsURL: "ws://example.com:8080/", // URI of the notification server
> +
> +  uatokenURL: "http://example:8080/", // URI for retrieving the UA token

The URL should be this baseURL + /token
Comment on attachment 648208 [details] [diff] [review]
WIP v2

Review of attachment 648208 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/src/notification/PushNotification.cpp
@@ +21,5 @@
> +#include "nsIScriptGlobalObject.h"
> +#include "nsIScriptContext.h"
> +// Get the origin of a nsIDOMWindow.
> +#include "nsIScriptObjectPrincipal.h"
> +#include "nsIPrincipal.h"

It is really weird, since #include "nsIDOMDOMRequest.idl" would be translated by xpidl to #include "nsIDOMDOMRequest.h" in nsIDOMNavigatorPushNotification.h.  You don't have to do it yourself.  Could you show me the error messages you got?

::: dom/tests/mochitest/chrome/file_push_server_wsh.py
@@ +11,5 @@
> +    def registerUA(msg_obj):
> +        if msg_obj["data"]["uatoken"] == UATOKEN:
> +            ret_obj = {"status": "REGISTERED"}
> +        else:
> +            ret_obj = {"status": "ERROR"}

You should put the reason for the error in a separate attribute.  For example, {"status": "ERROR", "reason": "Your UAtoken is not valid for this server"}.

::: netwerk/push-notification/nsIPushNotification.idl
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsISupports.idl"
> +#include "nsIDOMDOMRequest.idl"

Something is wrong.  It should not be so.  Do you have dom/base/nsIDOMDOMRequest.idl?

::: netwerk/push-notification/nsPushNotification.js
@@ +590,5 @@
> +  // nsIPushNotification
> +
> +  nsURL: "ws://example.com:8080/", // URI of the notification server
> +
> +  uatokenURL: "http://example:8080/", // URI for retrieving the UA token

This is the server for getting a user-agent token for a new device.  We don't have any token before getting one from this URL.
(In reply to Thinker Li [:sinker] from comment #9)
> Comment on attachment 648208 [details] [diff] [review]
> WIP v2
> 
> Review of attachment 648208 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/src/notification/PushNotification.cpp
> @@ +21,5 @@
> > +#include "nsIScriptGlobalObject.h"
> > +#include "nsIScriptContext.h"
> > +// Get the origin of a nsIDOMWindow.
> > +#include "nsIScriptObjectPrincipal.h"
> > +#include "nsIPrincipal.h"
> 
> It is really weir since #include "nsIDOMDOMRequest.idl" would be translated
> by xpidl to #inclue "nsIDOMDOMRequest.h" in
> nsIDOMNavigatorPushNotification.h.  You don't have to do it by yourself. 
> Could you show me the error messages you got?

I had to make those changes to your branch on bitbucket. I've recompiled without those changes and it's correct, but let me clean the build and do it again.

> 
> ::: dom/tests/mochitest/chrome/file_push_server_wsh.py
> @@ +11,5 @@
> > +    def registerUA(msg_obj):
> > +        if msg_obj["data"]["uatoken"] == UATOKEN:
> > +            ret_obj = {"status": "REGISTERED"}
> > +        else:
> > +            ret_obj = {"status": "ERROR"}
> 
> You should put the reason of errors as a separated attribute.  For example,
> {"status": "ERROR", "reason": "Your UAtoken is not valid for this server"}.

Agree. Will change that in our server.

> 
> ::: netwerk/push-notification/nsIPushNotification.idl
> @@ +4,5 @@
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> > + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +#include "nsISupports.idl"
> > +#include "nsIDOMDOMRequest.idl"
> 
> Something is wrong.  It should not be so.  Do you have
> dom/base/nsIDOMDOMRequest.idl?

Yes.

> 
> ::: netwerk/push-notification/nsPushNotification.js
> @@ +590,5 @@
> > +  // nsIPushNotification
> > +
> > +  nsURL: "ws://example.com:8080/", // URI of the notification server
> > +
> > +  uatokenURL: "http://example:8080/", // URI for retrieving the UA token
> 
> This is the server for getting a user-agent token for a new device.  We
> don't have any token before getting from this URL.

The "/token" is a part of the URL, you don't need to change "token" for your "uatoken". It's a real URL:

GET http://push.mozilla.org/token

and gives a new token that should be used to register a new UA.
(In reply to Guillermo López (:willyaranda) from comment #10)
> The "/token" is a part of the URL, you don't need to change "token" for your
> "uatoken". It's a real URL:
> 
> GET http://push.mozilla.org/token
> 
> and gives a new token that should be used to register a new UA.
This is a configurable value.  You can change it by setting the preference "network.push-notification.user-agent-token-server".
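For example (a sketch; the URL is just an illustrative value):

  Components.utils.import("resource://gre/modules/Services.jsm");
  Services.prefs.setCharPref("network.push-notification.user-agent-token-server",
                             "http://push.example.org/token");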
(In reply to Thinker Li [:sinker] from comment #11)
> (In reply to Guillermo López (:willyaranda) from comment #10)
> > The "/token" is a part of the URL, you don't need to change "token" for your
> > "uatoken". It's a real URL:
> > 
> > GET http://push.mozilla.org/token
> > 
> > and gives a new token that should be used to register a new UA.
> This is a configurable value.  You can change it by setting preference
> "network.push-notification.user-agent-token-server".

Ok!
Priority: -- → P1
Attached patch WIP v3 (obsolete) — Splinter Review
Attachment #648208 - Attachment is obsolete: true
Flags: sec-review+
Dan, when was the security review of this done? The security review I attended basically didn't go anywhere because of a lack of documentation and so we decided to reschedule.
@Jonas, as far as I know it hasn't been rescheduled yet - re-opened and assigned to myself. 

I've been on leave - do you know if there is any more formal documentation yet? (I saw a bunch of emails, but I am still catching up)
Flags: sec-review+ → sec-review?(ptheriault)
@Jonas, the Push API on the wiki does not yet define the way to deliver push messages to apps.  Since Web Activities are landing, I will move the implementation from the current mozChromeEvent approach to Web Activities.  We need to pass a token string to PushManager.requestURL() as the shared knowledge between the push server and the web application server.

Can you confirm it?
I don't think you want to use WebActivities, since a WebActivity brings up UI where the user can choose which application to use to handle a specific message. That's clearly not what we want here.

What you want to use is system messages, which is the messaging mechanism used by the Alarm API and by WebActivities. The goal of system messages is to deliver a message to an application and wake that application up if needed, which exactly matches what you are looking for here.

System messages are working already (since they were needed in order for web activities to work).


I don't know what you mean by the token. A pointer to any explanation of what the API looks like and how it works would be very helpful.
Hey Jonas, can we have a meeting before the HackWeek in Brazil to clarify some things about how to pass info to the apps, and what kind of notifications we are supporting?

I'll stay in Madrid developing the server but Fernando (frsela) will be there helping thinker.

I'll be in the IRC #b2g channel for the whole day.
Hi Thinker!

I talked with Jonas last Thursday, and he said that the best option for push notifications is to use mozSetMessageHandler in the apps. I don't really know the Gecko API to use, but maybe he can elaborate a little bit more.

It's being used in the AlarmAPI: https://wiki.mozilla.org/WebAPI/AlarmAPI

and in the app manifest: https://github.com/mozilla-b2g/gaia/blob/master/apps/clock/manifest.webapp#L71
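If we go that route, the app side should look roughly like the alarm case. Sketch only; the "push-notification" message name and the page below are placeholders, not something that exists yet:

  // In the app's manifest.webapp:
  //   "messages": [ { "push-notification": "/index.html" } ]

  // In the app's JS, mirroring how the clock app handles "alarm":
  navigator.mozSetMessageHandler("push-notification", function (message) {
    // message carries whatever payload Gecko decides to forward
    console.log("got a push message: " + JSON.stringify(message));
  });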
Priority: P1 → --
Guillermo,

To use the System Messages API, you need to do what the alarm API is doing, i.e. register the target of push notifications and then use sendMessage() (see https://mxr.mozilla.org/mozilla-central/source/dom/messages/interfaces/nsISystemMessagesInternal.idl#22), which is implemented by the @mozilla.org/system-message-internal;1 component.
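Roughly, the chrome-side call would look like this (a sketch only; the message name and payload are placeholders, pageURI/manifestURI are assumed to be the nsIURIs registered earlier, and the exact signature is the one in the IDL linked above):

  let messenger = Components.classes["@mozilla.org/system-message-internal;1"]
                            .getService(Components.interfaces.nsISystemMessagesInternal);
  messenger.sendMessage("push-notification",    // placeholder message name
                        { body: pushPayload },  // placeholder payload
                        pageURI,                // page registered for the message
                        manifestURI);           // manifest of the target app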
Attached patch WIP v4 (obsolete) — Splinter Review
Includes the test cases against the push-notification server, and code for delivering incoming push messages through system messages.
Attachment #651348 - Attachment is obsolete: true
Blocks: 786841
Depends on: 771884
Attachment #656829 - Attachment description: Implementation of push-notification protocol → Part 1: implementation of push-notification protocol
Attached patch Part 3: push-API (obsolete) — Splinter Review
Attachment #656829 - Flags: review?
Attachment #656830 - Flags: review?
Attachment #656831 - Flags: review?
(In reply to Thinker Li [:sinker] from comment #23)
> Created attachment 656830 [details] [diff] [review]
> Part 2: push-API

This part needs a super-review to make sure the API is ready.
Thinker, I think you forgot to put a reviewer on those review requests. Even if Justin is going to do them, better to ask explicitly so it appears in his review request list.
Sorry that this is late; the earlier comment must have been eaten.

I have a few questions about the API approach being taken here. 

1) Are tokens limited to base32 or will they be using base64? (or is there no set limit on the character set available for token representation?)

2) How do end users control or revoke a given token? If they request a new token, is the ID the same?

3) I take it that the sockets only connect after a UDP ping. Is that required or will the ws do a timed backoff like how we're doing sync?

4) I'm still a bit fuzzy on this. Is there a separate websocket channel for each broadcaster or are the notifications collected to a single websocket channel for final transmission?
Thinker, the patches in this bug need to have requests from a specific person to get review. Please go edit the attachment details and type in the email of the person you want to review these patches. If you don't know who should review, let me know and I can help find someone to review.
Attachment #656596 - Attachment is obsolete: true
Attachment #656829 - Attachment is obsolete: true
Attachment #656829 - Flags: review?
Attachment #658441 - Flags: review?(justin.lebar+bug)
Attachment #658442 - Flags: review?(justin.lebar+bug)
Comment on attachment 656830 [details] [diff] [review]
Part 3: push-API

Removing the review flag because of changes to the API spec.
Attachment #656830 - Attachment description: Part 2: push-API → Part 3: push-API
Attachment #656830 - Flags: review? → review-
Attachment #656830 - Flags: review-
Attachment #656831 - Attachment is obsolete: true
Attachment #656831 - Flags: review?
Just to let you know, I'm really backed up on reviews at the moment.  I'll try to get to this by the end of my week.  Sorry for the wait.  :(
(In reply to Justin Lebar [:jlebar] from comment #32)
> Just to let you know, I'm really backed up on reviews at the moment.  I'll
> try to get to this by the end of my week.  Sorry for the wait.  :(

That is fine.  I also need time to revise the API part.
Whiteboard: [soft] → [soft][WebAPI:P3]
Keywords: feature
Attachment #656830 - Attachment is obsolete: true
Attachment #660707 - Flags: review?(justin.lebar+bug)
Attachment #660707 - Flags: review?(jonas)
Attachment #660708 - Flags: review?(justin.lebar+bug)
Some information that would be helpful in the wiki [1]:

* What is the difference between "userMessage", "appMessage", and "appBadge"?

I think I inferred what appBadge and userMessage are for, and I guess appMessage is somehow delivered to the app itself?  In that case,

* How are appMessages delivered to apps?

[1] https://wiki.mozilla.org/WebAPI/PushAPI
(In reply to Justin Lebar [:jlebar] from comment #36)
> * What is the difference between "userMessage", "appMessage", and "appBadge"?
The original idea was not limited to just these three types, but instead allowed for notifications to contain an intended consumer element; the user display would just be another consumer. My assumption is that these three types are a reduced subset of that design:

userMessage - message displayed to the user 
appMessage - message displayed to a given app (Note: how is the app determined?)
appBadge - indicator state to send to the app's icon (+n adds an indicator to the app icon displaying some associated value, e.g. new messages, updates, bottles of beer on the wall, etc.)

I believe this spec is missing a field like "appIdentifier" to indicate which app receives the message/badge.
> I believe this spec is missing a field like "appIdentifier" to indicate which app receives the 
> message/badge.

I don't think so...  On the side calling navigator.push, the app which will receive the message is simply the app that's calling navigator.push.  On the application server side, we also don't need an application ID, because that's encoded into the URL that it's sending to.

Thanks for clarifying the other issues.  I should have been clearer that what I'd like is for us to update the wiki, which would make this review a bit simpler...
Attachment #633276 - Attachment is obsolete: true
Comment on attachment 658441 [details] [diff] [review]
Part 1: implementation of push-notification protocol, v2

Hi, Thinker.  We have WebFM mostly sort of wrapped up, so I have bandwidth to
think about this now.  Thanks a lot for being so patient with me.

I've reviewed only the IDL changes in this patch.  It's a measly 10% of this
one patch!  But it's something to start with; I hope it's useful.

>diff --git a/netwerk/push-notification/nsIPushNotification.idl b/netwerk/push-notification/nsIPushNotification.idl
>+[scriptable,function,uuid(e06ef3e0-a840-4d02-ba33-0cd3fd6be254)]
>+interface nsIPushNotificationUASuccessCallback : nsISupports {
>+  void handleSuccess();
>+};
>+
>+[scriptable,function,uuid(c8c83bbf-beb8-46e1-86ce-ad9d24879743)]
>+interface nsIPushNotificationSuccessCallback : nsISupports {
>+  void handleSuccess(in ACString url);
>+};
>+
>+[scriptable,function,uuid(09ea4c62-24d9-4a97-b3ea-3f961110e4c8)]
>+interface nsIPushNotificationErrorCallback : nsISupports {
>+  void handleError();
>+};
>+
>+/**
>+ * Callbacks for nsIPushNotification::registerUA().
>+ */
>+[scriptable,uuid(0888a37d-ce90-4547-9e3c-278522d04617)]
>+interface nsIPushNotificationUARequest : nsISupports {
>+  attribute nsIPushNotificationUASuccessCallback onSuccess;
>+  attribute nsIPushNotificationErrorCallback onError;
>+};
>+
>+/**
>+ * Callbacks for nsIPushNotification::registerWA().
>+ */
>+[scriptable,uuid(2877925d-07b3-42c7-93b0-def6bed80cbf)]
>+interface nsIPushNotificationWARequest : nsISupports {
>+  attribute nsIPushNotificationSuccessCallback onSuccess;
>+  attribute nsIPushNotificationErrorCallback onError;
>+};

Can we replace all these callback classes with DOMRequest?  I don't get why we
need specific classes here.  I see that you do stuff with these classes in
other patches, so I'm happy to hold off criticism until I've read the rest of
the patches.  :)

>+/**
>+ * Push-notification service.
>+ *
>+ * This interface provides the service for implementation of         
>+ * push-notification protocol.  The implementation is responsible for
>+ * interactions with push server.  The major functions of this       
>+ * interface are to register WEB applications with push server, and  
>+ * receiving push messages sent by Application Servers from push     
>+ * server.                                                           
>+ */
>+[scriptable,uuid(53c124aa-5f62-4f83-8a15-096ba1584ac6)]
>+interface nsIPushNotification : nsISupports {

This is not an interface for one push notification; it is for handling all push
notifications.  So nsIPushNotification is not a good name; it should be
nsIPushNotifications, at least.

But maybe nsIPushNotificationsBackEnd would be a better name, since this is the
part that talks to the push notification server.

Nits on the comment:

Thanks a /ton/ for writing this high-level comment.  Some comments on the
comment:

  * "interactions with /the/ push server."
  
  * s/WEB applications/web applications/ (or Web applications, if you prefer)

  * s/Application Servers/application servers

  * When you have a list of two or more phrases, as in the last sentence in the
    comment, the phrases need to be "parallel".

    "The major functions of this interface are to X and to Y" is parallel,
    because X and Y are both infinitive phrases.  But "The major functions of
    this interface are to X and receiving Y" does not work, because "to X" is
    an infinitive phrase and "receiving Y" is not (it doesn't start with "to
    VERB").

    How about we change the last sentence to:
    
      The major functions of this interface are to register web applications
      with the push server and to receive messages sent from the push server on
      behalf of application servers.

How does this class work cross-process?  It looks like some state, like the
uatoken, needs to be shared across processes.  Does this interface exist only
in the main process (it doesn't look like it)?

I'm sure I'll learn more as I read more, but for now, a brief comment on that
subject would be appreciated.

>+  /**
>+   * The URL of the Notification server.
>+   *
>+   * Value of preference network.push-notification.notification-server.
>+   */
>+  attribute ACString nsURL;

notificationServerURL, please.

Also s/Notification/notification/ in the comment.

If this is just the value of a pref, it should be read-only, I think.  Although
it's not clear to me that this needs to be in the interface at all.

>+  attribute ACString uatokenURL;

uaTokenURL, please.

It would be great if you could add a comment here; it wasn't obvious to me what
this attribute meant.  It looks like it just mirrors a pref, like
notificationServerURL.  In that case, it should be read-only.

But like with notificationServerURL, it's not clear to me why this should be on
the interface at all.  It doesn't look to me like this needs to be on the
interface.

>+  attribute ACString uatoken;

A comment here would be great.  Also, 'uaToken'.  And like the previous ones,
this should be read-only, and, since it's really an implementation detail,
probably shouldn't be on the interface at all.

>+  attribute bool connected;

Shouldn't this be read-only?  So that it matches the changes below, this should
probably be connectedToNotificationsServer.  That's kind of a mouthful, but I
think it's at least a little clearer.  What do you think?

>+  /**
>+   * Register and connect UA itself with the notification server.
>+   */
>+  nsIPushNotificationUARequest registerUA();

How about "connectToNotificationServer"?

>+  /**
>+   * Close the connection between UA and the notification server.
>+   */
>+  void closeUA();

If we have connectToNotificationServer(), we'd have
disconnectFromNotificationServer() here.

>+  /**
>+   * Register an WEB application itself with the notification server.

s/an WEB/a web/ or Web, if you prefer.  I'll stop mentioning this one.  Also
s/itself// -- "itself" doesn't add anything here.

>+   * @param pageURL
>+   *        is the page of the application handling push messages.
>+   * @param manifestURL
>+   *        is the manifestURL of the applicaiton.
>+   * @param watoken
>+   *        is the token provided by the WEB application.

How about appToken, instead of watoken?

>+   * @return a request.
>+   */
>+  nsIPushNotificationWARequest registerWA(in ACString pageURL,
>+                                          in ACString manifestURL,
>+                                          in ACString watoken);

How about registerApp() instead?

>+  /**
>+   * Unregister the service with the notification server.
>+   *
>+   * @param manifestURL
>+   *        is the manifestURL of the application.
>+   * @return a request.
>+   */
>+  nsIPushNotificationWARequest unregisterWA(in ACString manifestURL);

unregisterApp()?

>+  /**
>+   * Get the URL for pushing for given manifestURL.
>+   *
>+   * This function does not register the app with the server.  It only
>+   * returns the existing URL if there is.

s/if there is/if there is one/, or even better, s/if there is/if we have one.

>+  nsIPushNotificationWARequest getCurrentPushURL(in ACString manifestURL);

getCurrentPushURLForApp()?
(In reply to Justin Lebar [:jlebar] from comment #36)
> Some information that would be helpful in the wiki [1]:
> 
> * What is the difference between "userMessage", "appMessage", and "appBadge"?
> 
> I think I inferred what appBadge and userMessage are for, and I guess
> appMessage is somehow delivered to the app itself?  In that case,
> 
> * How are appMessages delivered to apps?
> 
> [1] https://wiki.mozilla.org/WebAPI/PushAPI

Hi, this wiki page contains the previous PushAPI proposal; the new one is discussed in this other bug: https://bugzilla.mozilla.org/show_bug.cgi?id=787965

You can also use this shared document with the new API proposal: https://www.dropbox.com/s/xa3gqxyd7ycpw0y/Push%20notification%20server%20APIs.docx
Attached file Push Server API
This is the document that describes the Push Server API (between all actors) that Fernando referred to, but in PDF:
BTW: I am going to update the Wiki, moving the "old" spec to WebAPI/PushAPI_old
Hi,

I just updated the wiki page, moving the old one to the location above:

https://wiki.mozilla.org/WebAPI/PushAPI
I am going over the wiki page (see #43) to clean up the spec. There are several things that are fuzzy around the WAToken (sec 2.1.1)

What generates this token? 
What format should it take? (Previous version created a minimum 256 bit random base36 string. Is that still valid?)
Should WAToken generators prefix their tokens or otherwise make them visible?
What length limits exist for these tokens?

Thanks!
A few other important design points that I don't see covered anywhere:

* If the notification is pushed from the notification server to the client over UDP, the whole notification must fit in one MTU.  That means that the protocol must impose a limit on the size of one of these notifications.  So we need to find the minimum MTU over all common cell networks.

* Does the notification server guarantee delivery of notifications to clients?  How does the application server indicate that a new notification obsoletes an old notification?

ISTM that a notification must be a tuple of the form (topic, version), where "topic" is an arbitrary (but length-limited) string defined by the application, and "version" is an integer.  If I send notifications (t, v1) and (t, v2) with v1 < v2, then (t, v1) is obsolete, and the notification server does not need to send (t, v1) to the client.  And if the client has already received (t, v2) and later receives (t, v1), Gecko is free to ignore that notification.

If we do it this way, we can probably send the notifications unencrypted.  This is helpful because encryption is going to add space overhead (for MACs, padding blocks in block ciphers, etc), and we have to fit this whole thing into one MTU.
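To illustrate, the client-side bookkeeping for such a scheme would be tiny. This is a sketch of the proposal above, not of anything implemented; fetchUpdateFromAppServer() is a hypothetical helper:

  var latestVersion = {};   // topic -> highest version seen so far

  function onNotification(topic, version) {
    if (topic in latestVersion && version <= latestVersion[topic]) {
      return;   // obsolete: we already saw an equal or newer version for this topic
    }
    latestVersion[topic] = version;
    fetchUpdateFromAppServer(topic, version);   // hypothetical: pull the real data over SSL
  }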
(In reply to JR Conlin [:jrconlin,:jconlin] from comment #44)
> I am going over the wiki page (see #43) to clean up the spec. There are
> several things that are fuzzy around the WAToken (sec 2.1.1)
> 

I'll try to clarify ;)

> What generates this token? 

The application. It SHALL be a SECRET (like an auth. cookie ;)

> What format should it take? (Previous version created a minimum 256 bit
> random base36 string. Is that still valid?)

Any hashed string is valid ... so random base36 is valid. The only req. is that it SHALL be SECRET (don't forget this!)

> Should WAToken generators prefix their tokens or otherwise make them visible?

No, NEVER show the WAToken. A prefix is valid but reduces security (you know the first part of it)

> What length limits exist for these tokens?

No defined limit; @willyaranda, we should define one to avoid DoS ;)

> 
> Thanks!

You're welcome
(In reply to Justin Lebar [:jlebar] from comment #45)
> A few other important design points that I don't see covered anywhere:
> 
> * If the notification is pushed from the notification server to the client
> over UDP, the whole notification must fit in one MTU.  That means that the
> protocol must impose a limit on the size of one of these notifications.  So
> we need to find the minimum MTU over all common cell networks.
> 

No, UDP is only used to wake up the handset, but no data should be read from it (because of security reasons ;)
The notification is pulled by the client after it wakes up :)

> * Does the notification server guarantee delivery of notifications to
> clients?  How does the application server indicate that a new notification
> obsoletes an old notification?
> 

Nowadays, the same policy as Google ... we'll do best effort. The guarantee is high, but we can't assure 100% right now.

No protocol for obsoleting notifications is defined in this first release. We'll study it for the next release. Thanks.

> ISTM that a notification must be a tuple of the form (topic, version), where
> "topic" is an arbitrary (but length-limited) string defined by the
> application, and "version" is an integer.  If I send notifications (t, v1)
> and (t, v2) with v1 < v2, then (t, v1) is obsolete, and the notification
> server does not need to send (t, v1) to the client.  And if the client has
> already received (t, v2) and later receives (t, v1), Gecko is free to ignore
> that notification.
> 

Same as the previous topic. No versioning is defined. Each notification is delivered to the client ASAP.

> If we do it this way, we can probably send the notifications unencrypted. 
> This is helpful because encryption is going to add space overhead (for MACs,
> padding blocks in block ciphers, etc), and we have to fit this whole thing
> into one MTU.

The encryption is decided by the developer. We only carry the RAW data from one side to the other.

We only ask to SIGN it in order to verify origins.

Hope this helps.

Sincerely.
Fernando.
Sorry, I should have phrased my questions more carefully.  Although I'm interested in what you currently have designed, I'm also interested in what we /should/ design.  I consider my review here to be of the push-notifications API and its implementation.

> Nowadays, same Google politics ... we'll do best effort. Guarantee is high, but we cann't assure 
> 100% nowadays.

We should design a protocol with better guarantees.  I'm not ready to r+ a protocol for inclusion in Firefox which waves its hands and says "well, your data might be out-of-sync for a long time if our servers have a problem".

That's why I'm suggesting a design which has versions on the notifications.  This mirrors Google's design, and allows the client to periodically check to ensure that it hasn't missed any notifications.
> The encription is decided by the developer. We only carry the RAW data from one side to the other.

If the notification server sees sensitive data (such as e-mail addresses, or e-mail bodies), then I think that data should be encrypted end-to-end (between the app server and the client) as part of the protocol.

I'm not comfortable relying on apps to perform encryption when we don't even have crypto primitives built into the browser.  And I'm not comfortable r+'ing a push-notification protocol which requires users to trust a third party that can read any messages which are not properly encrypted by the application.

Again, the question isn't what the protocol currently does, but what it should do.
> > What generates this token? 
> 
> The application. It SHALL be a SECRET (like an auth. cookie ;)

I'm starting to think that "application" is getting overloaded. I'd recommend differentiating the Application Provider (being the service that provides the content being executed on the device) and the Application (being the content executing on the device, which may or may not have communication back to the Application Provider servers). For instance, I could create a calculator application that never needs to talk to my servers and could run locally via the glory of javascript and DHTML.

> 
> > What format should it take? (Previous version created a minimum 256 bit
> > random base36 string. Is that still valid?)
> 
> Any Hashed string is valid ... so random base36 is valid. The only req. is
> that it SHALL be SECRET (don't forget this !)

I'll opt for URL-safe base64.

> 
> > Should WAToken generators prefix their tokens or otherwise make them visible?
> 
> No, NEVER show the WAToken. A prefix is valid but reduces security (you now
> first part of it)
> 
> > What length limits exist for these tokens?
> 
> No defined limit, @willyaranda ... we should define this to avoid DoS ;)

Again, I'll suggest a limit of 45 characters. That's enough to transcode a 256-bit random number into a base64 value. I'd also be open to a hard limit of 89 characters to handle 512-bit random numbers.
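For what it's worth, minting such a token on the app server is only a few lines (a Node-style sketch; the 256-bit size matches the suggestion above, everything else is illustrative):

  var crypto = require("crypto");

  function makeWAToken() {
    return crypto.randomBytes(32)          // 256 random bits
                 .toString("base64")
                 .replace(/\+/g, "-")      // URL-safe alphabet
                 .replace(/\//g, "_")
                 .replace(/=+$/, "");      // drop the padding
  }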
(In reply to Justin Lebar [:jlebar] from comment #49)
> > The encription is decided by the developer. We only carry the RAW data from one side to the other.
> 
> If the notification server sees sensitive data (such as e-mail addresses, or
> e-mail bodies), then I think that data should be encrypted end-to-end
> (between the app server and the client) as part of the protocol.
> 
> I'm not comfortable relying on apps to perform encryption when we don't even
> have crypto primitives built into the browser.  And I'm not comfortable
> r+'ing a push-notification protocol which requires users to trust a third
> party to read any messages which are not properly encrypted by the
> application.
> 
> Again, the question isn't what the protocol currently does, but what it
> should do.

The original version used the Stanford Javascript Crypto Library http://crypto.stanford.edu/sjcl/ which provides sufficient AES encryption. While not as optimal as using DomCrypt, it's a reasonable short term solution for encryption needs.
I'm not comfortable asking application developers to do their own crypto of highly-private information, particularly using a library that hasn't been battle-tested, unless we have no other option.  But we do have another option -- two, in fact:

1) Don't send any sensitive data using this API, and
2) Encrypt all data as part of this API.

It feels to me like (1) is the best solution here.  The notification says "hey, there's new data available on topic X, go ask the application server for it", and the app acts on that.  There's no fundamental reason we should send any private data using the notification API, afaict.
(In reply to Justin Lebar [:jlebar] from comment #48)
> Sorry, I should have phrased my questions more carefully.  Although I'm
> interested in what you currently have designed, I'm also interested in what
> we /should/ design.  I consider my review here to be of the
> push-notifications API and its implementation.
> 
> > Nowadays, same Google politics ... we'll do best effort. Guarantee is high, but we cann't assure 
> > 100% nowadays.
> 
> We should design a protocol with better guarantees.  I'm not ready to r+ a
> protocol for inclusion in Firefox which waves its hands and says "well, your
> data might be out-of-sync for a long time if our servers have a problem".

It should be noted that notifications have a TTL. The original spec had a default TTL of 72 hours. It may well be possible that you're offline for the duration of the notification, in which case you would not receive it. In addition, network errors may prevent the UA from accessing the servers for a length of time. Providing 100% reliability is very expensive from an operational standpoint. Those are some of the reasons that the first spec said "best effort for delivery".

> 
> That's why I'm suggesting a design which has versions on the notifications. 
> This mirrors Google's design, and allows the client to periodically check to
> ensure that it hasn't missed any notifications.

The prior Notification code used a model similar to IMAP to indicate back to the server when a Notification was read by a given UA. (The reply was processed via the websocket calls.) Once a notification was considered "read" it was removed from the user's queue.
(In reply to Justin Lebar [:jlebar] from comment #48)
> Sorry, I should have phrased my questions more carefully.  Although I'm
> interested in what you currently have designed, I'm also interested in what
> we /should/ design.  I consider my review here to be of the
> push-notifications API and its implementation.
> 
> > Nowadays, same Google politics ... we'll do best effort. Guarantee is high, but we cann't assure 
> > 100% nowadays.
> 
> We should design a protocol with better guarantees.  I'm not ready to r+ a
> protocol for inclusion in Firefox which waves its hands and says "well, your
> data might be out-of-sync for a long time if our servers have a problem".
> 
> That's why I'm suggesting a design which has versions on the notifications. 
> This mirrors Google's design, and allows the client to periodically check to
> ensure that it hasn't missed any notifications.

We've made compromises all over the place to get V1 out the door. Is this really (something that might or might not happen, that might or might not be important, depending on *what* exactly the notifications for a concrete application are) something to block this on?

Especially considering that whether or not the push server API has an 'obsolete notification' message should not change an iota of Firefox code, since the 'this notification obsoletes that one' logic is something between the app server and the push server. And as Fernando said, we can do that at a later point if it's really needed... and as I said, it should not change the client implementation. At all.
RE: #52

The Privacy panel's thought was that "we don't know what's private, nor should we". They had originally pushed that all info be encrypted.

As for the library, I was in the process of building out a set of libraries to assist in this: https://github.com/jrconlin/fncrypto. I'll add that those libraries will not work with the current spec since they presume that the UA (not the App Provider) generates and handles the keys.
(In reply to Justin Lebar [:jlebar] from comment #49)
> > The encription is decided by the developer. We only carry the RAW data from one side to the other.
> 
> If the notification server sees sensitive data (such as e-mail addresses, or
> e-mail bodies), then I think that data should be encrypted end-to-end
> (between the app server and the client) as part of the protocol.
> 
> I'm not comfortable relying on apps to perform encryption when we don't even
> have crypto primitives built into the browser.  And I'm not comfortable
> r+'ing a push-notification protocol which requires users to trust a third
> party to read any messages which are not properly encrypted by the
> application.
> 
> Again, the question isn't what the protocol currently does, but what it
> should do.

I'm sorry, but this doesn't make any sense. At all. You cannot force encryption on someone that doesn't want to use it. Why? 

Let's assume we give a 'push data' primitive that receives raw data and some kind of key. And we implement an API for that. And distribute it. So instead of using a simple SSL PUSH, now they have to use our API. And of course we implement in Firefox a 'pull data' primitive that receives a key, fetches a notification, decrypts it and returns the raw data. A developer still has to manage the keys (because if the notification server manages them then, hey, we can read the data again), which is the biggest hassle. So he can perfectly well use the same key for every user of HIS (not ours, HIS) application. For that, he can even find a key K for which C(D,K)=D for every D and use that as the key. And we wouldn't see if any of those things happened.

E2E encryption is the responsibility of the application developer, not the transport mechanism (the notification/push server). We provide transport encryption (SSL on both the incoming and outgoing message queues), but that's it. If the application developer feels their notifications should be encrypted, that's his decision (again, not ours) and he can use any of a number of free encryption packages to do it. And then he has to figure out key management, which he would have to do anyway if we provided the ciphering.
(In reply to Justin Lebar [:jlebar] from comment #52)
> I'm not comfortable asking application developers to do their own crypto of
> highly-private information, particularly using a library that hasn't been
> battle-tested, unless we have no other option.  But we do have an other
> option -- two, in fact:
> 
> 1) Don't send any sensitive data using this API, and
> 2) Encrypt all data as part of this API.
> 
> It feels to me like (1) is the best solution here.  The notification says
> "hey, there's new data available on topic X, go ask the application server
> for it", and the app acts on that.  There's no fundamental reason we should
> send the any private data using the notification API, afaict.

Option 1 is cool, but *we* don't send ANY data through the server. The application developers send the data. We *don't* know what the data is, and we don't care either. It isn't our data after all. So yeah, we can just remove the option of sending any data at all because someone somewhere could (or will) send some data they should not. But it's... a little bit extreme. Not to mention it breaks some use cases (most of them annoying, true), where the whole notification can be included in the message and the application doesn't need to check anything. Like sending notifications to every user of an app to tell them: 'hey, the new version is out, it's shiny, and today is free, come get it' without overloading the app servers to death when every user tries to download the text of the notification.

Again, I don't think we should charge ourselves with things that are completely out of scope. This is a transport mechanism, something to get some raw data, whose contents we don't care about, from point A to point *. If the data is sensitive, then the one responsible for the data (and that's the application that you as a user are trusting your data with in the first place) is the one that should take care of protecting it.

We can protect it in transit (and we do), and we could (but that's architectural, and out of the scope of the API definition) protect it while at rest on our servers. We're not responsible for doing E2E encryption, nor should we be.
> I'm sorry, but this doesn't make any sense. At all. You cannot force encryption on someone that 
> doesn't want to use it.

Of course you can't force encryption on someone who doesn't want it.  I can just broadcast my keys to the world.

The issue is that we should not design an API where it's difficult to properly secure highly-private data.  We seem to have designed such an API, and for no good reason.  Like I said earlier, it strikes me as quite simple to design the push-notification API so that no private data passes over it at all.  Then the private data can be protected by a direct SSL connection between the client and the app server.

> Specially considering that whether the push server API has or has not an 'obsolete notification 
> message' that should not change an iota on Firefox code, since the 'this notification obsoletes 
> that one' is something between the app server and the push server. And as Fernando said, we can do 
> that at a later point if it's really needed... and as I said it should not change the client 
> implementation.

Reliable notifications will necessarily change the client protocol, because the client will be required to ack notifications.

As far as I can tell from the design document uploaded here (thanks!), there's no attempt to re-send the UDP packet if it's not received, and the client doesn't ack the UDP packet.  Is that right?
> This is a transport mechanism, something to get some raw data that we don't care what contains from 
> point A to point *.

Actually, I think it's a ping mechanism, not a transport mechanism at all.  "Ping, this topic X you asked to be notified about earlier has been updated to version Y.  Go ask the app server about it."

Arbitrary data transfer should be out of scope for this protocol.  That would simplify it greatly.

> So yeah, we can just remove the option for sending any data at all because someplace somewhere 
> could (or will) send some data he should not. But it's... a little bit extreme. Not to mention it 
> breaks some use cases (most of them annoying, true), where the whole notification can be included 
> on the message and the application doesn't need to check anything. Like sending notifications to 
> every user of an app to tell them: 'hey, the new version is out, it's shiny, and today is free, 
> come get it' without overloading to dead the app servers when every user tries to download the 
> text of the notification. 

That's a pretty unconvincing use-case, IMO.  As soon as I find out that there's a new version of the app, I'm going to go /download/ it.  That's a huge overhead compared to replying to the pings from clients  saying "you told me the version is updated; what's the new version?"
(In reply to Justin Lebar [:jlebar] from comment #58)
> > I'm sorry, but this doesn't make any sense. At all. You cannot force encryption on someone that 
> > doesn't want to use it.
> 
> Of course you can't force encryption on someone who doesn't want it.  I can
> just broadcast my keys to the world.
> 
> The issue is that we should not design an API where it's difficult to
> properly secure highly-private data.  We seem to have designed such an API,
> and for no good reason.  Like I said earlier, it strikes me as quite simple
> to design the push-notification API so that no private data passes over it
> at all.  Then the private data can be protected by a direct SSL connection
> between the client and the app server.

Er... ok. The only way to design the push-notification API without using mandatory E2E encryption (and that's just one call harder with the current implementation than if the implementation also did the ciphering) so that no private data passes over it at all is to design it so that no data passes over it at all. Which we *could* do. But then we force apps to distribute non-private data from their servers when they could be using our infrastructure to do it. Which isn't that bad for 1 to 1 messages (the only thing the app loses is the ability to choose when to distribute the message), but for 1 to * messages it can be a killer.

And all that because we don't trust the application developer to do the right thing with our data. The application developer we supposedly already trust with our data in the first place.


> 
> > Specially considering that whether the push server API has or has not an 'obsolete notification 
> > message' that should not change an iota on Firefox code, since the 'this notification obsoletes 
> > that one' is something between the app server and the push server. And as Fernando said, we can do 
> > that at a later point if it's really needed... and as I said it should not change the client 
> > implementation.
> 
> Reliable notifications will necessarily change the client protocol, because
> the client will be required to ack notifications.
> 
> As far as I can tell from the design document uploaded here (thanks!),
> there's no attempt to re-send the UDP packet if it's not received, and the
> client doesn't ack the UDP packet.  Is that right?
(In reply to Justin Lebar [:jlebar] from comment #59)
> > So yeah, we can just remove the option for sending any data at all because someplace somewhere 
> > could (or will) send some data he should not. But it's... a little bit extreme. Not to mention it 
> > breaks some use cases (most of them annoying, true), where the whole notification can be included 
> > on the message and the application doesn't need to check anything. Like sending notifications to 
> > every user of an app to tell them: 'hey, the new version is out, it's shiny, and today is free, 
> > come get it' without overloading to dead the app servers when every user tries to download the 
> > text of the notification. 
> 
> That's a pretty unconvincing use-case, IMO.  As soon as I find out that
> there's a new version of the app, I'm going to go /download/ it.  That's a
> huge overhead compared to replying to the pings from clients  saying "you
> told me the version is updated; what's the new version?"

That's actually something that a lot of apps seem to do, at least on iOS. Not specifically for 'hey, there's a new version, come and get it' (which, by the way, they won't get from the developer site but from the marketplace), but the kind that says 'Come and play whatevergame today! You'll get double fruzzies for every fizzie you kill!'. Kid games do that a lot. And that works because the game developer doesn't need to have the infrastructure to serve any data. He sends one notification that gets distributed to every customer. Nice and cheap (for him anyway).
> But then we force apps to distribute non private data from their servers when they could be using 
> our infrastructure to do it. Which isn't that bad for 1 to 1 messages (the only thing the app loses 
> is the ability to chose when to distribute the message), but for 1 to * messages can be a killer. 

I don't understand why we should optimize this protocol for broadcast messages.  You yourself categorized these use-cases as "annoying" earlier.

Broadcast messages like "new version of app available" are very well-handled by polling an app server once a day.  It's non-broadcast messages like "you've got mail" that push notifications are important for.

> And all that because we don't trust the application developer to do the right thing with our data. 
> The application developer supposedly we already our data to in the first place

It's not an issue of trust.  The issue is that application developers lack tested tools to do strong crypto in the browser at the moment.  The best we have is the Stanford crypto library.  I'm friends with the authors of that library, but I would not trust it with my personal data; the library is not battle-tested in the same way NSS is.
So, this version of Notifications does not allow for anonymous notifications, then. Users would leak the number, type and potentially the location of user agents they have (by pinging the App servers that created the notification requests) by polling for additional information not carried in the original notification. Because of the polling mechanism, a site could determine which users are active when. 

We also have no idea what element of the interaction is private. A user might simply get an notification to gather at a particular location, or that a test result has arrived, or even that someone is waiting for them at a location. Of course, the context of any of those (a protest gathering, STD test, or an affair) could be something that might be subject to subpoena.
> the kind that says 'Come and play whatevergame today! You'll get double fruzzies for every fizzie 
> you kill!'. Kid games do that a lot.

Again, I don't need push notifications for this.  I don't need to wake up the device from sleep so it can receive this special offer.  The client should simply periodically poll the app server to get this data.
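For comparison, that kind of broadcast is a few lines of polling in the app itself (a sketch; the URL, interval and showAnnouncement() helper are made up):

  setInterval(function () {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "https://app.example.com/announcements");   // made-up URL
    xhr.onload = function () {
      showAnnouncement(JSON.parse(xhr.responseText));           // hypothetical helper
    };
    xhr.send();
  }, 24 * 60 * 60 * 1000);   // once a day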
> So, this version of Notifications does not allow for anonymous notifications, then. Users would leak 
> the number, type and potentially the location of user agents they have (by pinging the App servers 
> that created the notification requests) by polling for additional information not carried in the 
> original notification. Because of the polling mechanism, a site could determine which users are 
> active when. 

Anonymous notifications only work if they don't wake up the app.  As soon as we wake up the app due to a notification, we're screwed, because the app can leak whatever it wants to the app server.

The notification API proposed here always wakes up the app when we receive a notification.  That happens with or without my proposal to limit the data sent in the notification.
Yep, not arguing, just pointing out the difference. I'll note that 'user messages' where there may be no associated app might still be anonymous, but I get the sense that those are not long for this spec.
(In reply to Justin Lebar [:jlebar] from comment #64)
> > the kind that says 'Come and play whatevergame today! You'll get double fruzzies for every fizzie 
> > you kill!'. Kid games do that a lot.
> 
> Again, I don't need push notifications for this.  I don't need to wake up
> the device from sleep so it can receive this special offer.  The client
> should simply periodically poll the app server to get this data.

Heh, I don't need them either. My kids seem to like them well enough though. And if we let notifications have app-defined data, the app doesn't even need to *have* a server (in fact I don't think most of the stupid games they play have servers).

And the fact that apps *can* send data doesn't mean they *have* to send data. If deciphering on the UA is too hard or too unreliable (which BTW is a completely different problem from the Push API and I think should be tackled in another place/bug/whatever) then the app developer can do a terrible thing... he can *choose* not to send the data!
I feel like you've conceded this point; your argument as I understand it is now that games might want to send data in push notifications for ads, which we both agree is not useful, and that apps might choose not to send any data inside the push notification, which I hope we can agree is not relevant to the question at hand.

Let's move on to another important point about the protocol, if that's OK with you.

It looks like the notification part of a push notification is a single UDP packet sent to the client.

Is any attempt made to discover whether this packet made it to the client, and to re-send it if it didn't?  UDP packets can of course be dropped even when I have a strong connection to the network, and of course I don't always have a strong connection to the network as I go about my daily business.
(In reply to Justin Lebar [:jlebar] from comment #48)
> Sorry, I should have phrased my questions more carefully.  Although I'm
> interested in what you currently have designed, I'm also interested in what
> we /should/ design.  I consider my review here to be of the
> push-notifications API and its implementation.
> 

Sure, probably I didn't express it well ...

> > Nowadays, same Google politics ... we'll do best effort. Guarantee is high, but we cann't assure 
> > 100% nowadays.
> 
> We should design a protocol with better guarantees.  I'm not ready to r+ a
> protocol for inclusion in Firefox which waves its hands and says "well, your
> data might be out-of-sync for a long time if our servers have a problem".
> 

When I answered I was thinking of the server infrastructure. We are designing it to be HA, scalable, etc., so in theory it is 100% (99.99999%) reliable; anyway, it depends on a lot of things, so I can't assure it now ... that's all I meant ... _sorry for the confusion_.

So, to clarify, the protocol is completely reliable: in order to avoid getting out of sync, the notification message includes a TTL parameter defining its Time-To-Live, so if the handset is powered off or the message can't be delivered before the TTL expires, it will be discarded. (see: https://wiki.mozilla.org/WebAPI/PushAPI#Notification_JSON_Content)

On the client side, as soon the message is delivered (and ACK to the server) it shall be dispatched to the application.
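For illustration, a rough sketch of such a message as a JavaScript object. The field names follow the discussion in this bug (message, ttl, timestamp, priority); the authoritative layout is the Notification JSON Content section of the wiki page linked above, so treat this as an assumption, not the spec:

    // Hypothetical shape of a notification as posted by the app server.
    var notification = {
      id: "b0a3...",            // message id, later echoed in the client's ACK
      message: "opaque payload the app will consume",
      ttl: 86400,               // seconds to live; if undeliverable within
                                // this window the message is discarded
      timestamp: 1348128000,    // when the app server created it
      priority: 1
    };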


> That's why I'm suggesting a design which has versions on the notifications. 

I don't understand why we need versions on the notifications; it adds to the server logic (which needs to move millions of messages per second), and with the TTL this can be solved in a clean way ;)

> This mirrors Google's design, and allows the client to periodically check to
> ensure that it hasn't missed any notifications.

In a future release of the server we want to add a callback URL for status notifications.

--

Hope the confusion is cleared up now ;)
(In reply to Justin Lebar [:jlebar] from comment #49)
> > The encription is decided by the developer. We only carry the RAW data from one side to the other.
> 
> If the notification server sees sensitive data (such as e-mail addresses, or
> e-mail bodies), then I think that data should be encrypted end-to-end
> (between the app server and the client) as part of the protocol.
> 

I agree, but the server can't decide the encryption method since the security shall be end to end; that's what I mean when I say the crypto is not the responsibility of the server infrastructure, only of the application server and web application.

Anyway, the server uses HTTPS and WSS on both sides to avoid attacks on the transmission, but the crypto is the responsibility of the third parties, not the server.

> I'm not comfortable relying on apps to perform encryption when we don't even
> have crypto primitives built into the browser.  And I'm not comfortable
> r+'ing a push-notification protocol which requires users to trust a third
> party to read any messages which are not properly encrypted by the
> application.
> 

Yeah, I agree too... but the server can't force the use of one crypto mechanism or another since it's in the middle; this is the responsibility of Gecko, web apps and application servers.

> Again, the question isn't what the protocol currently does, but what it
> should do.

Ok ;)
(In reply to JR Conlin [:jrconlin,:jconlin] from comment #50)
> > > What generates this token? 
> > 
> > The application. It SHALL be a SECRET (like an auth. cookie ;)
> 
> I'm starting to think that "application" is getting overloaded. I'd
> recommend differentiating the Application Provider (being the service that
> provides the content of the content being executed on the device) and
> Application (being the content executing on the device which may or may not
> have communication back to the Application Provider servers). For instance,
> I could create a calculator application that never needs to talk to my
> servers and could run locally via the glory of javascript and DHTML. 
> 

Well, I mean the Web Application that wants to receive PUSH notifications using the notification platform ;)

> > 
> > > What format should it take? (Previous version created a minimum 256 bit
> > > random base36 string. Is that still valid?)
> > 
> > Any Hashed string is valid ... so random base36 is valid. The only req. is
> > that it SHALL be SECRET (don't forget this !)
> 
> I'll opt for URL safe base64 


URL? for the WAToken? - No, the URL is returned by the server and the Token is only an auth. hash. We are mixing concepts ;)

> 
> > 
> > > Should WAToken generators prefix their tokens or otherwise make them visible?
> > 
> > No, NEVER show the WAToken. A prefix is valid but reduces security (you now
> > first part of it)
> > 
> > > What length limits exist for these tokens?
> > 
> > No defined limit, @willyaranda ... we should define this to avoid DoS ;)
> 
> Again, I'll suggest a limit of 45 characters. That's enough to transcode a
> 256 bit random number into a base64 value. I'd also be open to a hard limit
> of 89 characters to handle 512bit random numbers.

Why Base64? We don't need to un-Base64 the hash, so a SHA256 or SHA512 is enough ;)
(In reply to Justin Lebar [:jlebar] from comment #52)
> I'm not comfortable asking application developers to do their own crypto of
> highly-private information, particularly using a library that hasn't been
> battle-tested, unless we have no other option.  But we do have an other
> option -- two, in fact:
> 
> 1) Don't send any sensitive data using this API, and
> 2) Encrypt all data as part of this API.
> 
> It feels to me like (1) is the best solution here.  The notification says
> "hey, there's new data available on topic X, go ask the application server
> for it", and the app acts on that.  There's no fundamental reason we should
> send the any private data using the notification API, afaict.

That's the main reason for notifications: informing about new data asynchronously.

Also, you can send anything; it's true that the developer should encrypt it, and the user has to trust that he does.

It's like the current popular IM solutions which aren't secure (and that is public knowledge) yet a lot of people use them ... it is the user's problem if he trusts the wrong app ... to avoid this you need to educate your users, but it is not a protocol issue (from my point of view).
(In reply to Justin Lebar [:jlebar] from comment #68)
> I feel like you've conceded this point; your argument as I understand it is
> now that games might want to send data in push notifications for ads, which
> we both agree is not useful, and that apps might choose not to send any data
> inside the push notification, which I hope we can agree is not relevant to
> the question at hand.

Actually, no :) I said it isn't useful to *me*. But I told you that my kids, who are the target for *those* notifications, actually *like* them. So for the app's target audience (the kids) they are actually useful. And for the app developer they *are* useful too, since they keep the kids interested. And the way it works as the API is defined now, developers get something useful (for them and their target audience) that doesn't cost them anything. I don't see how that isn't useful.

And the fact that apps can choose not to send any data means we don't have to *force* them not to send any data. It is, and should be, a decision in the hands of the app developer. Allowing data to be sent has some valid use cases (which you don't like, but they exist, they're used, and believe it or not, there are people that like getting those notifications) that would become unfeasible or much more expensive if we forbade it.

So, point not conceded, sorry :)
Are you guys willing to change your protocol in response to criticism?  I feel like maybe I'm just wasting my breath here.
(In reply to Fernando R. Sela from comment #71)
> > Again, I'll suggest a limit of 45 characters. That's enough to transcode a
> > 256 bit random number into a base64 value. I'd also be open to a hard limit
> > of 89 characters to handle 512bit random numbers.
> 
> Why Base64, we don't need to UnBase64 the hash so a SHA256 or SHA512 is
> enough ;)

I agree that how the hash is generated is meaningless (it's a random value). My concern was more in ensuring that the value is transmitted without accidental alteration and in a reliable manner. If we can ensure UTF-8 end-to-end encoding, then I see no problem sending the value "raw". I've learned that it's often wiser, though, to encode the value to something simple so that there's less chance of accidental corruption or mistaken use of the wrong formatting causing issues.
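For concreteness, a minimal sketch of one way to produce such a token (Node.js is assumed here purely for the example; the function name is made up): 256 random bits rendered as URL-safe base64, which comes out to 43 characters and fits under the 45-character limit suggested above.

    var crypto = require('crypto');

    function generateWAToken() {
      return crypto.randomBytes(32)        // 256 bits of randomness
                   .toString('base64')
                   .replace(/\+/g, '-')    // URL-safe alphabet
                   .replace(/\//g, '_')
                   .replace(/=+$/, '');    // drop trailing padding
    }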
(In reply to Justin Lebar [:jlebar] from comment #74)
> Are you guys willing to change your protocol in response to criticism?  I
> feel like maybe I'm just wasting my breath here.

If that's aimed at me: I'm not even the one that would have to change the protocol... and speaking only for myself, I have supported and will support any changes I believe strengthen the protocol's security or enrich it somehow. And... the point I've argued to death (and I'm done arguing it, don't worry) is that the kind of data being transmitted is not, and should not be, part of the protocol definition. And that removing what are (in my view, and I guess in the view of the people actually using them) valid use cases because we don't trust app developers to do the right thing is extreme.

So, I'm afraid that I don't agree with that change, and if the point is raised internally I'll say as much. The final say isn't mine in any case.
Some more questions on the design document:

1) The public key |pbk| argument to the register() call in the Gecko <-> App interface needs some improvement in terms of documentation.

The public key is a base-64-encoded key, but what is it?  An RSA 1024 key?  An RSA 2048 key?  Something else?

Also, the fact that this is a public key needs to be explicitly defined in the spec; it wasn't at all clear to me.  (I'm used to using "pk" and "sk" for public key and secret key, respectively.)

I feel like using cryptography here when a secret URL would work just as well is overkill.  Either way you must have a secret -- the only question is whether that secret is a private key or a URL.  Moreover, I don't see why we need to use public/private key crypto here; the app server can just send us a symmetric key over our SSL connection.

But I'm holding off any criticism of the protocol itself until we establish whether it's up for debate, or whether this is a take-it-or-leave-it proposal.

2) In the Gecko <-> Notification Server interface, does the same device get the same token every time it GETs from https://push.telefonica.com/token?  If so, how do you guarantee that, given that the GET does not seem to contain any identifying information?

If not, and each GET from /token is unique, then is the device supposed to save the token forever?  If so, this needs to be added to the design document, because it's not clear.

3) Is the getAllMessages() Gecko <-> Notification Server method for a given WA, or a given UA?  (4) says for a given WA, and then 4.a says for a given UA.

3.1) Can I ACK more than one message at a time? (Section 4.b, section 5)

3.2) I think a word is missing in section 4.c, but I don't know what it is.  Please fix up that section.

4) Please indicate in the section on the notification server <-> third party protocol what the valid priorities are.

5) I think having different priorities is a losing game; we've already failed if we're backed up such that we have to prioritize some messages over others.  And anyway, all apps are just going to send P1 messages.  We're not gaining anything with that.

6) Please indicate in the spec what the valid TTLs are for a notification.

7) What's the purpose of the timestamp field?

8) You can't include the signature in the message sent from app server to notification server, because the signature is over the message!  The signature traditionally comes after the message.  Please specify that, and specify the exact crypto parameters for the signature.  Although again, I think the signature is overkill, because it just shifts your responsibility from maintaining one secret to maintaining a different secret.
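To make the point in (8) concrete, here is a small sketch of the usual ordering (Node.js assumed; the field layout and the RSA-SHA256 scheme are assumptions for the example, not taken from the design document): serialize the body, sign the serialized bytes, and transmit the signature as a separate field next to the body rather than inside it.

    var crypto = require('crypto');

    function buildSignedRequest(body, privateKeyPem) {
      var serialized = JSON.stringify(body);
      var signature = crypto.createSign('RSA-SHA256')  // scheme assumed here
                            .update(serialized)
                            .sign(privateKeyPem, 'base64');
      // The notification server would verify `signature` against `body`
      // using the public key supplied at registration.
      return { body: body, signature: signature };
    }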
> I'm not even the one that would have to change the protocol...

Whose decision is this, on your end?

> So, I'm afraid that I don't agree with that change, and if the point is raised internally I'll say 
> as much. The final say isn't mine in any case.

Can we please get whoever makes the decisions on your end into a forum where he/she/they can discuss this with Thinker and me?  I don't care if that forum is this bug, or an e-mail chain, or whatever.  I would prefer that the discussion happen in public, however.

If we can't agree on a protocol, we both lose.  So I hope we can come to an agreement here.
In response to every comment:

1) The WAtoken could be shared (so we can send broadcast messages) or, as Fernando said, secret, so it acts as a cookie and it's unique for each user
2) The UDP packet is just a "ping" (paging, in network terms). When Gecko receives it, it simply wakes up the 3G connection and opens a web socket to the push server to fetch notifications. This UDP path is only used on mobile networks. (See the sketch after this list.)
3) We can have multiple notification versions. I need to discuss this with Fernando (we are the main developers of the server side, and of course we can change things, but remember that this is for v1 and code complete is really close in time).
4) We can't assure 100% delivery, only "best effort".
5) If the UDP packet doesn't reach the phone, we will take other measures and retry sending it a few minutes later.
6) I think *encryption* would be really cool, but we are just transmitting data; it's up to the dev to encrypt the message payload (not the entire notification message, we need some of that data) and to decide how to send the URL from the app to his servers.
7) We ask for the PbK (pb) so we can verify the signature on the body and check that it comes from the correct origin.
8) It's up to the developer what kind of information to send through these notifications. We don't care, we just transport it.
9) Yes, we have an ACK message to tell the server that we have received a notification.
10) Yes, we think that developers could send "arbitrary" data. They could send a JSON payload with info about a soccer goal, or just a code that indicates that there is new email and the app needs to contact its server to get it. Up to the developer.
11) The thing is that we want to avoid polling in B2G. We want the phone to stay as idle as possible; with 30 apps installed, every one of them polling its own server is exactly what we want to solve with this server (both for battery life and for less network data for the user and the carrier).
12) We don't allow "anonymous" notifications, because we need to reach you on the mobile network, and thus we need at least your IP, port, MCC and MNC to use the UDP part, which will reduce network traffic *a lot* if developers use this server.
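A rough client-side sketch of the flow in point 2 (all the names here are hypothetical, including the URL; the real wiring lives in the Gecko patch attached to this bug):

    // On the UDP "ping": wake the data connection, fetch pending
    // notifications over WebSocket, ACK each one, hand payloads to apps.
    function onUdpPing() {
      wakeUpDataConnection();                              // bring up 3G if idle
      var ws = new WebSocket("wss://push.example.com/");   // placeholder URL
      ws.onopen = function () {
        ws.send(JSON.stringify({ messageType: "getAllMessages",
                                 token: storedUAToken }));
      };
      ws.onmessage = function (evt) {
        // Assuming the server answers with a JSON array of pending messages.
        JSON.parse(evt.data).forEach(function (msg) {
          dispatchToApp(msg);                              // app decides what to do
          ws.send(JSON.stringify({ messageType: "ack", id: msg.id }));
        });
      };
    }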

--- to comment #77
13) The PbK (pb) is an RSA key used for RSA-SHA256 signatures. We need to encode it to base64 because we had some issues while transferring it in JSON.
14) The PbK is used to verify the origin of the notification, as in point 7.
15) The token is different each time it's requested, but the thing is that you only ask for the token once and keep it forever. If you want to reset, just ask for another one and register the web apps that you want again.
16) in response to 3) for a given UA, sorry
17) in response to 3.1) yes, we can think about it, we just want to make sure the system works
18) in response to 5) yes, I think so. But maybe an email notification has more priority than a "new version message". We can delete it.
19) in response to 6) TTLs range from 0 to infinity. But we will delete messages that are older than 4 days, regardless of the TTL.
20) in response to 8) it only signs the "body" of the notification, not the whole notification, which will be the payload that your app will consume at the end.

So yes, let's discuss here ;)

Cheers,

Guillermo
Thanks for your responses.

> So yes, let's discuss here ;)

I'd only like to discuss this design further in a forum where someone with decision-making authority from Telefonica is present.  So I'm going to hold off responding to comment 79 until we get that cleared up.

Thanks!
In the meantime, it would definitely be helpful to others if you could update the design doc to clarify the points we've discussed.
> I'd only like to discuss this design further in a forum where someone with decision-making authority 
> from Telefonica is present.

Unless that's you, Guillermo?  Sorry I'm not clearer on who has what authority here.

Another concern I have is about a malicious client replaying a broadcast message to all the recipients.  As far as I can tell, a client has all the necessary data to replay a message it received from the notification server.  (There's no scheme in the signature preventing replays, for example, again showing that the cryptographic signature has limited utility.)

That means, in particular, that if a client receives a broadcast message, she can re-broadcast that message to all recipients as many times as she would like.  This is a simple DoS attack on the whole notification infrastructure.

A simple fix is not to allow broadcast messages.  This is consistent with a principle in Internet security where we make an attacker do as much or more work than she is asking the server to do.  (In contrast, if we allow broadcast messages, an attacker gets to do much less work than she asks the server to do, amplifying the attacker's power.)
(In reply to Justin Lebar [:jlebar] from comment #82)
> > I'd only like to discuss this design further in a forum where someone with decision-making authority 
> > from Telefonica is present.
> 
> Unless that's you, Guillermo?  Sorry I'm not clearer on who has what
> authority here.

I don't think so ;) I'll ping my supervisors.

> 
> Another concern I have is about a malicious client replaying a broadcast
> message to all the recipients.  As far as I can tell, a client has all the
> necessary data to replay a message it received from the notification server.
> (There's no scheme in the signature preventing replays, for example, again
> showing that the cryptographic signature has limited utility.)
> 
> That means, in particular, that if a client receives a broadcast message,
> she can re-broadcast that message to all recipients as many time as she
> would like.  This is a simple DoS attack on the whole notification
> infrastructure.
> 
> A simple fix is not to allow broadcast messages.  This is consistent with a
> principal in Internet security where we make an attacker do as much or more
> work than she is asking the server to do.  (In contrast, if we allow
> broadcast messages, an attacker gets to do much less work than she asks the
> server to do, amplifying the attacker's power.)

As it is right now, only the right person with the right key can sign messages so that they are allowed into the system (if the signature of the message is not valid, we just reject it on the frontends).

Buuut… we haven't thought about it, and we were sending the raw notification back to the devices, so, yes, they could do a POST to the URL and broadcast the same messages over and over again. Excellent point, we need to think about what notification parameters sent by the third party server should be passed back to the device. Thanks for pointing this out!

Cheers
(In reply to Guillermo López (:willyaranda) from comment #83)
> (In reply to Justin Lebar [:jlebar] from comment #82)
> > > I'd only like to discuss this design further in a forum where someone with decision-making authority 
> > > from Telefonica is present.
> > 
> > Unless that's you, Guillermo?  Sorry I'm not clearer on who has what
> > authority here.
> 
> I don't think so ;) I'll ping my supervisors.
> 
> > 
> > Another concern I have is about a malicious client replaying a broadcast
> > message to all the recipients.  As far as I can tell, a client has all the
> > necessary data to replay a message it received from the notification server.
> > (There's no scheme in the signature preventing replays, for example, again
> > showing that the cryptographic signature has limited utility.)
> > 
> > That means, in particular, that if a client receives a broadcast message,
> > she can re-broadcast that message to all recipients as many time as she
> > would like.  This is a simple DoS attack on the whole notification
> > infrastructure.
> > 
> > A simple fix is not to allow broadcast messages.  This is consistent with a
> > principal in Internet security where we make an attacker do as much or more
> > work than she is asking the server to do.  (In contrast, if we allow
> > broadcast messages, an attacker gets to do much less work than she asks the
> > server to do, amplifying the attacker's power.)
> 
> As it is right now, only the right person with the right pb can sign
> messages, so they are allowed to enter the system (if the signature of the
> message is not valid, we just reject it on the frontends).
> 
> Buuut… we haven't thought about it, and we were sending the raw notification
> back to the devices, so, yes, they could do a POST to the URL and broadcast
> the same messages over and over again. Excellent point, we need to think
> about what notification parameters sent by the third party server should be
> passed back to the device. Thanks for pointing this out!
> 
> Cheers

From my point of view, only the PAYLOAD (the message param in the JSON object) is needed by the client, so the signature never leaves the notification server.

Not sending the signature out avoids this spam: as Guillermo said, the frontend rejects unsigned (or badly signed) messages, so without the private key it is not feasible to re-send the message.

Good point. Cheers.
> From my point of view, only the PAYLOAD (message param. in the JSON object) is needed by the client, 
> so the signature never goes out from the notification server.

Well, I probably want a timestamp of some sort at least, in addition to the payload.  But sure, if you don't send the signature to the client, then the client can't replay the message.
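To illustrate (field names hypothetical, not the spec): the notification server would forward only the fields the device actually needs, so nothing that could be replayed to the POST endpoint ever reaches a client.

    // What the app server POSTed vs. what Gecko receives.
    function toDevicePayload(submitted) {
      return {
        id: submitted.id,
        message: submitted.message,      // the app-level payload
        timestamp: submitted.timestamp
        // signature, ttl, priority, etc. never leave the server
      };
    }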

Please update the documentation with your new proposal for what Gecko should receive from the notification server, so we can all be on the same page here.
I'm pretty confused about the decision-making structure here.  Why is it that you guys are able to modify what's sent to Gecko (to prevent the replay attack), but you are not empowered to engage about other issues I'm raising about the spec?  Specifically what kinds of issues are we able to debate here, and what issues are you not able to engage with me on until a supervisor arrives?

To be clear, this spec is not "Telefonica to B2G push notifications"; it's "Web push notifications".  As such, I'm pretty uncomfortable with reliability expectations ("99.99999%", see comment 69) that Google has said [1] they can't meet.  Even if you guys are capable of running a service which drops no more than 1 in 10 billion messages, I think there's pretty clear evidence that few other entities are able to run such a system.

So what I'm asking here is for us to consider if there are changes we can make to make implementing a correct notification server more realistic for organizations which can't design systems with seven-nines reliability.  I think there are numerous changes we could make to reduce load on the data center, allow the system to scale better, etc.  Google has a lot of experience in this area, and I think we should carefully consider whether their paper on push notifications [1] has insight to offer in this respect, instead of designing our system ad hoc.

In addition to being concerned about how the design of the protocol makes it difficult to handle server failures and may present scaling difficulties, I am concerned about

 - broadcast messages, which are a huge burden on the notification server and an easy DoS vector,
 - accidental exposure of private user data through notification messages,
 - unnecessary crypto requirements from the app server that add significant complexity to app server implementations.

Again, it's vital that we consider these concerns in the context of an ecosystem of actors providing and consuming this service.  Even if Telefonica is capable of servicing arbitrarily large and frequent broadcast messages, and even if all of our app servers won't have any problem meeting our crypto requirements, that's not sufficient to show that these concerns aren't relevant in general.

If you guys aren't willing or able to negotiate in good faith and consider large changes to the protocol to address these issues, then we'll likely ship something with known deficiencies that we think it's unlikely anyone else will be able to implement correctly.  At that point, we can't call it a Web API.  We can add it to B2G as a certified-apps-only API and try to improve it later, and I'm totally happy to consider that approach if you'd like to talk about it.  But adding APIs that only a chosen few apps can touch is against the principles of this project, so I hope we can instead agree to try to come to consensus here.

[1] http://research.google.com/pubs/pub37474.html
A few comments here:

The security issues Justin raised are definitely valid. It's very unfortunate that we didn't discover them during the security review we had, but that doesn't make them any less important.

These types of issues are one of the reasons we have reviews. We don't have them as just a procedural road-block, but the intent is to find issues and make sure that the code and features that we check in is of good quality, like the ones Justin has pointed out.


However, the fact that these are real issues doesn't mean that we have to fix them in any particular way, or maybe even fix them at all.

The next step here should be to review the security and privacy issues that Justin has found. We need to decide which ones we can live with and which ones we need to fix. And for the ones that we need to fix, we need to decide how to fix them. Ideally that would of course be done without losing any use cases.


Finding last-minute issues is an unfortunate effect of having last-minute patches. I know everyone has been working very hard on this, and Thinker has done a great job writing this patch. But that doesn't mean that we should lower our standards for the review phase of this feature. Push notifications are hard (as demonstrated by the fact that it took both iOS and Android many releases to get them right), so we need to make sure that we do a good job at it.

It's much better that we find and fix problems now than that we find them once the code has shipped to thousands of users, since that might result in having to shut off the feature completely until we've pushed an update to all users.

Additionally, if this feature isn't well built developers won't use it. The only way we can get people to use this feature is by building it well. We don't have any other way of discouraging them from using long-polling HTTP requests or websockets.
To put the biggest problem with this protocol in succinct terms:  The protocol does not guarantee eventual delivery of notifications.  A client will lose notifications if

 1. the client is offline for 4 days or more, or
 2. there is a fault in the notification servers.  The servers might crash and lose data, be unavailable for a period of time, or simply run out of space and drop unsent notifications before the 4 day TTL expires.

Because notifications are delivered on a best-effort basis, applications on the client will need to poll the application servers periodically to ensure that they don't miss important updates.

This is problematic for a number of reasons, including:

 a. Polling increases the load on application servers, making it harder for developers to design apps at scale.
 b. Polling uses energy on the device, reducing battery life.
 c. Each application must poll its application servers separately, so the impact on battery life gets worse as more apps use push notifications.
 d. When setting the polling frequency, application developers are asked to trade off between
      i.  having fresher data on the client and
      ii. increasing the battery life of the device,
    but application developers are the wrong people to be making that trade-off.

I think we can guarantee eventual delivery of notifications and avoid this mess with a protocol like Google's Thialfi.

I'm happy to write this up in more detail, but the biggest change we'd have to enact to make our protocol more Thialfi-like is to not deliver explicit messages in the notifications.  Instead, we would deliver (topic, version) tuples to clients.

If we require the notification system to deliver messages (as currently spec'ed), the server must queue the messages when clients are unavailable, and it must never send a message twice (because messages are not idempotent).  This makes correct implementation of servers much more difficult in the general case, and makes correct implementation especially difficult in the presence of faults.

In contrast, notification servers need not queue anything in the (topic, version) scheme, so their storage is bounded.  Moreover, a (topic, version) notification is idempotent, so application servers are free to send notifications to clients based on out-of-date information.
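A small sketch of why the (topic, version) form is easy to handle on the client: notifications are idempotent, so duplicates and out-of-order deliveries are harmless. All names below are hypothetical.

    // Persisted per app: the last version seen for each topic.
    var topicVersions = {};

    function onVersionNotification(topic, version) {
      if ((topicVersions[topic] || 0) >= version)
        return;                                // stale or duplicate: ignore
      // Fetch the actual data from the app server, then record the version.
      fetchFromAppServer(topic, function (data) {
        applyUpdate(topic, data);
        topicVersions[topic] = version;
      });
    }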

Again, it's really not clear to me that one could build a system that works at web scale under the current protocol.  If that's not the requirement -- if this API is available only to certified apps, so isn't part of the web -- then maybe the current proposal (with some smaller modifications) is acceptable.  But I do not know if it's worth the large amount of effort that remains on Mozilla's and Telefonica's sides to build a throw-away system accessible only to a few apps, and then replace that system entirely in the next version.
Hi,

some ideas here…

Justin, I am not against your arguments, but I have some questions:

1) Are you saying that if a notification is sent, you want it to be delivered the next time you connect? Even if you connect a month later?

Example: I have an app on Android that sends me push messages with the latest hot news. If I'm offline and connect a month later, should I get the notification about news from 30 days back? Maybe I'm not interested, or I have read that headline already. And in this case the TTL doesn't make much sense, since different users could read it at different times.

2) Delivering a message just once is not really difficult on our side. Messages must be ACKed by the user agent (Gecko) so that we don't deliver them again.
In fact, remember that these push notifications are handed to the app, and it does *whatever* it wants with them. If it finds two notifications with the same (id, body), it can discard one.

3) Actually this API is only for web apps 'installed' on the phone (or in desktop Gecko). But I'd love to see this opened up to plain websites in the future.

And yes, I'm definitely going to read Google's Thialfi paper, but we mainly based this on: http://developer.android.com/guide/google/gcm/index.html
(In reply to Jonas Sicking (:sicking) from comment #87)
> A few comments here:
> 
> The security issues Justin raised are definitely valid. It's very
> unfortunate that we didn't discover them during the security review we had,
> but that doesn't make them any less important.
> 

Jonas, thanks for your explanation. I agree that it is very unfortunate that not all of these comments were raised during the security review.


> These types of issues are one of the reasons we have reviews. We don't have
> them as just a procedural road-block, but the intent is to find issues and
> make sure that the code and features that we check in is of good quality,
> like the ones Justin has pointed out.
> 
> 
> However, the fact that these are real issues doesn't mean that we have to
> fix them in any particular way, or maybe even fix them at all.
> 
> The next step here should be to review the security and privacy issues that
> Justin has found. We need to decide which ones we can live with and which
> ones we need to fix. And for the ones that we need to fix, we need to decide
> how to fix them. Ideally that would of course be done without losing any use
> cases.
> 

As for the comments, I see two different kinds of them:
 
 1/ Some of them do not have to do with security. Those items should be discussed at a product level and hence should not be part of this discussion.
 2/ Security comments: happy to discuss them; however, some of them were already raised during the Security Review that took place. For instance, E2E encryption was discussed and it was agreed that it was a responsibility of the app developer.

> 
> Finding last minute issues is an unfortunate effect of having last minute
> patches. I know everyone has been working very hard on this, and Thinker has
> done a great job at writing this patch. But that doesn't mean that we should
> lower our standards for the review phase of this feature. Push notifications
> are hard (as demonstrated by the fact that it took both iOS and Android many
> releases to build it), so we need to make sure that we do a good job at it.
> 

As this item was considered a P1 and part of a complete E2E solution, and reviews are this important, I would like to understand why this review did not have all the right stakeholders, and why it did not happen earlier, given that the protocol was pretty much defined a couple of months ago.


> It's much better that we find and fix problems now than that we find the
> problems them once the code has been shipped to thousands of users since
> that might result in having to shut off the feature completely until we've
> pushed an update to all users.
> 
> Additionally, if this feature isn't well built developers won't use it. The
> only way we can get people to use this feature is by building it well. We
> don't have any other way of discouraging them from using long-polling HTTP
> requests or websockets.
(In reply to Justin Lebar [:jlebar] from comment #88)
> To put the biggest problem with this protocol in succinct terms:  The
> protocol does not guarantee eventual delivery of notifications.  A client
> will lose notifications if
> 
>  1. the client is offline for 4 days or more, or
>  2. there is a fault in the notification servers.  The servers might crash
> and lose data, be unavailable for a period of time, or simply run out of
> space and drop unsent notifications before the 4 day TTL expires.
> 
> Because notifications are delivered on a best-effort basis, applications on
> the client will need to poll the application servers periodically to ensure
> that they don't miss important updates.
> 
> This is problematic for a number of reasons, including:
> 
>  a. Polling increases the load on application servers, making it harder for
> developers to design apps at scale.
>  b. Polling uses energy on the device, reducing battery life.
>  c. Each application must poll its application servers separately, so the
> impact on battery life gets worse as more apps use push notifications.
>  d. When setting the polling frequency, application developers are asked to
> trade off between
>       i.  having fresher data on the client and
>       ii. increasing the battery life of the device,
>     but application developers are the wrong people to be making that
> trade-off.
> 

OK. None of those points seems to be a security concern, and I believe it is a product decision. I note your feedback and will discuss it with the product team.

Can you provide a similar summary of the key security and privacy concerns you have with the protocol?
(In reply to comment 89)
> 1) Are you saying that if a notification is sent, you will want to be
> delivered next time you connect? Even if you connect 1 month later?

This is what we can guarantee with a Thialfi-like system, yes.  It does not
require that the servers keep one month's worth of state.  But it does require
that we don't send "messages" but instead send "notifications".

> Example: I have an app in Android that send me push messages with latest hot
> news. If I'm offline, and connect a month later, should I get the
> notification from 30 days back news?

In this case, you'd receive a notification one month later that your hot news
is at version X.  This is exactly what we want.  Note that if the app updates
the current hot news every 5 days, then it's possible under the current scheme
that we'd get no notification at all when we log in after one month.  So the
app developer must manually poll, causing all the problems I explained earlier.

> 3) Actually this API is only for webapps 'installed' on the phone (or in
> desktop gecko). But I'd love to see this more open to simple websites in the
> future.

At the moment, I would not be comfortable releasing the current API to
installed apps, for all the reasons I've laid out.  I would only be comfortable
releasing a modified version of the current API to a very small subset of apps
(i.e., certified-only).

(In reply to comment 90)
>  1/ Some of them do not have to do with security. Those items should
>  discussed at a product level and hence should not be part of this
>  discussion.

At Mozilla, we consider API and protocol design to be the realm of engineering.
It's totally fine if you guys view this differently and would like to discuss
this in a different forum, but this design is definitely a part of Mozilla's
review process, and will be reviewed by a technical reviewer, not someone
responsible for product at Mozilla.

>  2/ Security comments: Happy to discuss them, however some of them were
>  already raised during the Security Review that took place, for instance, E2E
>  Encryption was discussed and agreed that it was a responsibility of the app
>  developer.

As I've said a number of times in this bug, I don't think it's feasible to ask
application developers to do crypto.  This is because the Web does not have
battle-tested cryptography available to it.  The best we have at the moment is
the Stanford Crypto Library, which is completely un-tested in the wild.

Nowhere else on the web do we ask application developers to encrypt data before
it's sent over the wire.  What I'm afraid is going to happen is, application
developers are going to assume this data is secure (it's sent over "https") and
send private data in the clear.  Then users' private data will be readable by
the notification server.

Users will have no way to avoid this situation -- they won't even be aware that
this privacy leak is occurring.

> As this item was considered a P1 and part of a complete E2E solution, and
> reviews are that important, I would like to understand why this review did
> not have all the right stakeholders, and why it did not happen before given
> that the protocol was pretty much defined a couple of months ago. 

This item was a P2 in all of the Mozilla documents I saw.  (You can see in this
bug; it's marked as a "[soft]" blocker and "WebAPI:P3".)  The bug therefore
didn't get any attention from a peer until recently; we were all busy with
other priorities.

I'm really sorry that we didn't get to this sooner, but we simply didn't have
the manpower.  At some point, we have to make trade-offs.

> Can you provide a similar summary of the key security and privacy concerns
> you have with the protocol?

I have a number of concerns with this protocol.  They are all relevant to
Mozilla's technical review process.

 0. The current protocol is difficult-to-impossible to implement at scale,
    because the lack of fault-tolerance built into the protocol requires ultra
    high-availibility servers.  In distributed computing, it's well known that
    we trade off scale for availability; in this case, the protocol requires
    both.

 1. No guaranteed delivery (comment 88) leads to poor battery life and a poor
    user experience.

 2. Broadcast messages allow for easy DoS of the notification server.

 3. Any data that apps send over the notification service is readable by the
    notification server.  Web apps do not have a proven ability to do strong
    crypto; the existing libraries are untested.  Moreover, app developers are
    not used to encrypting data on the client side, so it will be easy for app
    developers to make a mistake and expose all their users' data to the
    notification server.

 4. Unnecessary crypto requirements on the app servers.  The message signatures
    buy us nothing and make it much more difficult to implement an app server.

Concern (3) is the main privacy concern, and I'd say that (0) and (2) are both
related to security, in the sense that we must secure notification servers from
denial-of-service attacks.
> What I'm afraid is going to happen is, application developers are going to
> assume this data is secure (it's sent over "https") and send private data in
> the clear.  Then users' private data will be readable by the notification
> server.

To elaborate on this just a bit: The principle at work here is that we should
design APIs so the obvious way of using the API is the same as the correct way
of using the API.  This is because, by definition, people will use the API the
obvious way.  If that's not the correct way to use the API, then we've failed
at our design.

In this case, the obvious thing to do is to send private information inside
notifications.  It even seems safe, because the notifications are sent over
https.  But the obvious thing to do is not the correct thing to do.  That's why
even if we had strong crypto available to web apps, this would be a problem.
Not to distract us from the larger issues here, but another smaller issue with the protocol is: I don't understand why we have explicit ack's built into the protocol.  The notifications are delivered to the client over TCP (via websockets).  Why aren't TCP's acks sufficient?
(In reply to Justin Lebar [:jlebar] from comment #92)
> (In reply to comment 89)
>> 1) Are you saying that if a notification is sent, you will want to be
>> delivered next time you connect? Even if you connect 1 month later?
>
> This is what we can guarantee with a Thialfi-like system, yes.  It does not
> require that the servers keep one month's worth of state.  But it does require
> that we don't send "messages" but instead send "notifications".

No, it requires that we only send "change notifications", or "last state" notifications. Which I'm sure is appropriate for a good variety of applications, but it's more restricted than sending general notifications. And it does require apps that want to use the notification service to have an online presence. With raw data, though? I don't need any online presence at all. As the app developer, whenever I can push something to my customers I can do it from anywhere, and the message will carry it to them.

Oh, and about privacy... See that object identifier which happens to be a string on the protocol? I can totally violate your privacy in less than 32 bytes, and that's just a typical value, not an absolute maximum size.

Anyway, I thought that deciding what use cases we want to cover is a product decision, not an engineering one.

>
>> Example: I have an app in Android that send me push messages with latest hot
>> news. If I'm offline, and connect a month later, should I get the
>> notification from 30 days back news?
>
> In this case, you'd receive a notification one month later that your hot news
> is at version X.  This is exactly what we want.  Note that if the app updates
> the current hot news every 5 days, then it's possible under the current scheme
> that we'd get no notification at all when we log in after one month.  So the
> app developer must manually poll, causing all the problems I explained earlier.

No, what it will tell you depends on how the app developer has defined the object id, and what you subscribed to, for starters. If he has defined the object ID as the 'news-item identifier' and creates a new object for each and every news item, then when you turn on your device you'll get a bazillion out-of-date notifications for news that isn't new anymore, unless the developer has also taken care of deleting the old objects.

But that totally, and completely, depends on the app developer. The protocol doesn't impose limits on that. So if the developer doesn't delete his old objects? Welcome to the world of old news. Have fun.

>
>> 3) Actually this API is only for webapps 'installed' on the phone (or in
>> desktop gecko). But I'd love to see this more open to simple websites in the
>> future.
>
> At the moment, I would not be comfortable releasing the current API to
> installed apps, for all the reasons I've laid out.  I would only be comfortable
> releasing a modified version of the current API to a very small subset of apps
> (i.e., certified-only).

Which reasons, specifically? That some developers can send some private data in raw form and it will be readable on our servers? That maybe the servers can go down and notifications will be backed up? Because I haven't seen a problem described yet that says to me: damn, I wouldn't want an untrusted app to have access to that!

Oh, BTW, what makes our proposal different from the one Apple is running? I think they've sold a couple or three of those devices... iPhones or something like that, and they allow every application to use it :P

I guess nobody told them it was impossible to build such a service this way, so... they just went ahead and built it.

>
> (In reply to comment 90)
>>  1/ Some of them do not have to do with security. Those items should
>>  discussed at a product level and hence should not be part of this
>>  discussion.
>
> At Mozilla, we consider API and protocol design to be the realm of engineering.
> It's totally fine if you guys view this differently and would like to discuss
> this in a different forum, but this design is definitely a part of Mozilla's
> review process, and will be reviewed by a technical reviewer, not someone
> responsible for product at Mozilla.
>
>>  2/ Security comments: Happy to discuss them, however some of them were
>>  already raised during the Security Review that took place, for instance, E2E
>>  Encryption was discussed and agreed that it was a responsibility of the app
>>  developer.
>
> As I've said a number of times in this bug, I don't think it's feasible to ask
> application developers to do crypto.  This is because the Web does not have
> battle-tested cryptography available to it.  The best we have at the moment is
> the Stanford Crypto Library, which is completely un-tested in the wild.
>
> Nowhere else on the web do we ask application developers to encrypt data before
> it's sent over the wire.  What I'm afraid is going to happen is, application
> developers are going to assume this data is secure (it's sent over "https") and
> send private data in the clear.  Then users' private data will be readable by
> the notification server.
>
> Users will have no way to avoid this situation -- they won't even be aware that
> this privacy leak is occurring.

OK, I've said it before and I'll say it again. Nobody's asking developers to use crypto. Nobody's telling them they *have* to send data. They're the ones you trusted your data with in the first place; they should be the ones who decide when it's appropriate to send data in the notification: "Hey, check out the new items we added to our store" (again, it can look useless to you; it isn't to the developers and to some of their customers), or just a 'Your latest test results are available'.

Know what? An astute developer could even implement an application so that the system, as it is proposed, behaves Thialfi-like for his application.

>
>> As this item was considered a P1 and part of a complete E2E solution, and
>> reviews are that important, I would like to understand why this review did
>> not have all the right stakeholders, and why it did not happen before given
>> that the protocol was pretty much defined a couple of months ago.
>
> This item was a P2 in all of the Mozilla documents I saw.  (You can see in this
> bug; it's marked as a "[soft]" blocker and "WebAPI:P3".)  The bug therefore
> didn't get any attention from a peer until recently; we were all busy with
> other priorities.
>
> I'm really sorry that we didn't get to this sooner, but we simply didn't have
> the manpower.  At some point, we have to make trade-offs.
>
>> Can you provide a similar summary of the key security and privacy concerns
>> you have with the protocol?
>
> I have a number of concerns with this protocol.  They are all relevant to
> Mozilla's technical review process.
>
>  0. The current protocol is difficult-to-impossible to implement at scale,
>     because the lack of fault-tolerance built into the protocol requires ultra
>     high-availibility servers.  In distributed computing, it's well known that
>     we trade off scale for availability; in this case, the protocol requires
>     both.

I think the fellows that implemented it are best suited to answer this. I still don't see how this is a security or privacy problem though. 

>
>  1. No guaranteed delivery (comment 88) leads to poor battery life and a poor
>     user experience.

Again, not what I had understood when they explained the protocol to me (in fact one of the main design goals was to reduce network load), but again, the developers are best suited to answer this.

>
>  2. Broadcast messages allow for easy DoS of the notification server.

Er, how exactly? To do an effective DoS, an attacker would have to:

1. Implement an application, of course.
2. Convince a suitable number of users to install it, and to activate notifications for it.
3. Hope for the best, and hope that his users are mostly online at the same time
4. And then, BAM! he can send a (signed, so he can incriminate himself duly) broadcast message to our servers. Which will start sending the wakeups to the devices, which will try to download it.

Taking into account the horizontal scalability of the servers, it *might* be possible to do a DDoS on them this way... but I wouldn't exactly call it easy.

>
>  3. Any data that apps send over the notification service is readable by the
>     notification server.  Web apps do not have a proven ability to do strong
>     crypto; the existing libraries are untested.  Moreover, app developers are
>     not used to encrypting data on the client side, so it will be easy for app
>     developers to make a mistake and expose all their users' data to the
>     notification server.

Yep, and this has a very easy solution for the astute developer... if your data is confidential, don't send it over the notification service! And there, problem solved all by itself!

Meanwhile, if your data isn't confidential, by all means feel free to send it.

>
>  4. Unnecessary crypto requirements on the app servers.  The message signatures
>     buy us nothing and make it much more difficult to implement an app server.

The message signatures buy us:

* A defense against the 'easy DoS'
* A defense against fake message posting
* A way to ensure that only the authorized poster can post messages to a registered device

We cannot use symmetric crypto for this because the whole point behind this idea was NOT to have to share any secret between the application developer and the push server, especially considering that the secret would have to go through the client device.

Thinking about this, though, you're right about one thing. Signatures are not really needed. We can change it to use SSL with a client certificate, and change the protocol so that instead of posting the public key on registration, the whole certificate is posted. That would reduce development complexity for the application developer (assuming he has access to a half-decent SSL library and knows how to generate self-signed certificates) and would give the same level of protection for all practical purposes.

This change we can do, if you agree to it.
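For what it's worth, a sketch of what the app server side of that could look like (Node.js assumed; the host, path and file names are placeholders): the per-message signature goes away, and the client certificate presented during the TLS handshake authenticates the poster instead.

    var https = require('https');
    var fs = require('fs');

    var body = JSON.stringify({ message: "new data available" });
    var req = https.request({
      host: 'push.example.com',                     // placeholder
      path: '/notify/URL-returned-at-registration', // placeholder
      method: 'POST',
      key:  fs.readFileSync('app-server-key.pem'),  // pairs with the self-signed
      cert: fs.readFileSync('app-server-cert.pem'), // cert posted at registration
      headers: { 'Content-Type': 'application/json',
                 'Content-Length': Buffer.byteLength(body) }
    }, function (res) {
      console.log('push server replied with status', res.statusCode);
    });
    req.end(body);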

>
> Concern (3) is the main privacy concern, and I'd say that (0) and (2) are both
> related to security, in the sense that we must secure notification servers from
> denial-of-service attacks.

Concern 3 is completely out of the scope of the protocol. We don't mind and don't care which data passes through us. That's the app developer's problem; he's responsible for the data, not us.

By the way, this point DID come up in the Security Review; it was argued, and in the end the agreement was that the developer can choose not to send any data.

Concern 2... I still don't see the 'easy DoS'. It's about as easy as building a million-device botnet from scratch. So not that easy.
Just in general, this comment feels really antagonistic to me.

I'm really, honestly sorry if I've antagonized anyone here; that's not my goal.
Reviews at Mozilla are a collaborative process; the goal isn't to convince the
reviewer to r+ your patch as-is, but to work with the reviewer to improve your
code.  I hope we can work together here in that spirit.

> > (In reply to comment 89)
> >> 1) Are you saying that if a notification is sent, you will want to be
> >> delivered next time you connect? Even if you connect 1 month later?
> >
> > This is what we can guarantee with a Thialfi-like system, yes.  It does not
> > require that the servers keep one month's worth of state.  But it does require
> > that we don't send "messages" but instead send "notifications".
> 
> No, it requires that we just send "change notifications", or "last state
> notification". Which I'm sure is appropriate for a good variety of
> applications, but it's more restricted than sending general notifications.

I'll refer you to what the Thialfi authors said about this.  In section 8,
"Lessons Learned", they write:

  Our choice to provide applications with only a notification signal was
  contentious.  In particular, developers have almost universally asked for
  richer features than Thialfi provides: e.g., support for data delivery [...].
  We have avoided these features, however, as they would significantly
  complicate both our implementation and API.  Moreover, we have encountered
  few applications with a fundamental need for them.

> And it does require apps that want to use the notification service to have a
> online presence. Sending raw data though? I don't need any online presence at
> all. Whenever the app developer can push something to my customers I can do
> it from anywhere, and the message will push it to the customer.

We're building a web API here.  I don't think we should optimize for the case
when the web app does not have an online presence, particularly when doing so
leads us to make design decisions which make life difficult for the
overwhelmingly-common case where the app does have an online presence.

General push messages preclude guaranteed eventual delivery, because servers
must be willing to drop messages after they reach capacity.  Guaranteed
delivery makes life much easier for applications, and is much better for our
users, as I explained in comment 88.

> Anyway, I thought that deciding what use cases we want to cover is a product
> decision, not an engineering one.

At Mozilla, we don't have a hard division of labor like this.  Instead, product
and engineering work together to understand the issues at hand.  I'm not
suggesting that I have the final say on what this protocol provides, but I do
think that the technical limitations of the current protocol should inform our
decision here.

Additionally, product at Mozilla doesn't usually design technical use-cases for
technical protocols; instead, they are more focused on user-facing use-cases
and requirements.  We try not to be particularly territorial here at Mozilla,
but this protocol is a Web API so it technically falls under the purview of
Jonas and the WebAPI team, not any particular product.  (Again, that doesn't
mean that product has no say here, but it means the lines are fuzzy.)

> >> Example: I have an app in Android that send me push messages with latest hot
> >> news. If I'm offline, and connect a month later, should I get the
> >> notification from 30 days back news?
> >
> > In this case, you'd receive a notification one month later that your hot news
> > is at version X.  This is exactly what we want.  Note that if the app updates
> > the current hot news every 5 days, then it's possible under the current scheme
> > that we'd get no notification at all when we log in after one month.  So the
> > app developer must manually poll, causing all the problems I explained earlier.
> 
> No, what it will tell depends on how the app developer has defined the object
> id, and what you subscribed for, for starters. If he has defined the object
> ID as the 'new identifier' and he creates a new object for each and every
> new, then when you turn on your device you'll get a bazillion of out to date
> notifications for news that aren't new anymore, unless the developer has
> taken care of also deleting the old objects.

Two things here:

First, I didn't see anything in the notification server <-> third party section
of the protocol attached to this bug about deleting objects.  Perhaps that was
elided?

Second, I understood from earlier that undelivered notifications are deleted
after 4 days.  (Comment 79, #19.)  What you're saying here doesn't seem
consistent with that, if I understand you correctly.  Which is correct?

> I haven't seen a problem described, yet, that says to me: damn, I wouldn't
> want an untrusted app to have access to that!
>
> Oh, BTW, what makes our proposal different to the one that Apple's running? I
> think they've sold a couple or three of those devices... ipons or something
> like that and they allow every application to use it :P

I've described five problems with this protocol; see the end of comment 92 and
comment 94.  But you ask a great question: Why should the existence of problems
mean that we shouldn't expose this API to a wider audience?

The major difference between us and Apple in this respect is that Apple is free
to change their API at will.  Every year they can (and do) make incompatible
changes to the API developers write against.  And if you misuse an API, Apple
is free to deny you entrance to the app store, or remove an existing app from
the store.

On the Web, we have no such luxury.  We must get it right the first time,
because we're basically stuck with it forever after people start using it.
That's why we strive to reach such a high level of quality for the APIs we
release to a wide audience.

> > (In reply to comment 90)
> > As I've said a number of times in this bug, I don't think it's feasible to ask
> > application developers to do crypto.
> 
> Ok, I've told it before and I'll tell it again. Nobody's asking developers to
> use crypto. Nobody's telling them they *have* to send data. They're the ones
> that you trusted your data with in the first place, they should be the ones
> that can decide when it's adequate to send data on the notification

As I explained in comment 93, the obvious way to use the current API is not the
right way to use the API, so web developers will make mistakes.  We should not
design an API with obvious privacy footguns.

If we could go through and audit every use of this API, then sure, this isn't a
problem.  Indeed, if we expose this only to certified apps, that's exactly what
we'd do!

> >  0. The current protocol is difficult-to-impossible to implement at scale,
> >     because the lack of fault-tolerance built into the protocol requires ultra
> >     high-availability servers.  In distributed computing, it's well known that
> >     we trade off scale for availability; in this case, the protocol requires
> >     both.
> 
> I think the fellows that implemented it are best suited to answer this. I
> still don't see how this is a security or privacy problem though. 

In comment 69 Fernando said of the system that's being designed:

  We are designing it to be HA, scalable, etc... so in theory is 100%
  (99.99999% reliable)

That's great, but server infrastructure can only get you so far.  In my
experience, seven-nines reliability is pretty unrealistic even for
infrastructure much more critical than this.  My point is that the protocol can
be designed to tolerate failures gracefully, but if we don't design it in now,
we'll have to start over basically from scratch when we go to re-design it,
because how we handle failures is baked deep into the protocol.

> >  2. Broadcast messages allow for easy DoS of the notification server.
> 
> Er how exactly? To do an effective DoS, an attacker would have to:
> 
> 1. Implement an application, of course.
> 2. Convince a suitable number of users to install it, and to activate notifications for it.
> 3. Hope for the best, and hope that his users are mostly online at the same time
> 4. And then, BAM! he can send a (signed, so he can incriminate himself duly) broadcast message to our servers. Which will start 
> sending the wakeups to the devices, which will try to download it.

Yes, this is basically right, except that with respect to (3), the users don't
really need to be online at the same time to cause problems, because the
notification servers must keep a record of which users have not ack'ed which
messages.  And the notification server must keep this record around for at
least 4 days (perhaps indefinitely; it's not clear to me any longer what the
contract is).

Offhand, I'd say this "storage attack" may be a bigger vector for DoS against
the notification servers than a "network attack", since you can always throttle
the rate of sending notifications, but throttling the storage means reneging on
the notification server's contract.

(Note that Thialfi supports broadcast messages without this overflowing-storage
problem.)

Note that the application developers need not be malicious.  An attacker could
gain control over a popular application's servers and send an effective DoS
with just a few thousand global notifications.  Or perhaps more plausibly, an
application developer could accidentally set off a DoS by sending out a few
thousand notifications.

> >  3. Any data that apps send over the notification service is readable by the
> >     notification server.  Web apps do not have a proven ability to do strong
> >     crypto; the existing libraries are untested.  Moreover, app developers are
> >     not used to encrypting data on the client side, so it will be easy for app
> >     developers to make a mistake and expose all their users' data to the
> >     notification server.
> 
> Yep, and this has an easy easy solution for the astute developer... if your
> data is confidential, don't send it over the notification service! And there,
> problem solved all by itself!

As I've explained, the problem is the assumption that developers will be
astute.  Experience writing web APIs suggests that developers are not uniformly
astute.

That's not a big problem if a bad web developer causes her page to render
poorly under some conditions.  It's a much bigger problem if a bad web
developer leaks her users' private data.

We should not design APIs where it's easy for a developer of average
astuteness to leak private data.

> >  4. Unnecessary crypto requirements on the app servers.  The message signatures
> >     buy us nothing and make it much more difficult to implement an app server.
> 
> The message signatures buy us:
> 
> * A defense against the 'easy DoS'
> * A defense against fake message posting
> * A way to ensure that only the authorized poster can post messages to a registered device

It seems to me that we gain all of these by simply making the URL that you use
to post to a given client a secret.

The client itself may see this URL, but it's not a problem that a malicious
client can post messages to itself.

Of course, this does pose some difficulty with respect to broadcast URLs, but
you already know how I feel about those.  :)

Maybe we could agree to implement push notifications now without broadcast, and
reconsider this decision in V2.

> We cannot use symmetric crypto for this [...]

Agreed, we cannot use symmetric crypto.

> Thinking about this, though, you're right in one thing. Signatures are not
> really needed. We can change it to use SSL with client certificate, and
> changing the protocol so instead of posting the public key on registration
> the whole certificate is posted. That would simplify development complexity
> for the application developer (assuming he has access to a half decent SSL
> library and know how to generate self-signed certificates) and would give the
> same level of protection for all practical senses.

I like the idea here, but I don't think this is a big improvement, because now
app developers have to figure out how to generate client certs and serialize
and pass their certificate to the client.

Better yet would be to have the developer provide a client cert when posting
messages to the notification server which proves that the dev owns
http://my.app.com/.  Then you wouldn't have to do anything on the client side.
I don't know if such certs are readily available.

But again, simply using a secret URL for posting messages to the client seems
much, much simpler.  It's also less crypto for the server to do, which makes it
easier for the servers to be fast.

> By the way, this point DID come out on the Security Review, it was argued and
> in the end the agreement was that the developer can choose to not send any
> data.

It's our mistake that I was asked to review this code but wasn't invited to the
security review, I agree.
(In reply to Justin Lebar [:jlebar] from comment #96)
> Just in general, this comment feels really antagonistic to me.
> 
> I'm really, honestly sorry if I've antagonized anyone here; that's not my
> goal.
> Reviews at Mozilla are a collaborative process; the goal isn't to convince
> the reviewer to r+ your patch as-is, but to work with the reviewer to
> improve your code.  I hope we can work together here in that spirit.

I'm sorry if I did come across as antagonistic; I tend to get carried away in arguments.

> 
> > > (In reply to comment 89)
> > >> 1) Are you saying that if a notification is sent, you will want to be
> > >> delivered next time you connect? Even if you connect 1 month later?
> > >
> 
> General push messages preclude guaranteed eventual delivery, because servers
> must be willing to drop messages after they reach capacity.  Guaranteed
> delivery makes life much easier for applications, and is much better for our
> users, as I explained in comment 88.

Ok, I think we're arguing in circles here. You have your opinion, Google has its opinion, Apple has its opinion, and I have mine. We can go for either a version control notification system (Thialfi) or a message notification system (which *can* work as the other actually). But since the cases covered are not the same (as the developers asking Google prove) I don't think that's an engineering question. 

We say we can provide a good enough notification system. A system with better uptime than 99% of the actual applications that are going to be using it. Again, I think Fernando and Willy can give further details on this, but from what I remember the system was designed so it could scale quite easily. 

> > Anyway, I thought that deciding what use cases we want to cover is a product
> > decision, not an engineering one.
> 
> At Mozilla, we don't have a hard division of labor like this.  Instead,
> product and engineering work together to understand the issues at hand.
> I'm not suggesting that I have the final say on what this protocol provides,
> but I do think that the technical limitations of the current protocol should
> inform our decision here.

But the technical limitations are, and sorry if this comes out as antagonistic, your point of view. Apple is using the kind of system you say can't or won't work. And I for one don't find the user experience terrible. I get the Whatsapp notifications almost the moment I send a message, the kids get their stupid game notifications, the temperature app I use gets its temperature notifications... it just works.

> > > In this case, you'd receive a notification one month later that your hot news
> > > is at version X.  This is exactly what we want.  Note that if the app updates
> > > the current hot news every 5 days, then it's possible under the current scheme
> > > that we'd get no notification at all when we log in after one month.  So the
> > > app developer must manually poll, causing all the problems I explained earlier.
> > 
> > No, what it will tell depends on how the app developer has defined the object
> > id, and what you subscribed for, for starters. If he has defined the object
> > ID as the 'new identifier' and he creates a new object for each and every
> > new, then when you turn on your device you'll get a bazillion of out-of-date
> > notifications for news that aren't new anymore, unless the developer has
> > taken care of also deleting the old objects.
> 
> Two things here:
> 
> First, I didn't see anything in the notification server <-> third party
> section of the protocol attached to this bug about deleting objects.
> Perhaps that was elided?
> 
> Second, I understood from earlier that undelivered notifications are deleted
> after 4 days.  (Comment 79, #19.)  What you're saying here doesn't seem
> consistent with that, if I understand you correctly.  Which is correct?

I'm sorry I didn't make myself clear. I wasn't explaining how the proposed protocol would work. I was describing how a Thialfi server could work in the example case Willy proposed. You said you would get a notification telling you that your hot news is at version X... and I said that what you get depends heavily on what the app developer defines as objects. So you could very well end up getting notifications for all the (not so new anymore) news.


> 
> > I haven't seen a problem described, yet, that says to me: damn, I wouldn't
> > want an untrusted app to have access to that!
> >
> > Oh, BTW, what makes our proposal different to the one that Apple's running? I
> > think they've sold a couple or three of those devices... ipons or something
> > like that and they allow every application to use it :P
> 
> I've described five problems with this protocol; see the end of comment 92
> and comment 94.  But you ask a great question: Why should the existence of
> problems mean that we shouldn't expose this API to a wider audience?
> 
> The major difference between us and Apple in this respect is that Apple is
> free to change their API at will.  Every year they can (and do) make
> incompatible changes to the API developers write against.  And if you misuse
> an API, Apple is free to deny you entrance to the app store, or remove an
> existing app from the store.
> 
> On the Web, we have no such luxury.  We must get it right the first time,
> because we're basically stuck with it forever after people start using it.
> That's why we strive to reach such a high level of quality for the APIs we
> release to a wide audience.

That's great. Still even your APIs change and are alive, as they should be. Why should this one be different? Apple could go to a Thialfi system tomorrow... but the fact is that they've been using this system for 3 years now, they have what I would call a healthy number of users and they don't seem to have any apparent problems. Why would this one be different?

> > Ok, I've told it before and I'll tell it again. Nobody's asking developers to
> > use crypto. Nobody's telling them they *have* to send data. They're the ones
> > that you trusted your data with in the first place, they should be the ones
> > that can decide when it's adequate to send data on the notification
> 
> As I explained in comment 93, the obvious way to use the current API is not
> the right way to use the API, so web developers will make mistakes.  We
> should not design an API with obvious privacy footguns.
> 
> If we could go through and audit every use of this API, then sure, this
> isn't a problem.  Indeed, if we expose this only to certified apps, that's
> exactly what we'd do!

Again... I don't care what app developers send through the API. That's not and should not be the API's problem. That's the developer's problem. Because the easy way to send data to their apps would be for their apps to open an HTTP (why bother with https, http is easier) connection to their server and download confidential data to their heart's content.

Well, in fact that's what Whatsapp was doing till not too long ago, come to think of it. Do you know what the solution is? If you don't trust the app developer to treat your data with respect, don't use the app. If you don't like the way Whatsapp treats messages, don't use it (and be out of touch with all your friends that apparently don't give a flying s*t about that and only use Whatsapp to talk, but that's another problem).

In any case, that's NOT the push system's problem. And even on a Thialfi-type system, as I said, an incompetent developer can still leak private information to their heart's content, using the protocol the easy way too.

Again, I know we have differing views on this... I believe that since apps already have the data they should know what to do with it, and you don't. 


> 
> > >  0. The current protocol is difficult-to-impossible to implement at scale,
> > >     because the lack of fault-tolerance built into the protocol requires ultra
> > >     high-availability servers.  In distributed computing, it's well known that
> > >     we trade off scale for availability; in this case, the protocol requires
> > >     both.
> > 
> > I think the fellows that implemented it are best suited to answer this. I
> > still don't see how this is a security or privacy problem though. 
> 
> In comment 69 Fernando said of the system that's being designed:
> 
>   We are designing it to be HA, scalable, etc... so in theory is 100%
>   (99.99999% reliable)
> 
> That's great, but server infrastructure can only get you so far.  In my
> experience, seven-nines reliability is pretty unrealistic even for
> infrastructure much more critical than this.  My point is that the protocol
> can be designed to tolerate failures gracefully, but if we don't design it
> in now, we'll have to start over basically from scratch when we go to
> re-design it, because how we handle failures is baked deep into the protocol.

Once again, I could agree with you had it not already been done, and had it not been working for the 200-pound Gorilla for 3 years already. It works for Apple; it will work here.


> 
> > >  2. Broadcast messages allow for easy DoS of the notification server.
> > 
> > Er how exactly? To do an effective DoS, an attacker would have to:
> > 
> > 1. Implement an application, of course.
> > 2. Convince a suitable number of users to install it, and to activate notifications for it.
> > 3. Hope for the best, and hope that his users are mostly online at the same time
> > 4. And then, BAM! he can send a (signed, so he can incriminate himself duly) broadcast message to our servers. Which will start 
> > sending the wakeups to the devices, which will try to download it.
> 
> Yes, this is basically right, except that with respect to (3), the users
> don't really need to be online at the same time to cause problems, because
> the notification servers must keep a record of which users have not ack'ed
> which messages.  And the notification server must keep this record around
> for at least 4 days (perhaps indefinitely; it's not clear to me any longer
> what the contract is).

Still 4 days; as I said, I was talking about Thialfi before, not this system.

> 
> Offhand, I'd say this "storage attack" may be a bigger vector for DoS against
> the notification servers than a "network attack", since you can always
> throttle the rate of sending notifications, but throttling the storage means
> reneging on the notification server's contract.

Storage is cheap. And offhand I would say that broadcast messages get stored only once. Still, I defer to Fernando and Willy on that. But... if the way we store messages becomes a problem that can be solved by adding more disk to it at any point, we can tweak how we treat notifications.

As for having lots of users (and again I defer to Fernando and Willy) I seem to remember the backend system being designed with horizontal scalability in mind, so different users will be served by different servers.

> 
> (Note that Thialfi supports broadcast messages without this
> overflowing-storage problem.)

Yeah, cause they only send version control information. Which is great for some use cases, as I said; it just doesn't cover all the use cases the general notification system does.

> 
> Note that the application developers need not be malicious.  An attacker
> could gain control over a popular application's servers and send an
> effective DoS with just a few thousand global notifications.  Or perhaps
> more plausibly, an application developer could accidentally set off a DoS by
> sending out a few thousand notifications.

Again, I defer to Fernando and Willy on the specific backend implementation details... but again, this can be solved with backend design (and I think it's been taken care of already, but we can tweak that if necessary).

> 
> > >  3. Any data that apps send over the notification service is readable by the
> > >     notification server.  Web apps do not have a proven ability to do strong
> > >     crypto; the existing libraries are untested.  Moreover, app developers are
> > >     not used to encrypting data on the client side, so it will be easy for app
> > >     developers to make a mistake and expose all their users' data to the
> > >     notification server.
> > 
> > Yep, and this has an easy easy solution for the astute developer... if your
> > data is confidential, don't send it over the notification service! And there,
> > problem solved all by itself!
> 
> As I've explained, the problem is the assumption that developers will be
> astute.  Experience writing web APIs suggests that developers are not
> uniformly astute.

I was being ironic there, sorry :) You don't have to be very astute in fact to realize that, if your data is privileged, maybe you should not send it over third-party systems. In fact, with our privacy laws, if you do that you would be infringing on the data protection law. And by you I mean the developer, not the push server.

> 
> That's not a big problem if a bad web developer causes her page to render
> poorly under some conditions.  It's a much bigger problem if a bad web
> developer leaks her users' private data.
> 
> We should not design APIs where it's easy for a developer of average
> astuteness to leak private data.

You know, maybe you should not give your data to a developer of average astuteness in the first place :)

Oh, and 'leak' in this case means the data will be stored temporarily on our servers... the same way all SMSes are stored (more than temporarily) on the Telco servers... It doesn't mean leak as in 'make publicly available'. And yes, I know, since the data isn't ours we should not be able to see it. But again, easiest solution: don't send the data.

> 
> > >  4. Unnecessary crypto requirements on the app servers.  The message signatures
> > >     buy us nothing and make it much more difficult to implement an app server.
> > 
> > The message signatures buy us:
> > 
> > * A defense against the 'easy DoS'
> > * A defense against fake message posting
> > * A way to ensure that only the authorized poster can post messages to a registered device
> 
> It seems to me that we gain all of these by simply making the URL that you
> use to post to a given client a secret.
> 
> The client itself may see this URL, but it's not a problem that a malicious
> client can post messages to itself.

The URL is a secret, or should be, but it doesn't have to be. And the URL is sent to the app server/developer by the client app, over the protocol the app developer chooses (which might very well be HTTP). This way, no matter the way the URL is sent back to the server, a compromise of the URL doesn't allow posting of messages.

> 
> Of course, this does pose some difficulty with respect to broadcast URLs, but
> you already know how I feel about those.  :)

I'm starting to notice, yeah ;)

> 
> Maybe we could agree to implement push notifications now without broadcast,
> and reconsider this decision in V2.
> 
> > We cannot use symmetric crypto for this [...]
> 
> Agreed, we cannot use symmetric crypto.
> 
> > Thinking about this, though, you're right in one thing. Signatures are not
> > really needed. We can change it to use SSL with client certificate, and
> > changing the protocol so instead of posting the public key on registration
> > the whole certificate is posted. That would simplify development complexity
> > for the application developer (assuming he has access to a half decent SSL
> > library and know how to generate self-signed certificates) and would give the
> > same level of protection for all practical senses.
> 
> I like the idea here, but I don't think this is a big improvement, because
> now app developers have to figure out how to generate client certs and 
> serialize and pass their certificate to the client.

Steps to generate a self signed certificate from zero knowledge:

1. Look up 'generate self signed certificate' on Google
2. Hit the first link
3. Follow the very detailed instructions. Copy and paste the commands...

Total time: about 1 minute if you already have openssl. About 5 if you don't.

Frankly, if this procedure is a problem for an average developer, maybe he should be doing other things ;)

> 
> Better yet would be to have the developer provide a client cert when posting
> messages to the notification server which proves that the dev owns
> http://my.app.com/.  Then you wouldn't have to do anything on the client
> side.  I don't know if such certs are readily available.

Er, I think this complicates things and makes them more expensive. Self-signed certificates are easy to do (even not knowing what a certificate is at all) and best of all, they're free.

> 
> But again, simply using a secret URL for posting messages to the client seems
> much, much simpler.  It's also less crypto for the server to do, which makes
> it easier for the servers to be fast.

Actually, using a client cert to authenticate versus not using one doesn't add much crypto load on the server, assuming you're using SSL anyway.
>Ok, I think we're arguing in circles here. You have your opinion, Google has
>its opinion, Apple has its opinion, and I have mine. We can go for either a
>version control notification system (Thialfi) or a message notification system
>(which *can* work as the other actually). But since the cases covered are not
>the same (as the developers asking Google prove) I don't think that's an
>engineering question. 

As I've explained, there are strong engineering benefits to a version
notification system.  In particular:

 - A version notification system is not susceptible to certain kinds of DoS
   attacks that a message system is.

 - A version notification system requires far less state on the notification
   server.  All of this state can be made "soft" -- that is, it does not affect
   correct functioning of the protocol if the server loses its state.

 - A version notification system can guarantee eventual delivery of messages
   even when the server loses state.

   In contrast, a server fault in a messaging system leads to missed
   notifications.  This requires applications to poll.  I've explained how this
   is bad for us, bad for users, and bad for application developers.

At Mozilla, it's the reviewer's prerogative to accept or reject a patch.  When
we disagree, the burden lies on you to convince me, or, if you're not happy
with my decision, to convince an owner of the relevant module.  In this case,
the owner would probably be Jonas.

So far, you haven't addressed these points; instead, the only argument I've
heard is that you like a protocol which allows you to send messages.  As the
section I quoted from the paper suggests, you're in good company.  But that's
not a reason to ignore the sound design advice we have here.

>> First, I didn't see anything in the notification server <-> third party
>> section of the protocol attached to this bug about deleting objects.
>> Perhaps that was elided?

You didn't respond to this part.  Is deleting messages part of the protocol, or
not?

>> Second, I understood from earlier that undelivered notifications are deleted
>> after 4 days.  (Comment 79, #19.)  What you're saying here doesn't seem
>> consistent with that, if I understand you correctly.  Which is correct?
>
>I'm sorry I didn't make myself clear. I wasn't explaining how the proposed
>protocol would work. I was describing how a Thialfi server could work in the
>example case Willy proposed. You said you would get a notification telling
>you that your hot news is at version X... and I said that what you get depends
>heavily on what the app developer defines as objects. So you could very well
>end up getting notifications for all the (not so new anymore) news.

I don't think you understand how the system I'm proposing would work.  If you
haven't already, I highly recommend you read the Thialfi paper, which might
clear up some misconceptions you may have.

A client can't get a notification for a "topic" ("object", in Thialfi-speak)
that the client hasn't subscribed to.

So the app developer cannot create one "object" for each news item; it wouldn't
make any sense.  But suppose we did it anyway.  Suppose I turned off my phone
for a while.  Then the app developer creates a bunch of objects, one for each
news item.  Then I turn on my phone.  I'm not subscribed to any of those new
objects, so I won't hear about them.

Again, the point of my comment is that the current proposal cannot guarantee
eventual delivery of messages, while the system I'm proposing can.  I've
explained previously why this is so important, and you haven't responded to
this point.

>Even your APIs change and are alive, as they should be.  Why should this one
>be different?

This API is no different from any other API in this respect -- we would be
extremely hesitant to release to the Web any API with numerous outstanding
issues, if we thought fixing those issues would require
non-backwards-compatible changes to that API.

Sure, sometimes we have to make those changes, and it sucks.  But we always do
our best to provide what we think is a good API before we release it to the
wolves^Wpublic.

>> If we could go through and audit every use of this API, then sure, this
>> isn't a problem.  Indeed, if we expose this only to certified apps, that's
>> exactly what we'd do!
>
>Again... I don't care what app developers send through the API. [...] That's
>the developer's problem.

You don't share my concern about this; okay, that's fine.  But again, your
burden here is to convince me and/or Jonas.  This is not a convincing argument.
In fact, it's not an argument at all.

I've explained that I think APIs should be designed in accordance with the
principle that the obvious way to use the API should be the same as the correct
way to use the API.  I've explained that this is particularly important when a
mistake means that user privacy is violated.  You haven't explained why you
think we should ignore this principle in this case, except by writing off the
whole principle altogether as "the developer's problem."

The thing is, it's the /user's/ problem if their private data is exposed.  At
Mozilla, we stick up for users first.

> Because the easy way to send data to their apps would be for their apps to
> open an HTTP (why bother with https, http is easier) connection to their
> server and download confidential data to their heart's content. [...] If you
> don't trust the app developer to treat your data with respect, don't use the
> app.

Just because there are ways to intentionally or otherwise violate user privacy
doesn't mean that we should make it easy to accidentally expose user data in a
new way.

>> That's great, but server infrastructure can only get you so far.  In my
>> experience, seven-nines reliability is pretty unrealistic even for
>> infrastructure much more critical than this.  My point is that the protocol
>> can be designed to tolerate failures gracefully, but if we don't design it
>> in now, we'll have to start over basically from scratch when we go to
>> re-design it, because how we handle failures is baked deep into the
>> protocol.
>
>Once again, I could agree with you had it not already been done, and had it
>not been working for the 200-pound Gorilla for 3 years already. It works for
>Apple; it will work here.

I just looked up Apple's push notifications [1].  It's actually different from
the system you've proposed here in a number of critical ways.

The most important difference is that the system "retains only one notification
per application on a device: the last notification received from a provider for
that application."

This makes Apple's system largely equivalent to Thialfi.  The main difference
is that Apple allows you to specify an arbitrary message, while Thialfi only
allows you to specify a version number.  The reason that I think we should go
with the latter is because it's much harder to leak private data with a version
number; obviously, that's not a concern of Apple's.

Note that Apple also has a strict limit on the payload size: "The maximum size
allowed for a notification payload is 256 bytes."

>> Offhand, I'd say this "storage attack" may be a bigger vector for DoS against
>> the notification servers than a "network attack", since you can always
>> throttle the rate of sending notifications, but throttling the storage means
>> reneging on the notification server's contract.
>
>Storage is cheap. And offhand I would say that broadcast messages get stored only once.

As I said, the issue isn't storing the broadcast message, but rather storing
the arbitrarily-long list of users who have not yet received this broadcast
message.

As anyone who has designed a realtime distributed system will tell you, storage
is cheap, but fast storage is precious.  Hard disks aren't particularly useful
if you can't read and write data off them quickly enough.

If you stored your broadcast bitmap on disk, we're talking one random write to
a disk for each message delivered.  At 4ms per random write, that's 70 minutes
of disk time to write 1 million messages.  That's just one broadcast message.

>> (Note that Thialfi supports broadcast messages without this
>> overflowing-storage problem.)
>
>Yeah, cause they only send version control information. Which is great for
>some use cases, as I said; it just doesn't cover all the use cases the
>general notification system does.

You need to convince me that there's a use-case which version-notification does
not solve.  So far, the only use-case you've proposed is one in which the
application is not hosted on the web.  That is not at all important.

Indeed, a version-notification system is equivalent to a message-notification
system in that a client can, upon discovering that it has an old version, ask
the application server what messages it has missed.  Thus, a
version-notification system is no less powerful than a message-notification
system, assuming that the app has an online presence.

>Again, I defer to Fernando and Willy on the specific backend implementation
>details... but again, this can be solved with backend design (and I think it's
>been taken care of already, but we can tweak that if necessary). 

If I can use Apple as an example: Given that Apple designed their protocol
specifically to avoid this problem of queueing messages, I think you
underestimate the difficulty of this problem.

>> > >  4. Unnecessary crypto requirements on the app servers.  The message signatures
>> > >     buy us nothing and make it much more difficult to implement an app server.
>> > 
>> > The message signatures buy us:
>> > 
>> > * A defense against the 'easy DoS'
>> > * A defense against fake message posting
>> > * A way to ensure that only the authorized poster can post messages to a registered device
>> 
>> It seems to me that we gain all of these by simply making the URL that you
>> use to post to a given client a secret.
>> 
>> The client itself may see this URL, but it's not a problem that a malicious
>> client can post messages to itself.

>And the URL is sent to the app server/developer by the client app, over the
>protocol the app developer chooses (which might very well be HTTP). This way,
>no matter the way the URL is sent back to the server, a compromise of the URL
>doesn't allow posting of messages.

There's a trade-off here.

a. On the one hand, we can require that messages be signed.  This is difficult
   for application servers, but it means that we never send our secret over the
   wire, so it's less likely to be compromised.

b. On the other hand, we can require that the URL be secret.  This is easier
   for application servers, but increases the chance that a URL will be sent
   over an insecure channel and fall into the wrong hands.

I think (b) is probably the right trade-off here.  Application developers are
already familiar with the idea of a secret token that shouldn't be sent over
HTTP; we have https-only cookies, for example.

I'll note that your argument here is basically "let's design the system so that
it's hard for an app to make a mistake".  That is /precisely/ the argument I've
used earlier and that you've written off as unimportant to API design!

>Steps to generate a self signed certificate from zero knowledge:
>
>1. Look up 'generate self signed certificate' on Google
>2. Hit the first link
>3. Follow the very detailed instructions. Copy and paste the commands...
>
>Total time: about 1 minute if you already have openssl. About 5 if you don't.

And if you're not running Unix and don't have a copy of IIS sitting around?  Yeah...

And then I guess the server needs to store this cert/public key and look it up
every time it receives a message?  And if the server loses this state, we're
totally screwed, right?  That means that you need to back this data up within
your data center; the certs need to be stored on multiple hard drives, etc.
Again, the Thialfi system is built so that the server can lose any state and we
still guarantee eventual delivery of messages.

Also, the application developer has to figure out how to generate signatures
using this cert, which is yet another piece of complexity.

Compare this complexity to simply POST'ing to a URL, which has none of these
problems.
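For comparison, here is roughly all the work the secret-URL scheme asks of an
app server (a node.js sketch; the URL shape and the JSON body are invented,
not part of any spec):

  var https = require('https');
  var url = require('url');

  // secretUrl is the per-client URL the client handed to the app server when
  // it registered, e.g. "https://push.example.org/notify/<long-random-token>".
  function notify(secretUrl, topic, version) {
    var body = JSON.stringify({ topic: topic, version: version });
    var opts = url.parse(secretUrl);
    opts.method = 'POST';
    opts.headers = { 'Content-Type': 'application/json',
                     'Content-Length': Buffer.byteLength(body) };
    var req = https.request(opts, function(res) {
      console.log('push server answered', res.statusCode);
    });
    req.end(body);
  }

No key generation, no signing, and no state beyond remembering the URL itself.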

Separately, I'm still curious about those ACKs in the protocol; the explicit
ack seems redundant with TCP's ack to me, but I was wondering if I was
missing something.

[1] http://developer.apple.com/library/mac/#documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/ApplePushService/ApplePushService.html
(In reply to Justin Lebar [:jlebar] from comment #98)
> 
> Separately, I'm still curious about those ACKs in the protocol; the explicit
> ack seems redundant with TCP's ack to me, but I was wondering if I was
> missing something.
> 

Well, as you point out, TCP assures the delivery of the messages to the Gecko client but not to the web app, so we need a second ACK at the application level to assure delivery to the web application.

In other words, Gecko SHALL send the ACK message as soon as the message has been delivered to the target application, so that the message is delivered end to end.
Leaving aside (aside but not forgotten :) ) the rest for now...

(In reply to Justin Lebar [:jlebar] from comment #98)
> I'll note that your argument here is basically "let's design the system so
> that it's hard for an app to make a mistake".  That is /precisely/ the
> argument I've used earlier and that you've written off as unimportant to API
> design!

No, in fact this is the "let's design the system so it's hard to be abused" argument.

> 
> >Steps to generate a self signed certificate from zero knowledge:
> >
> >1. Look up 'generate self signed certificate' on Google
> >2. Hit the first link
> >3. Follow the very detailed instructions. Copy and paste the commands...
> >
> >Total time: about 1 minute if you already have openssl. About 5 if you don't.
> 
> And if you're not running Unix and don't have a copy of IIS sitting around? 
> Yeah...

I run Windows as my main machine. I don't have IIS installed, nor do I have OpenSSL right now. Modified steps:

> >1. Look up 'generate self signed certificate' on Google
> >2. Hit the first link
2.1 Google 'openssl windows'
2.2 Hit the first link, download and install (it's next, next, next finish)
> >3. Follow the very detailed instructions. Copy and paste the commands...

Still less than 5 minutes. To do something that you have to do once every two years... Doesn't seem like a huge complexity to me.

> 
> And then I guess the server needs to store this cert/public key and look it
> up every time it receives a message?  And if the server loses this state,
> we're totally screwed, right?  That means that you need to back this data up
> within your data center; the certs need to be stored on multiple hard
> drives, etc.  Again, the Thialfi system is built so that the server can lose
> any state and we still guarantee eventual delivery of messages.

Er yes and no. The server already stores the public key, along with the subscription identity (that's described on the document, IIRC). This just s/public key/self signed certificate/g. If we lose this state we've lost all subscription information, so yeah, we're screwed, but not because of the certificate... mostly because of the "we don't know who's subscribed to what anymore" part :)

> 
> Also, the application developer has to figure out how to generate signatures
> using this cert, which yet another piece of complexity.

Er, again no. The whole point of replacing the public key with a certificate is so the app developer doesn't have to sign anything. He just has to make an HTTPS connection authenticated with the certificate, and do a POST there.

> 
> Compare this complexity to simply POST'ing to a URL, which has none of these
> problems.

But he just *has* to POST to a URL, only using client authentication (if we implement the change I proposed). 

Again, I don't see what the problems are. Generating a self-signed certificate? Easy, even without having any idea of what a certificate is. Using https? He has to use that anyway. Using https with a client certificate? It's not harder than making the library actually check the client certificate (actually, for some APIs it's easier), and we assume he's doing that, right?

Full disclosure: I *do* know what a certificate is, but I convinced myself I don't. I haven't programmed in Java in what seems like ages, and can say I didn't remember the network API at all. Just to check how difficult it was, I made a small example:

1. Configured a server to request a client certificate (ok, I cheated at this point on my faking ignorance), and made a small script that dumped the cert.
2. Generated a certificate using the procedure described above
3. Looked up in Google how to connect using client certificates
4. Followed the steps on the second link
5. Tested the app.
6. It worked :P

Total time: 20 minutes. And that includes the server part, and I'm anything but fluent in Java at this moment. I still don't see what the problem is.
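For the record, the node.js equivalent is about a dozen lines (a sketch only;
the host and path are invented, and key.pem/cert.pem are the self-signed pair
generated earlier, e.g. with: openssl req -x509 -newkey rsa:2048 -nodes
-keyout key.pem -out cert.pem -days 365):

  var https = require('https');
  var fs = require('fs');

  var req = https.request({
    hostname: 'push.example.org',          // invented
    path: '/notify/some-registration-id',  // invented
    method: 'POST',
    key: fs.readFileSync('key.pem'),       // the client-cert part: these two
    cert: fs.readFileSync('cert.pem')      // lines are the only extra work
  }, function(res) {
    console.log('status:', res.statusCode);
  });
  req.end(JSON.stringify({ id: 1234, message: 'hello' }));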
> Well, as you point out, TCP assures the delivery of the messages to the Gecko
> client but not to the web app, so we need a second ACK at the application
> level to assure delivery to the web application.

I'm all for end-to-end semantics, but this reflects a level of paranoia that I
think is inappropriate here.  Yes, it's possible that Gecko will receive data
from the notification server, send a TCP ack, and then crash, losing that
notification.

It's also possible that Gecko will deliver the notification to the app, send
the second ack to the notification server, and /then/ crash, before the app has
a chance to finish acting on the notification.  The effect is the same as if we
crashed after sending the first ack.

You gain only a tiny amount of protection by adding a second, application-level
ack -- that's why application-level acks are not done in any networking
protocol built on TCP I've ever seen.

If you want to involve one of Mozilla's networking experts to get a second
opinion here, I'd be happy to loop one in.

Note that the current scheme promises only best effort delivery.  Apps must be
able to tolerate lost notifications.  A lost notification due to a Gecko crash
should be handled, just as a lost notification due to a server fault or due to
my being disconnected from the network for 5 days should also be handled.  So
whatever problem we're trying to prevent with this second ack will still
manifest in these other ways.

In contrast, if we had a system with guaranteed eventual delivery, we could
guarantee delivery much closer to the end-to-end level.  This is because the app is
happy to receive duplicate notifications of (topic, version).  We can be lazy
about how often we serialize the (topic, version) tuples we've seen, because
it's not a big deal if we see one twice.
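Here's a sketch of what I mean by lazy (dispatchToApp and saveToDisk are
hypothetical helpers); the point is that replaying a (topic, version) pair
after a crash is a no-op, so we never need to touch disk on the delivery path:

  // Highest version seen so far, per topic.
  var seen = {};
  var dirty = false;

  function deliver(topic, version) {
    if ((seen[topic] || 0) >= version)
      return;  // a replay after a crash; drop it, no harm done
    seen[topic] = version;
    dirty = true;
    dispatchToApp(topic, version);
  }

  // Persist the map only every few minutes.  Losing the tail just means a few
  // duplicate notifications after a crash, never a lost one.
  setInterval(function() {
    if (dirty) { saveToDisk(seen); dirty = false; }
  }, 5 * 60 * 1000);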

> If we lose this state we've lost all subscription information, so yeah, we're
> screwed, but not because of the certificate... mostly because of the "we
> don't know who's subscribed to what anymore" part :)

Sure, I guess I just mean that the cert is yet another piece of hard state that
the servers can't lose.  Note that a Thialfi-like system does not fail to
guarantee eventual delivery even if the server loses all of its state!

>> I'll note that your argument here is basically "let's design the system so
>> that it's hard for an app to make a mistake".  That is /precisely/ the
>> argument I've used earlier and that you've written off as unimportant to API
>> design!
>
>No, in fact this is the "let's design the system so it's hard to be abused"
>argument.

You agree that the system cannot be abused if app devs don't screw up and send
the secret URLs in the clear, right?

What we're protecting against is abuse which stems from those URLs being
leaked, which can happen only if app dev screws up, right?

So the principle at work here is that we should design the protocol
to protect against mistakes by app devs.

> I still don't see what the problem with [requiring a certificate] is.

The problem is that I'm not convinced it's a necessary security feature to
protect against a likely attack.  It still seems to me to be a premature
optimization, just like the application-level ack.

If you wanted certs to be an optional security measure, that would probably be
fine with me.  Since they can be optional, I'd say we should leave it out of
the first revision of this spec, but I'm happy to be flexible on that.

At this point, it seems to me that your top priority is to get something done
as soon as possible, even if you think it's not perfect.  In that sense, having
a smaller spec (and also carefully picking your battles here) is to your
advantage.

Alternatively, if you really think that this whole thing falls apart as soon as
we don't have certs involved, let's come back to this after we've reached
consensus elsewhere.  I don't disagree that there's an upside to requiring the
certificates.  This is definitely not our biggest point of contention.
(In reply to Justin Lebar [:jlebar] from comment #101)
> > If we lose this state we've lost all subscription information, so yeah, we're
> > screwed, but not because of the certificate... mostly because of the "we
> > don't know who's subscribed to what anymore" part :)
> 
> Sure, I guess I just mean that the cert is yet another piece of hard state
> that the servers can't lose.  Note that a Thialfi-like system does not fail to
> guarantee eventual delivery even if the server loses all of its state!

Actually... the only way to do that is if the clients periodically contact the servers to ask them 'hey mate, do you have anything for me?' Cause otherwise, if the server loses its object<->interested-clients table, the next time a notification arrives for that object it won't know who to send it to, and unless clients proactively ask they won't ever get those notifications.

And the solution, should that information disappear in a puff of electricity, is exactly the same as for Thialfi: just re-register. Although since we're registering identities here and not object identifiers, you only have to register once and not once per object-of-interest.

I like particularly this bit of the Thialfi [1] paper:

"The Thialfi client library sends periodic heartbeat messages to the Registrar to indicate that it is online (...). Superficially, these periodic heartbeats might resemble polling (...)"

Yeah, superficially they might resemble polling because you know what? They're polling. In a very efficient way, I'm sure, one that doesn't cause any stress on the server. Not so sure about the mobile networks though (the costly part isn't the actual bandwidth consumed but establishing and closing channels, as I understood it).

> 
> >> I'll note that your argument here is basically "let's design the system so
> >> that it's hard for an app to make a mistake".  That is /precisely/ the
> >> argument I've used earlier and that you've written off as unimportant to API
> >> design!
> >
> >No, in fact this is the "let's design the system so it's hard to be abused"
> >argument.
> 
> You agree that the system cannot be abused if app devs don't screw up and
> send the secret URLs in the clear, right?

Certificates are a definite requisite for broadcasts (yeah, I already know how you feel about them), and are a defense-in-depth measure for the rest. I still think we have to implement some way of controlling who can post notifications, and that was the signatures in the current version, which we can replace with the self-signed certificates if you want to alleviate developer angst. As I said before, this point was argued at the Security Review (where, by the way, nobody asked to remove it, just to provide a way to revoke access to a given key in case of compromise). And it's already implemented (well, the signing version is) so not modifying it is actually the easiest way to get it out of the door :P

> What we're protecting against is abuse which stems from those URLs being
> leaked, which can happen only if app dev screws up, right?

No, actually no. They can leak also if the customer device (which can be a phone, or a desktop computer) is compromised, or if we screw up and publish them... and even then a malicious entity won't be able to post messages.

> So the principle at work here is that we should design the protocol
> to protect against mistakes by app devs.

We should design the protocol to protect the service (OUR service) against mistakes by app devs. Not to protect dev systems/apps against their own errors. That's out of scope, or should be.


[1] http://www.michaelpiatek.com/papers/thialfi-sosp11.pdf
> Yeah, superficially they might resemble polling because you know what?
> They're polling.

Correct.  But as I explained earlier, if you don't have guaranteed delivery,
every app has to poll for itself.  So your choice is either to have every
application poll individually, which leads to a slew of problems which I've
outlined above, or to have Gecko poll, which means that we are entirely in
control of the polling mechanism.  The choice is not between polling with
Thialfi and no polling without.

> And the solution, should that information disappear in a puff of electricity,
> is exactly the same as for Thialfi: just re-register. Although since we're
> registering identities here and not object identifiers, you only have to
> register once and not once per object-of-interest.

I have to register once per object of interest because if we lost the list of
objects I'm interested in, the server doesn't know which messages to forward to
me, right?  You have to do it this way to support broadcast messages, right?
Otherwise the application server has to remember that user X is interested in
object Y, but then to do a broadcast, the app server must send a list of users
to the notification server, and that's not how your broadcast messages work.

Anyway, this protocol that has been proposed here is not designed around
recovery of state upon re-registration.  If you want to propose a new protocol
which is designed to recover gracefully but is somehow different from Thialfi,
then please lay out the details so we can discuss it.  It's not simply a matter
of adding a poll-and-re-register mechanism to the existing protocol, however.
As I hope the paragraph above shows, recovery is a non-trivial matter.

It seems pretty unlikely to me that you'll be able to design in a few days a
better protocol than some experts designed and tested over the course of some
months, so at the point that we're re-designing our protocol, I as a reviewer
would expect us to start with a known-good protocol and to have a good reason
for each point at which we depart from prior art.

> They can leak also if the customer device (which can be a phone, or a desktop
> computer) is compromised

If my device is compromised, you can get all my passwords, read all my e-mails,
and view the contents of any site I visit.  That you could also send me push
notifications seems like the least of our worries.

(In fact, even if we have signatures, if you control my device, you can modify
my push notification subscriptions arbitrarily, so you could re-register all my
push notifications with new certs and then spam me arbitrarily.  But of course
you don't even need to do that; if you control the phone, you can deliver fake
push notifications straight to Gecko without ever touching the network.)

> We should design the protocol to protect the service (OUR service) against
> mistakes by app devs. Not to protect dev systems/apps against their own
> errors

As I've said before, I'm suggesting that we should design the service to
protect /users/ against app developers' errors.  I agree that protecting apps
against their own errors isn't a big deal.

But also, if you're concerned about protecting the notification server against
mistakes by app devs -- which I agree is also important -- you might wish to
consider

 * Storing just one message from apps, rather than an arbitrary number, since
   you don't want your service's storage promise to be unbounded, and

 * Placing a hard limit on how large an app's messages can be, for the same
   reason as above.

This would make our service more like Apple's.

====

To change the subject a bit: Another piece of the existing protocol that
confuses me is the gecko <-> notification service bit where Gecko requests a
UUID from the notification server.

Surely the notification server is just going to return a random number here.
It would be computationally intensive to check that this number hasn't been
handed out before, and if we choose the ID from a sufficiently large namespace
(say, 2^128), it's prohibitively unlikely that we'll ever choose the same ID
twice.

So if the server is merely choosing a random number, why are we doing that on
the server?  We can pick random numbers on the client -- indeed, we have to be
able to choose good random numbers in order to do crypto.
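For illustration, generating such an identifier on the client is a couple of
lines of Gecko chrome code (a sketch):

  // A plain UUID...
  var uuid = Components.classes["@mozilla.org/uuid-generator;1"]
                       .getService(Components.interfaces.nsIUUIDGenerator)
                       .generateUUID().toString();

  // ...or, if we want crypto-strength randomness, 16 bytes from NSS:
  var rng = Components.classes["@mozilla.org/security/random-generator;1"]
                      .createInstance(Components.interfaces.nsIRandomGenerator);
  var bytes = rng.generateRandomBytes(16);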

I'll note that the current scheme doesn't protect against impersonation
attacks, since a malicious client which knows the user-id of some other user
can ignore the ID provided by the server and register as that user.

====

One thing that would be helpful here in evaluating what we can do for v1 is to
understand exactly what use-cases you guys would like to use push notifications
for in v1.

That would inform whether a certified-apps-only approach would be acceptable,
or whether we could pare down the protocol significantly and still meet your
needs.  Perhaps there are ways to meet your immediate needs without push
notifications at all.
(In reply to Justin Lebar [:jlebar] from comment #103)
> 
> ====
> 
> To change the subject a bit: Another piece of the existing protocol that
> confuses me is the gecko <-> notification service bit where Gecko requests a
> UUID from the notification server.
> 
> Surely the notification server is just going to return a random number here.
> It would be computationally intensive to check that this number hasn't been
> handed out before, and if we choose the ID from a sufficiently large
> namespace (say, 2^128), it's prohibitively unlikely that we'll ever choose
> the same ID twice.

This is not really a "random" number. See https://github.com/telefonicaid/notification_server/blob/develop/src/common/token.js#L18

We are using that token to verify that the user trying to register with this token (in registerUA) has gotten the token from our server, not others. Actually, it's a mere GET /token, but the idea is that Gecko should perform some authentication to get one. Mozilla Persona? In the first-run experience in B2G? Using OAuth? We need to discuss it when the security guys have some spare time.

> 
> So if the server is merely choosing a random number, why are we doing that on
> the server?  We can pick random numbers on the client -- indeed, we have to
> be able to choose good random numbers in order to do crypto.
> 
> I'll note that the current scheme doesn't protect against impersonation
> attacks, since a malicious client which knows the user-id of some other user
> can ignore the ID provided by the server and register as that user. 

But this is the same problem as if you get your credit card stolen: you will probably get robbed in some manner. You are losing a secret, like the token.

> 
> ====
> 
> One thing that would be helpful here in evaluating what we can do for v1 is
> to understand exactly what use-cases you guys would like to use push
> notifications for in v1.
> 
> That would inform whether a certified-apps-only approach would be acceptable,
> or whether we could pare down the protocol significantly and still meet your
> needs.  Perhaps there are ways to meet your immediate needs without push
> notifications at all.

Let's ask Daniel Coloma or Andreas Gal to say if this could be used for other non-system apps in v1.

Thanks Justin!
> We are using that token to verify that the user trying to register with this
> token (in registerUA) has gotten the token from our server, not others.

I see.  I guess I have a few things to say about this:

1. What do you use the data in this identifier for, aside from verifying that I
got the identifier from this service?

Encoding my Persona log-in into this ID is an interesting idea.  I like this
because it makes the protocol less stateful.  But that opens an entirely new
can of security and privacy concerns; I'm not sure we want to tackle it here,
and we certainly wouldn't want to add it lightly!

As it is, I don't see a lot of value in the data you're putting into the
identifier.  In fact, it doesn't even look like you have code to unpack any of
the information in there, aside from checking the MD5, although I'm of course
unfamiliar with this code.  Maybe I'm missing something.

2. Have you considered how long these identifiers will be?  All the numbers in
the identifier are encoded as ASCII, right?  So roughly, we're looking at:

  10^6 serial numbers = 6 bytes
  10^1 servers = 1 byte
  A 13-decimal-digit Date.now() = 13 bytes
  uuid.v1() (e.g. '6c84fb90-12c4-11e1-840d-7b25c5ee775a') = 36 bytes
  Five separator characters ('#', '#', '_', '@') = 5 bytes
  MD5 digest = 20 bytes

  TOTAL: 81 bytes

Then AES rounds this up to the nearest multiple of 16 bytes (the AES block size).

  TOTAL AFTER AES: 96 bytes.

That's ignoring the padding you guys add around the MD5.  It's also assuming
that you're using ECB mode in AES, which you really, really, really should not
do [2].  If you do CBC, we need an IV, which adds another 16 bytes.

96 bytes just for an identifier feels pretty heavy-handed to me, unless you're
getting a lot of value out of the data stored therein.  I guess it goes back to
question (1).

3. You guys are also re-inventing the crypto primitive of a HASHMAC here, but
unfortunately are doing it incorrectly.  One should always encrypt and then MAC
[1]; this code does the reverse.  (Actually, because it's doing a simple MD5
instead of a keyed MD5-HMAC as the MAC step, this code is likely much less
secure than mac-then-encrypt.  This would be particularly true if encryptAES
were using ECB mode, which I suspect is the case.)

This code also uses MD5, which is known to be an insecure hash.

To do this correctly, you'd do

  c = AES(key1, whatever data you want || UUID);
  mac = HASHMAC(key2, c);
  identifier = c || mac;

(where the || operator is concatenation).
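For concreteness, a minimal node.js sketch of that construction (illustrative only; the function and key names are assumptions, not the server's actual code, and encKey/macKey are two independent 16-byte secrets):

  var crypto = require('crypto');

  // identifier = c || mac, where c = IV || AES-128-CBC(encKey, data)
  // and mac = HMAC-SHA256(macKey, c).
  function makeIdentifier(encKey, macKey, data) {
    var iv = crypto.randomBytes(16);          // fresh random IV for every encryption
    var cipher = crypto.createCipheriv('aes-128-cbc', encKey, iv);
    var c = Buffer.concat([iv, cipher.update(data, 'utf8'), cipher.final()]);
    var mac = crypto.createHmac('sha256', macKey).update(c).digest();
    return Buffer.concat([c, mac]);
  }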

4. I'm a pretty strong believer in YAGNI ("you ain't gonna need it").  If we're
not using the data encoded in this identifier for anything important at the
moment, then I think we should switch to the simpler "generate UUID on the
client" approach.

If in the future we want to encode something like your browser-id credentials
into the ID, then we can figure that out then; like I say, I think this is an
interesting idea, but it's rather complicated.  (Just as an example, we
wouldn't want someone who got hold of your token to be able to log into
BrowserID as you indefinitely.  So perhaps we'd want these tokens to expire
after a period of time.  That makes them much less useful as unique
identifiers.)

> But this is the same problem that if you get your credit card stolen, you
> will probably get robbed in some manner. You are losing a secret, like the
> token.

I agree, and I don't think we need to protect against this (unless the token is
used for /authentication/, which is a different story).  My point (which I
didn't make clearly, sorry) is only that neither "generate the UUID on the
client" nor "generate the UUID on the server" protects against impersonation
attacks.

At the point that we're using the token for authentication, then we need to
consider how to protect the token better.  To use the credit card analogy: If
my credit card gets stolen, I just have to get a new credit card -- I don't
have to get a whole new identity!

[1] See your favorite crypto text, or as a poor alternative,
http://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac

[2] I tried to look up what encryption mode method crypto.encryptAES uses, but
the docs I found at http://nodejs.org/api/crypto.html do not include the
encryptAES method.  Maybe you guys are using an older/newer version of node?
I'm not at all familiar with node.js, so sorry if I'm missing an obvious place
to go for the right docs here.
(In reply to Justin Lebar [:jlebar] from comment #105)
> > We are using that token to verify that the user trying to register with this
> > token (in registerUA) has gotten the token from our server, not others.
> 
> I see.  I guess I have a few things to say about this:
> 
> 1. What do you use the data in this identifier for, aside from verifying
> that I
> got the identifier from this service?

Nothing else.

> 
> Encoding my Persona log-in into this ID is an interesting idea.  I like this
> because it makes the protocol less stateful.  But that opens an entirely new
> can of security and privacy concerns; I'm not sure we want to tackle it here,
> and we certainly wouldn't want to add it lightly!

Yes, I know. But this opens up a lot more opportunities: used with Sync, we could sync your devices via notifications. But that's more of a dream right now ;)

> 
> As it is, I don't see a lot of value in the data you're putting into the
> identifier.  In fact, it doesn't even look like you have code to unpack any
> of
> the information in there, aside from checking the MD5, although I'm of course
> unfamiliar with this code.  Maybe I'm missing something.
> 
> 2. Have you considered how long these identifiers will be?  All the numbers
> in
> the identifier are encoded as ASCII, right?  So roughly, we're looking at:
> 
>   10^6 serial numbers = 6 bytes
>   10^1 servers = 1 byte
>   A 13-decimal-digit Date.now() = 13 bytes
>   uuid.v1() (e.g. '6c84fb90-12c4-11e1-840d-7b25c5ee775a') = 36 bytes
>   Five separator characters ('#', '#', '_', '@') = 5 bytes
>   MD5 digest = 20 bytes
> 
>   TOTAL: 81 bytes
> 
> Then AES rounds this up to the nearest multiple of 16 bytes (the AES block
> size).
> 
>   TOTAL AFTER AES: 96 bytes.
> 
> That's ignoring the padding you guys add around the MD5.  It's also assuming
> that you're using ECB mode in AES, which you really, really, really should
> not
> do [2].  If you do CBC, we need an IV, which adds another 16 bytes.
> 
> 96 bytes just for an identifier feels pretty heavy-handed to me, unless
> you're
> getting a lot of value out of the data stored therein.  I guess it goes back
> to
> question (1).

Yeah, the token is pretty big, we know that. And yes, we are using CBC, see
https://github.com/telefonicaid/notification_server/blob/develop/src/common/cryptography.js#L53

> 
> 3. You guys are also re-inventing the crypto primitive of a HASHMAC here, but
> unfortunately are doing it incorrectly.  One should always encrypt and then
> MAC
> [1]; this code does the reverse.  (Actually, because it's doing a simple MD5
> instead of a keyed MD5-HMAC as the MAC step, this code is likely much less
> secure than mac-then-encrypt.  This would be particularly true if encryptAES
> were using ECB mode, which I suspect is the case.)
> 
> This code also uses MD5, which is known to be an insecure hash.
> 
> To do this correctly, you'd do
> 
>   c = AES(key1, whatever data you want || UUID);
>   mac = HASHMAC(key2, c);
>   identifier = c || mac;
> 
> (where the || operator is concatenation).

We'll check the code. Thanks for the info!


> 
> 4. I'm a pretty strong believer in YAGNI ("you ain't gonna need it").  If
> we're
> not using the data encoded in this identifier for anything important at the
> moment, then I think we should switch to the simpler "generate UUID on the
> client" approach.

We are using it for the "verification" process: to check that you got the token from us, and not from any other notification server. As I said, this will only really matter once we use another way to authenticate this GET /token.

> 
> If in the future we want to encode something like your browser-id credentials
> into the ID, then we can figure that out then; like I say, I think this is an
> interesting idea, but it's rather complicated.  (Just as an example, we
> wouldn't want someone who got hold of your token to be able to log into
> BrowserID as you indefinitely.  So perhaps we'd want these tokens to expire
> after a period of time.  That makes them much less useful as unique
> identifiers.)
> 
> > But this is the same problem that if you get your credit card stolen, you
> > will probably get robbed in some manner. You are losing a secret, like the
> > token.
> 
> I agree, and I don't think we need to protect against this (unless the token
> is
> used for /authentication/, which is a different story).  My point (which I
> didn't make clearly, sorry) is only that neither "generate the UUID on the
> client" nor "generate the UUID on the server" protects against impersonation
> attacks.
> 
> At the point that we're using the token for authentication, then we need to
> consider how to protect the token better.  To use the credit card analogy: If
> my credit card gets stolen, I just have to get a new credit card -- I don't
> have to get a whole new identity!

Thanks again!

> 
> [1] See your favorite crypto text, or as a poor alternative,
> http://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-
> encrypt-then-mac
> 
> [2] I tried to look up what encryption mode method crypto.encryptAES uses,
> but
> the docs I found at http://nodejs.org/api/crypto.html do not include the
> encryptAES method.  Maybe you guys are using an older/newer version of node?
> I'm not at all familiar with node.js, so sorry if I'm missing an obvious
> place
> to go for the right docs here.
>   MD5 digest = 20 bytes

Actually, this is wrong.  It's only 20 bytes if you encode it in binary.  It's 40 hex characters, so 40 bytes if encoded as hex.  Anywho...

Given the YAGNI principle, and given that we're going to have to change the protocol /anyway/ if we want to authenticate ourselves to the notification server as part of getting a token (e.g. using BrowserID), it seems to me we should remove the server-side token generation from this version.  It's just extra complexity without any upside.  (That is, what attack are we preventing by verifying the tokens on the notification server?)
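Just to illustrate the "generate the UUID on the client" alternative, a rough sketch of what Gecko could do with its existing UUID generator (a sketch, not a proposed patch):

  // Chrome JS inside Gecko; nsIUUIDGenerator already produces random UUIDs.
  const uuidGenerator = Components.classes["@mozilla.org/uuid-generator;1"]
                                  .getService(Components.interfaces.nsIUUIDGenerator);
  // generateUUID() returns an nsID; toString() yields "{xxxxxxxx-xxxx-...}".
  const clientUAID = uuidGenerator.generateUUID().toString();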

> And yes, we are using CBC, see
> https://github.com/telefonicaid/notification_server/blob/develop/src/common/cryptography.js#L53

Ah, cryptography.js is not the node.js library, got it.  It looks like the functions you're using don't use an IV, though.  If you re-ordered the identifier token so that the UUID came first, that wouldn't be /such/ a big deal, because the UUID has a lot of entropy (but it's not at all uniformly random, because it's ASCII!).  But for most other uses, you really need an IV in there to prevent attackers from being able to guess parts of the encrypted stream.  Either that or you need to use a new key for every encryption.
(In reply to Justin Lebar [:jlebar] from comment #107)
> >   MD5 digest = 20 bytes
> 
> Actually, this is wrong.  It's only 20 bytes if you encode it in binary. 
> It's 40 hex characters, so 40 bytes if encoded as hex.  Anywho...
> 
> Given the YAGNI principal, and given that we're going to have to change the
> protocol /anyway/ if we want to authenticate ourselves to the notification
> server as part of getting a token (e.g. using BrowserID), it seems to me we
> should remove the server-side token generation from this version.  It's just
> extra complexity without any upside.  (That is, what attack are we
> preventing by verifying the tokens on the notification server?)
> 
> > And yes, we are using CBC, see
> > https://github.com/telefonicaid/notification_server/blob/develop/src/common/cryptography.js#L53
> 
> Ah, cryptography.js is not the node.js library, got it.  It looks like the
> functions you're using don't use an IV, though.  If you re-ordered the
> identifier token so that the UUID came first, that wouldn't be /such/ a big
> deal, because the UUID has a lot of entropy (but it's not at all uniformly
> random, because it's ASCII!).  But for most other uses, you really need an
> IV in there to prevent attackers from being able to guess parts of the
> encrypted stream.  Either that or you need to use a new key for every
> encryption.

Yes, we have a small wrapper, but we use the node.js crypto library (which uses OpenSSL underneath). It uses an IV (if I'm not mistaken) here:

https://github.com/telefonicaid/notification_server/blob/develop/src/common/token.js#L10

That is in the consts file there:

https://github.com/telefonicaid/notification_server/blob/develop/src/config.js

And it's used here:

https://github.com/telefonicaid/notification_server/blob/develop/src/common/token.js#L40

Anyway, shall we focus on the Gecko part? :) As for the review of the server -- since this will be part of a Mozilla-Telefónica project, we can contact node.js experts inside Mozilla (I'm pretty sure there are some; BrowserID is written in node!) if they want to help (and, personally, I'd love that).
> Anyway, shall we focus on the Gecko part? :)

Well, this is the Gecko part, inasmuch as if we remove part of the protocol as I've suggested here, it affects Gecko.

We also take the security of the server pretty seriously.  If anything, the crypto issues exposed here (by just a cursory look at a small part of the system) give me a lot less confidence that we can trust the notification server to handle sensitive data.  That again informs our decisions in the protocol, and therefore in Gecko.

> It uses a IV (if I'm not mistaken) here:

This is an encryption key, not an IV.  If you follow the path down:

  crypto.encryptAES(rawToken, cryptokey);         calls
  _encrypt(rawToken, cryptokey, 'aes-128-cbc');   calls
  crypto.createCipher('aes-128-cbc', cryptokey);  which is
  http://nodejs.org/api/crypto.html#crypto_crypto_createcipher_algorithm_password

Note that it's not calling crypto.createCipheriv().

Digging through the node.js code, we end up in src/node_crypto.cc at CipherInit().  CipherInit() does create an IV, but creates it using the key by calling EVP_BytesToKey [1]:

  int key_len = EVP_BytesToKey(cipher, EVP_md5(), NULL,
    (unsigned char*) key_buf, key_buf_len, 1, /* outparam */ key, /* outparam */ iv);

The NULL parameter is the salt, meaning that the IV is a deterministic function of the key.  Which is about as useful as having an IV of 0.

[1] http://www.openssl.org/docs/crypto/EVP_BytesToKey.html
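To make the distinction concrete, a small sketch (not the server's code) of the two calls side by side; createCipher derives the IV from the password, so it is the same every time, while createCipheriv takes an explicit, per-message random IV:

  var crypto = require('crypto');

  // IV derived deterministically from the password -- effectively a fixed IV:
  var weak = crypto.createCipher('aes-128-cbc', 'some shared password');

  // Explicit random IV, different for every message; the IV is then shipped
  // alongside the ciphertext so the receiver can decrypt:
  var key = crypto.randomBytes(16);             // 16-byte key for aes-128-cbc
  var iv  = crypto.randomBytes(16);
  var strong = crypto.createCipheriv('aes-128-cbc', key, iv);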
I just got back after one week of vacation.  I am very sorry for disappearing over the last few days while such a long and heated discussion was going on here.  It is a long thread, and it took me a long time to get through it.  I have tried to study everything that was mentioned, but I am sure I must be missing something, so forgive me if I am.

Below are some ideas on a few of these topics.

1. Authenticating application servers (against DoS/replay attacks).  I would like to keep this as simple as possible.  A protocol with multiple steps is not easy for backend web programmers, and invoking BrowserID here is too much.  All we need here is to challenge app servers (authentication), so I think a procedure built around a secure challenge is good enough: for example, include hash(challenge_phrase, shared_secret) in every message, or use SSL with a public key provided by the app server.  (A small sketch of the challenge idea follows point 4 below.)

2. Losing messages.  We can lose version messages for the same reasons we lose non-version messages.  One benefit of version messages is avoiding a massive number of messages after a long period offline.  But I still remember that we -- Jonas, the Telefonica members, and I -- discussed this in Barcelona.  The Telefonica members proposed using a message ID to identify messages, so that the app server can update an existing message that has the same message ID.  I don't know why Guillermo and Fernando stopped using it.  (More in 4.)

3. Confidentiality of messages.  Avoiding sending private data in the notification can stop eavesdroppers sitting on the push server, but we pay for it in power consumption (connecting back to the app server for the message content).  We can limit the payload size to 128 bytes (a bit less or more), just like SMS does, to encourage developers to send only simple messages.  We can also encourage developers to encrypt private data by providing best practices in the documentation.
We could also add a new argument to registerURL() for passing a secret key for decryption, or generate a secret key for every application as part of registerURL()'s return value.  Either approach encourages developers to encrypt private data in a very clear way.

4. Soft state (losing messages again).  Both version messages and non-version messages are best effort.  It is too expensive to guarantee proof of delivery, and for most apps best effort is good enough.  We have a TTL in the message, and the message ID can be used to update an existing message.  The TTL should have an upper bound; it cannot be infinite.  So, by leveraging the TTL and the message ID, the app server can touch the message periodically to make sure of delivery if the message is important enough.  ACKs of messages should be handled by the app and the app server themselves.  In this way the push server keeps only soft state, which keeps the push server simple.
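On point 1, a minimal node.js sketch of the challenge idea (the function names and the choice of HMAC-SHA256 are illustrations, not part of the current protocol):

  var crypto = require('crypto');

  // The app server proves knowledge of the shared secret by attaching an HMAC
  // of the server-issued challenge phrase to every push request.
  function proveChallenge(challengePhrase, sharedSecret) {
    return crypto.createHmac('sha256', sharedSecret)
                 .update(challengePhrase)
                 .digest('hex');
  }

  // The push server recomputes the HMAC and compares it with the value sent.
  function verifyChallenge(challengePhrase, sharedSecret, proof) {
    return proveChallenge(challengePhrase, sharedSecret) === proof;
  }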
> 4. Soft-state (losing messages again).  Both version messages or non-version messages are best 
> effort.

I highly recommend you read through the Google paper.  It describes how a version-message scheme can be better than best-effort.  You can actually guarantee eventual delivery with a version-message scheme, even in the case that the server loses all its data.

The paper also explains how best-effort delivery isn't good enough for most apps, because best-effort  requires apps to poll, and that has all sorts of problems.  (Or alternatively, as you say, best-effort notifications require app servers to keep track of acks on their end, which is a huge burden.)
(In reply to Justin Lebar [:jlebar] from comment #112)
> > 4. Soft-state (losing messages again).  Both version messages or non-version messages are best 
> > effort.
> 
> I highly recommend you read through the Google paper.  It describes how a
> version-message scheme can be better than best-effort.  You can actually
> guarantee eventual delivery with a version-message scheme, even in the case
> that the server loses all its data.
> 
> The paper also explains how best-effort delivery isn't good enough for most
> apps, because best-effort  requires apps to poll, and that has all sorts of
> problems.  (Or alternatively, as you say, best-effort notifications require
> app servers to keep track of acks on their end, which is a huge burden.)

At your suggestion, I read the paper.  Its delivery guarantee rests on periodically exchanging state digests -- a kind of polling -- between clients (the UA, for us) and servers, even when there is no state change.  It is efficient and scalable for desktop or server environments because the polling traffic is divided into a large number of local traffic domains.  But I am afraid it is not what we want for mobile devices.

According to the paper, Thialfi is also a kind of best effort, but it has a way of detecting errors and recovering from them, by periodically exchanging digests and re-fetching state from the data source (the application back-end, or app server in our terms).  It also provides a way to recover from a storage, server, or network failure.  That is an excellent feature we had not considered before; we should refine our protocol in this direction.

We don't need to switch from a messaging service to a signaling service (Thialfi) even if there is a signaling protocol that matches our requirements extremely well, because we can still build a messaging service on top of a signaling service.  So I suggest keeping the API as a messaging service.

Not all applications need a reliable messaging service; best effort is good enough for most apps.  For applications that do need reliable messaging, it is acceptable and reasonable for their developers to spend time building a reliable messaging channel on top of a best-effort service.  It is actually not a hard task, especially with help from libraries.

AFAIK, there is at least one weakness of the current protocol that needs to be fixed: it cannot recover from a database failure at the push servers.  I suggest returning an error for registerUA() requests to tell the UA to re-register all applications.  Also, the returned push URLs should be generated by a deterministic function to make sure the URLs do not change.
> According the paper, thialfi is also a kind of best effort, but it a way for detecting error and 
> doing error recovery, by periodically exchanging digest & re-fetching states from data source 
> (application back-end/or app server for us).

Because it has a way of doing error-recovery, it can guarantee eventual delivery.  In section 5, they show that Thialfi has the "reliable delivery property" that "If a well-behaved client registers for an object X, Thialfi ensures that the client will always eventually learn of the latest version of X."  That's a much stronger guarantee than best effort.

> We don't need to switch from a messaging service to a signaling service (Thialfi) even there is a 
> protocol for signaling service super matching our requirement.  Because, we still can build a 
> messaging service on the top of a signaling service.  So, I suggest to keep the API as a messaging 
> service.

I'm not sure exactly what you mean here.  You're saying that we should build a signaling service and then build a messaging service on top of that, so we can expose a messaging service to applications?  It's not clear to me why that would be an improvement over what we have.

If you're saying that we can build a reliable messaging service, it's not at all obvious to me how that would work, particularly in the face of server failure.  Here's why:

Suppose the notification server loses all its state.  In the Thialfi world, I register with the server, and the server tells me "I have no idea what version that object is; go ask the application server to send you any relevant updates."  (It sends me the RegistrationUnknown() notification, from figure 2.)  It's as though, when the server loses state for an object, it sets the version for that object to infinity, forcing all clients who watch that object to update via the application server.

Suppose instead we designed a system around messages.  The server again crashes and loses all its state.  Now I contact the server and say "do you have any pending messages for me?"  And the server says "Nope."  What else could it say?  It could say "No, but maybe I lost a message," but that doesn't help me either.  The application server should not be responsible for remembering every message it tried to send me so that it can replay those messages in case the notification server loses state; the application server only knows the current state of the world.

Now, maybe you'd suggest a hybrid approach where we send messages to applications, but we also have a special message which says "I may have lost some messages."  This might work, but it is IMO an inferior API, because the failure case is different from the normal case.  So now application developers have to write two codepaths instead of one.  Surely there will be bugs in the failure case, because it's not the common case.

> Not all applications need a reliable messaging service, best-effort is good enough for most apps.  For 
> applications that need a reliable messaging service, it is acceptable and reasonable for their 
> developers to pay time on building a reliable messaging channel based on a best-effort service.  
> Actually, it is not a hard task, especially, with a help from libraries.

Basically any app which synchronizes user state between the client and the application service needs a reliable messaging service.  That would include applications which do e-mail, calendar sync, contacts sync, and photo gallery sync.

I agree that not all apps need reliable messaging.  For example, a weather alerts app probably doesn't need reliable messaging.  But apps which are happy to lose a message would be, in my view, the exception.

Moreover, consider how an app would implement a reliable service on top of a best-effort service.  Basically, the app will need to poll the application server periodically to check for missed messages, right?  So now we have all the downsides that I laid out in comment 88.

> AFAIK, there is at least one weakness of current protocol needed to be fixed.  It can not be recovered 
> from the database failure at push servers.  I suggest to return an error for registerUA() requests to 
> tell UA to re-registering all applications.  And, URLs for pushing returned should be generated by a 
> deterministic function to make sure URLs are not changed.

Anyway, I agree these are good changes, but it's more complicated than that.

What happens if the notification server dies after I register -- how will the client know it has to re-register, in that case?
(In reply to Justin Lebar [:jlebar] from comment #114)
> 
> Because it has a way of doing error-recovery, it can guarantee eventual
> delivery.  In section 5, they show that Thialfi has the "reliable delivery
> property" that "If a wel-behaved client registers for an object X, Thialfi
> ensures that the client will always eventually learn of the latest version
> of X."  That's a much stronger guarantee than best effort.

There is polling here.  Section 4.1.2: "In the current implementation, the heart-beat interval is 20 minutes." (from client to registrar).  Section 5: "Every message from the client includes a digest that summarizes all client registration state."  That is how Thialfi provides its guarantee, and it is what we are trying to avoid -- although it is already better than having apps poll by themselves.

> 
> > We don't need to switch from a messaging service to a signaling service (Thialfi) even there is a 
> > protocol for signaling service super matching our requirement.  Because, we still can build a 
> > messaging service on the top of a signaling service.  So, I suggest to keep the API as a messaging 
> > service.
> 
> I'm not sure exactly what you mean here.  You're saying that we should build
> a signaling service and then build a messaging service on top of that, so we
> can expose a messaging service to applications?  It's not clear to me why
> that would be an improvement over what we have.

I said, "we still can ...", but I don't favor that since I still believe current protocol is good enough with some minor changes.  What I want to say, here, is the API should not (or does not) be affected by the underlying protocol.  It is just a reminding!

> 
> If you're saying that we can build a reliable messaging service, it's not at
> all obvious to me how that would work, particularly in the face of server
> failure.  Here's why:
> 
> Suppose the notification server loses all its state.  In the Thialfi world,
> I register with the server, and the server tells me "I have no idea what
> version that object is; go ask the application server to send you any
> relevant updates."  (It sends me the RegistrationUnknown() notification,
> from figure 2.)  It's as though, when the server loses state for an object,
> it sets the version for that object to infinity, forcing all clients who
> watch that object to update via the application server.
> 
> Suppose instead we designed a system around messages.  The server again
> crashes and loses all its state.  Now I contact the server and say "do you
> have any pending messages for me?"  And the server says "Nope."  What else
> could it say?  It could say "No, but maybe I lost a message," but that
> doesn't help me either.  The application server should not be responsible
> for remembering every message it tried to send me so that it can replay
> those messages in case the notification server loses state; the application
> server only knows the current state of the world.

Thialfi's reliability covers only the version, not the content itself.  In Thialfi, the content associated with a version is maintained by the application back-end.  That means applications need to connect back to the application server for the content, in our terms; so, for Thialfi, application servers need to keep all the messages themselves.

We usually need some stakeholder to remember all pending messages.  Currently, the push servers remember all messages for you.  You can still remember the messages yourself if you don't trust the reliability of the push servers and believe you can do better.  But, guess what, messages can still be lost at the application servers as well -- we just aren't on the hook for that.  For most applications, queuing at the push servers is good enough; I believe most apps will do so.

For apps that need reliable messaging, an application server can implement a reliable messaging channel on top of the current protocol in one of several ways.  For example, the app server can periodically resend (according to the TTL, every few hours or days) state-change messages with the same message ID; the UA connects back to the app server to retrieve the content, and that connection also serves as an ACK for the messages.  This recovers from message loss.

Or the app server can periodically resend the last of all queued messages.  If some queued messages are lost, the client can detect it, because some message IDs will be missing (see the sketch below).  So it is reliable delivery without periodic polling from devices to servers.  For power consumption, I think periodic pushing from app servers is better than polling on devices.
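A minimal client-side sketch of that loss-detection idea, assuming (purely for illustration) that the app server assigns consecutive integer message IDs:

  var lastSeenId = 0;

  function resyncWithAppServer() { /* fetch full state from the app server */ }
  function applyUpdate(body)     { /* apply the update carried in the message */ }

  function onPushMessage(msg) {          // msg = { id: <integer>, body: ... }
    if (msg.id > lastSeenId + 1) {
      resyncWithAppServer();             // a gap in the IDs means something was lost
    } else if (msg.id === lastSeenId + 1) {
      applyUpdate(msg.body);             // the expected next message
    }                                    // msg.id <= lastSeenId: duplicate, ignore
    lastSeenId = Math.max(lastSeenId, msg.id);
  }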

But I know we still need the capability of recovering from loss of registrar state.  That is what I said in the last part of comment 113, and what you mentioned in the last part of comment 114.

> Moreover, consider how an app would implement a reliable service on top of a
> best-effort service.  Basically, the app will need to poll the application
> server periodically to check for missed messages, right?  So now we have all
> the downsides that I laid out in comment 88.

Not really; you can use periodic pushing instead of periodic polling.

> 
> Anyway, I agree these are good changes, but it's more complicated than that.
> 
> What happens if the notification server dies after I register -- how will
> the client know it has to re-register, in that case?

We are making mobile devices.  Almost all devices move from one network to another, with many sleep-wakeup cycles.  All of these activities cause the device to connect back to the push server to update its location (IP address).  Maybe some devices never power off, sleep, or leave their LAN, but those are really rare.  Losing the registrar is also very rare: most servers have backup plans, and registrations do not change frequently.  An unlucky user will recover when roaming, waking up from sleep, reconnecting to the network, or rebooting.  I don't think it is an issue; it can be solved by 工人智慧 (labor's knowledge), in Taiwanese terminology.

For desktop, power consumption is not such a big issue.  A persistent connection or periodic polling is acceptable, so the issue can be solved by a build-time configuration.
> There is a polling [in Thialfi].

Correct.  But it's Gecko-controlled polling, not application-controlled
polling.  So we don't have all the downsides from comment 88 (except (b), I
suppose, but the battery-life impact of the polling is much less, because
there's just one process polling, and we control the frequency).

> What I want to say, here, is the API should not (or does not) be affected by
> the underlying protocol.  It is just a reminding!

Sorry, I still don't think I understand.  Do you mean that if the protocol is
Thialfi-like, we can expose a DOM API that's identical to the current proposal?

I don't see how we could make that work in a sane way.  Thialfi needs all those
extra callbacks from figure 2 in the paper, and at least some of those need to
be exposed to the DOM API.  Not to mention that it's not clear to me how we can
sanely deliver messages in the DOM API if the protocol only gives us versions.

> For Thailfi, the content associated with a version is maintained by an
> application back-end.  It means applications need to connect back to
> application servers for the content in our terms.  So, application servers
> need to keep all messages by them-self for Thialfi.
>
> We usually need some stakeholders to remember all pending messages.

This is close, but not right, in my view.

Application servers need to keep all state that's relevant to clients.  They
also need to provide a mechanism for synchronizing client state with
application server state.  They need to do these things regardless of whether
or not we have push notifications, of any form.

In a Thialfi-like scheme, application servers do not need to associate a
message with each version change, and they do not need to persist these
messages.

The application only needs a way of synchronizing its data with the client, and
it needs this regardless of whether or not we have push notifications.  Think
of the "refresh now" button on your calendar app.  That button might be
implemented by the server remembering the state of the client and sending a
diff (this is the "persist messages" approach), but it's more likely that
button is implemented by the server and client exchanging information to
determine what is and isn't up to date.  That's very different than the server
queueing up messages to send.

> For power consumption, I think periodically pushing from app servers is
> better than doing polling on devices.

This is because the application server can remember the client's state, and so
only push when it thinks the client is out of date?

Unfortunately that doesn't work for broadcast messages (which I'm happy to get
rid of), but I guess it works otherwise.

Can we also agree that this system is worse than having the notification server
guarantee eventual delivery, in the sense that it adds a lot of complication
for app developers that we are perfectly capable of handling in the Gecko <->
notification server layer?

>> What happens if the notification server dies after I register -- how will the
>> client know it has to re-register, in that case?
>
> Almost all devices are moving from network to another and with a lot of
> sleep-wakeup periods, [so we'll re-register eventually, and this isn't a
> problem.]

Maybe this is good enough, although it would suck for someone who spends all
day in the office with their phone connected to wifi.  But can we agree this is
not good, and that a Thialfi-like system would be better?

> I still believe current protocol is good enough with some minor changes.

Okay, this brings up a good point.  I've been coming at this review mainly from
the perspective of "what do we ideally want?"  That's how we design protocols
for the web -- our process is very much not a "worse is better" [1] iterative
approach.  That's also the standard to which we hold code checked in to
mozilla-central; you can't (in general) get out of addressing a reviewer's
criticism by arguing that your code, while flawed, is good enough.

But we need to separate out what we would ideally want -- which I still think
is an important debate -- and what we can realistically have for v1.  Then we
need to decide whether the thing we can have for v1 is good enough.  And we
need to figure out whether the cost of changing from our v1 solution to what we
ideally want will be too high.

To be honest, it's not clear to me that even the changes "necessary for v1" --
that is, the minimal set of changes we'd need to make to the protocol to make
it technically acceptable, setting aside the cost of rewriting everything for
v2 and completely breaking compatibility for all users of the API -- are
something we can actually accomplish for v1.

I feel that way particularly because we can't even agree on what these minimal
changes are.  I've suggested some relatively simple changes, such as:

  1. Getting rid of the protocol-level ACKs which are redundant with TCP acks
  2. Generating the client UUID on the client side
  3. No longer queueing of arbitrarily many messages to offline clients, since
     that presents scaling difficulties (and is not in line with cited prior
     art).
  4. Simplifying the cryptographic requirements on applications.

  (This is not an exhaustive list, and I reserve the right to ask for other
  changes for a "v1-quality" protocol.)

Additionally, Thinker has suggested that

  5. we should add a mechanism for recovering from server failure.

But we've been met with resistance on 1-4.  (5 hasn't been addressed by the
Telefonica folks.)

If we can't agree on relatively simple changes like these (okay, I'm not sure
(5) is a simple change, but the others are), then it seems to me there's no way
we're going to land this for v1.

Of course, I'm still concerned that

  6. The notification server can read any unencrypted data sent by the
     application.

Jonas and I discussed this on Friday, and we may be able to fix (6) by
requiring the application to give us a public key when registering the
notification.  Gecko would expect all data it receives from the application to
be encrypted with the corresponding private key.  But note that this increases
the cryptographic requirements on the application server, to the point that
they need to do two crypto operations for every message.  This is not
consistent with our approach of designing simple APIs, and I don't think it's
an acceptable long-term solution, given that we have alternatives.

Additionally, I have two concerns about landing something which broadly
resembles the current protocol for v1, even if we fixed the "nits" above.

  A. It may not accomplish our goal of reducing network traffic.  Since
  applications will have to poll (or repeatedly push) anyway, it's not clear
  that the notification service as currently designed will be a net win for
  carriers.

  B. If we land the current protocol as-is, it will have momentum, and parties
  may not be willing to change it later.  That would be a real shame.  In fact,
  I'd go so far as to say I'm probably not comfortable landing a protocol which
  broadly resembles the current proposal without a public commitment from all
  parties to change it to make it in line with the standards of quality,
  simplicity, and correctness that Mozilla holds Web APIs to.

I feel like we need to figure out what our goal here is.  Is it to design
something "good enough" that we can land for v1 and then rewrite for v2?  If
so, I think the Telefonica folks should be conceding every small protocol
change that they can, because doing anything else is just wasting our time with
more argumentation, and I can't review the patches until we agree on a
protocol, and I can't r+ a protocol which doesn't meet some minimal standard of
quality and correctness.

Alternatively, if v1 is a lost cause (and I think it is, if I'm honest with
myself), then we should be having a debate not about what's good enough, but
what's right.  In that sense, I don't think there's any serious argument that
the current solution is superior to a Thialfi-like solution.  The only thing
the current system has going for it by comparison, as far as I can tell, is
that it's already written.  But I suppose that's up for debate.

I understand that we can expect a decision from management about whether we're
going to drop this for v1.  We'll see what happens there, I guess.  In the
meantime, we should keep the dichotomy above in mind.

[1] http://www.jwz.org/doc/worse-is-better.html
Hey Justin,

we are not closed to any of the changes we discuss here; we need to think about what is feasible for v1, what is correct, and what is not really a good idea:

  1. Getting rid of the protocol-level ACKs which are redundant with TCP acks

The point about: "what happens if the ACK is done by Gecko, but before delivered to the app, and then Gecko crashes?". We can discuss that. But I think ACKs are a good measure to tell the server that: "ok, this has been delivered".
In fact, I think that Gecko should send the ACK after it has been delivered to the app. (And we do not care if the app has crashed while processing the notification).

  2. Generating the client UUID on the client side

If you think we can't add a BrowserID identity or some kind of OAuth before v1 ships, I think we can change this. Let's have Antonio Amaya chime in about the security of our current approach.

  3. No longer queueing of arbitrarily many messages to offline clients, since
     that presents scaling difficulties (and is not in line with cited prior
     art).

Apple queues just *one* notification per application. We could do one of the following:
a) Store an arbitrarily large number of notifications.
b) Store just some notifications and, when that limit is reached, send a "generic" notification like "you exceeded your quota, poll your server for a full sync".
c) Store just one notification, but this goes back to being a Thialfi-like system that I don't really like _for this case_ (*see end of the comment).

But I think this should not be part of the protocol; it is more an implementation detail of the server. Our server could save 100 notifications for each app-device pair, but others might decide to store an extremely large number of them. Maybe the quota-exceeded message could be defined in the protocol, though.

  4. Simplifying the cryptographic requirements on applications.

If our cryptographic measures are a mess or don't provide anything useful for the user, we just need to get rid of them. We will change to createCipheriv, BTW.

  (This is not an exhaustive list, and I reserve the right to ask for other
  changes for a "v1-quality" protocol.)

Additionally, Thinker has suggested that

  5. we should add a mechanism for recovering from server failure.

As Thinker said:
a) If you use the WebSocket connection and there is a server failure, you will probably be disconnected, because we want you to re-register.
b) If you are on 3G and there is a server failure, we do not have any kind of message right now to tell you to re-register -- only when you change your IP, as is done now. We could think about sending a notification to every 3G device telling it to re-register (we know all registered 3G devices, since we save them in a persistent MongoDB), but that makes no sense: if we can reach you, it means we still have the messages in the database.


What do you think?


--- About Thialfi ---

I like the idea of Thialfi being built around versions. But I think it has some problems:
a) It needs to poll servers to get the content, no matter what. In our system that is up to the application developers.
b) It requires the third-party server to store the messages and to have more bandwidth to spread them. Also: what happens if the third-party server crashes? It can lose messages, just like us.
c) It requires that developers MUST have an online presence (hosting + a domain), because their app installations need some way to download the updated content from the servers.

I think Thialfi is a good thing for spreading information and eventually synchronizing clients, as the paper says is done for Contacts and Chrome, but in our case I don't think it fixes the problems that we have (polling, and more open connections to get the "diffs" between versions).

We are open to any change, but we need to discuss them!
(In reply to Justin Lebar [:jlebar] from comment #116)
> > For power consumption, I think periodically pushing from app servers is
> > better than doing polling on devices.
> 
> This is because the application server can remember the client's state, and
> so
> only push when it thinks the client is out of date?
> 
> Unfortunately that doesn't work for broadcast messages (which I'm happy to
> get
> rid of), but I guess it works otherwise.
> 
> Can we also agree that this system is worse than having the notification
> server
> guarantee eventual delivery, in the sense that it adds a lot of complication
> for app developers that we are perfectly capable of handling in the Gecko <->
> notification server layer?

This does not add any more complication than a Thialfi-like protocol.  An app that doesn't want to keep messages can use the messaging service only for signaling: the app server sends only a version number, or a timestamp from a logical clock, in a message with a fixed message ID (re-pushed periodically).  Then it is almost a signaling service like the one Thialfi provides, but without periodic polling on the device side.  Once the application has received the message, it connects back to the app server to sync state.

Beyond signaling, the messages can also carry incremental update information from app servers.  An application then connects back to the app server only when a message loss is detected.

So an app server does not need to remember a client's state if it doesn't want to; it is not necessary.  This protocol is flexible enough to deliver various semantics according to the requirements of applications.
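A small sketch of the "version number with a fixed message ID" case on the application side (syncWithAppServer is a placeholder for whatever sync mechanism the app already has):

  var localVersion = 0;

  function syncWithAppServer() { /* fetch the latest state from the app server */ }

  function onPushMessage(msg) {          // msg.body carries only { version: N }
    if (msg.body.version > localVersion) {
      syncWithAppServer();               // connect back for the actual content
      localVersion = msg.body.version;
    }                                    // older or duplicate versions are ignored
  }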

For broadcast messages, I think this is also workable over the protocol.
>   1. Getting rid of the protocol-level ACKs which are redundant with TCP acks
> 
> The point about: "what happens if the ACK is done by Gecko, but before
> delivered to the app, and then Gecko crashes?". We can discuss that. But I
> think ACKs are a good measure to tell the server that: "ok, this has been
> delivered".
>
> In fact, I think that Gecko should send the ACK after it has been delivered
> to the app. (And we do not care if the app has crashed while processing the
> notification).

I explained why this is unnecessary at the top of comment 101.

>   2. Generating the client UUID on the client side
> 
> If you think we can't add a BrowserID identity or some kind of OAuth before
> v1 ships, I think we can change this. Let's Antonio Amaya chime in about the
> security of our current state.

I do not think at this point we can add any features and still ship for v1.

>   3. No longer queueing of arbitrarily many messages to offline clients, since
>      that presents scaling difficulties (and is not in line with cited prior
>      art).
> 
> Apple queues just *one* notification per application. We can do either:
> a) Store a arbitrarily large number of notifications.
> b) Store just some notifications and when that limit is reached, a "generic" notification like: "you exceeded your quota, poll your server for a full sync" is sent.
> c) Store just one notification, but this goes back to be a Thialfi-like system that I don't really like _for this case_ (*see end of the comment).
> 
> But I think this should not be inside the protocol and more of the
> implementation of the server.

The number of messages buffered must be a protocol-level contract.

Otherwise, applications will detect differences in behavior between the
Telefonica notification server and someone else's notification server.  That
makes it difficult to write an application which will work for all notification
servers.

>   4. Simplifying the cryptographic requirements on applications.
> 
> If our cryptographic measures are a mess or don't have anything good for the
> user, just we need to get rid of them. We will change to createCypherIV BTW.

I wasn't speaking specifically about the correctness of your crypto, but rather
about the requirement that applications must sign their messages and (as
proposed later in that message) also encrypt their messages, with a separate
public key.  I don't think we can get rid of the encryption requirement, but we
can get rid of the signing requirement.

I know you guys don't think that the extra crypto requirements on the
application side are a big deal.  But I'd encourage you to pick your battles
here carefully, because every point that you contend here decreases the chance
that we'll get anything done anytime soon.

The crypto vulnerabilities I found in the UUID-generating code hopefully won't
matter, because I hope we'll just remove all that code.  But they do bring up
an important point, which is that given the large crypto problems we discovered
from a cursory glance at the notification server code, Mozilla will likely want
to do a full audit of the notification server code to look for other security
issues.  I'd probably insist on that as part of my review.

>   5. we should add a mechanism for recovering from server failure.
> 
> As Thinker said:
> a) If you use the websocket connection, and there is a server failure, it's probably that you will be disconnected, because we want you to re-register again.

I think you misunderstand what we're talking about here.

The issue isn't "we need a mechanism for /notifying/ Gecko when the application
server fails."  It is that we need a mechanism for recovering from server
failure -- in particular, we need a mechanism for rebuilding the server's
state, perhaps as part of server registration.

> a) It needs to poll servers to get the content. Yes or yes. In our system it's up to the application developers.

As I explained in comment 88, having Gecko poll is much better than having the
applications poll.

I don't think arguing this point further will be productive; I'd like us to
stop mentioning Thialfi's polling as an explicit downside, because it's an
explicit upside compared to the alternative.

> b) It does need the third party server to store the messages and have more bandwidth to spread the messages. Also: what happens if the third party server crashes? They can lose messages, like us.

As I explained to Thinker in the third section of comment 116, the application
does not need to buffer messages.

The application does need to keep state that it wants the client to be aware
of.  This is a requirement with or without any form of push notifications.
Thialfi adds no new requirements in this respect, except for the requirement
that the application have an online presence (c).

> c) This system needs that developers MUST have an online presence: a hosting
> + domain. Because their app installations need some way to download the
> updated content from the servers.

Yes, we've been over this before, and I reject the notion that we should be
designing a protocol for apps which don't have an online presence.  This is a
/web/ API.

Moreover, since the current proposed system is a best-effort delivery service,
the client still needs a way to poll the application server to retrieve any
data it might have missed!  The application server requires an online presence
to meet this use-case.

I also don't think arguing this point further will be productive.

> I think Thialfi is a good thing for spread information and syncronize clients
> eventually, like it's said it's done with Contacts and Chrome, but in our
> case, I don't think it fixes the problems that we have (polling and more open
> connections to get the "diffs" between versions).

I don't understand how our case is any different than contacts and FF sync.
Surely B2G's contacts and FF sync could be built upon our push notifications
service.

You're right that Thialfi requires apps to connect to the app servers to sync
their data.  These will be very short-lived connections, so should not add a
large connection overhead to the carrier's network.

But moreover, we do not want our notification service to be the one channel
through which all application data is sent.  Doing so would require that the
notification service allow arbitrarily-large messages and buffer
arbitrarily-many of them.  This is not a realistic requirement, as prior art
demonstrates.

Again, the advantage of the Thialfi protocol is that the operation "sync with
the application server" is the same as the operation "you got a message
indicating something changed."  That means implementation by app developers is
greatly simplified.

If the only points against Thialfi are ones we've discussed before and soundly
rejected, can we agree now that the v2 protocol will be based on Thialfi?  Then
we need to figure out if there will even be a v1 protocol.

(In response to comment 118)
> For apps which don't want to keep messages, it can use the messaging service
> only for signaling.  The app server send only version number or a time-stamp
> of a logical clock in the message with a fixed message ID (with periodically
> pushing).  Then, it is almost a signaling service like what Thialfi provided,
> but without periodically polling at device side.  Once an application have
> received the message, it connect back to the app server to sync states.

Yes, that's almost like Thialfi, except without the guaranteed eventual
delivery, which of course makes it much less useful.  :)

Note that, unless the app server deletes old messages (which requires it to
keep track of which unreceived messages are old, which in turn requires the app to
explicitly ack every received message to the application server), the client will be
inundated with stale queued messages when it reconnects.

> An application connects back to app server only for a message losing being detected.
>
> So, an app server does not need to remember client's state if it don't want
> to remember.  It is not necessary.

In order for the client to detect that it's lost a message, the application
server must keep per-client state in the form of a message sequence number,
which is what allows the client to detect that it's missed a message.

Note also that if we periodically push and the notification server buffers
messages, the application server needs to periodically delete the old messages,
so the client doesn't get inundated with a bazillion messages upon
re-connecting.

At the moment, I don't even know if delete is in the current protocol; I've
asked twice (in comment 96 and comment 98) and didn't receive a response.

But supposing that message deletion is part of the protocol (which would add
considerable complexity on the server-side, and means we need UUIDs for every
message), this means that the app server would need to store more per-client
state, in the form of a list of messages to delete.

This is much more complicated than the Thialfi solution, which requires none of
these gymnastics on the application side.

> For broadcasting messages, I think it is also workable over this protocol.

You could have a per broadcast-topic sequence number, and then the client could
notice when it misses one message in the sequence.  The problem is, if you're
using re-push to ensure eventual delivery, then you must re-push to /all/
clients just to get the message out to a few clients!

=====

I think the question at hand isn't whether a sufficiently clever application
developer can achieve something which resembles Thialfi's guarantees; you've
convinced me that's probably possible.  The question is whether we should
prefer the current system to Thialfi.  The more I argue about this, the more
I'm convinced that there's no good reason to prefer the current system, except
that it's what we have.

The main advantage of the current system seems to be that you can send messages
and sometimes avoid connecting back to the app server.

This is a small advantage at best.  In order to take advantage of this, apps
will need to be designed to apply these "diffs" from messages and also to sync
with the application server to recover any data they might have missed.  IOW,
they'll need two data update mechanisms, which adds considerable complexity.
(Or, alternatively, they'll need a system of repeated pushing from the application
server, which is quite complex.)

Moreover, persisting and sending large messages puts additional strain on the
notification server.  One principle of distributed systems is that you should
push as much work onto the endpoints as possible.  That is, the centralized
part of the distributed system should do as little work as possible, because
it's the bottleneck.  Thialfi is designed to allow the notification server to be
extremely lazy; it can even forget messages, if it likes, without affecting the
correctness of the protocol.  In contrast, the current system puts huge
requirements on the notification server: Persistent queueing of arbitrarily-many
arbitrarily-large messages.

If we relax those requirements, then the fact that we can send messages becomes
a lot less useful, because we fall back to the "sync with application server"
path more often.
(In reply to Justin Lebar [:jlebar] (PTO 9/27 - 10/1) from comment #119)
> > But I think this should not be inside the protocol and more of the
> > implementation of the server.
> 
> The number of messages buffered must be a protocol-level contract.
> 
> Otherwise, applications will detect differences in behavior between the
> Telefonica notification server and someone else's notification server.  That
> makes it difficult to write an application which will work for all
> notification
> servers.

I don't think so. If we add a special message to the protocol that says "your notifications exceeded the quota limit", you simply go back to your application server and perform a full sync (or whatever your app needs). And you don't need to know whether that happens after 10 or after 1000 messages.

> 
> >   4. Simplifying the cryptographic requirements on applications.
> > 
> > If our cryptographic measures are a mess or don't have anything good for the
> > user, just we need to get rid of them. We will change to createCypherIV BTW.
> 
> I wasn't speaking specifically about the correctness of your crypto, but
> rather
> about the requirement that applications must sign their messages and (as
> proposed later in that message) also encrypt their messages, with a separate
> public key.  I don't think we can get rid of the encryption requirement, but
> we
> can get rid of the signing requirement.

So then, all notifications would go through our server without any check that they are valid; we would just accept them. In the case of a leaked URL (or a broadcast URL), we could simply be DDoSed, because everyone in the world could send us messages and we would try to deliver them to the nodes listening on that URL.

I think we *must* avoid that. Yes, we can add the encryption, but we also need the signing. It may look redundant, but we cannot decrypt the data to check whether it's valid, while we *can* verify the signature.
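
For illustration, the check on the notification server could be as simple as this sketch (Node.js; the field names, the idea of a per-app registered public key, and enqueueForDelivery are assumptions, not the real request format):

var crypto = require('crypto');

// Reject notifications whose signature doesn't verify against the public
// key the app registered; a leaked URL alone is then useless to an attacker.
function acceptNotification(appPublicKeyPem, body) {
  var verifier = crypto.createVerify('RSA-SHA256');
  verifier.update(body.payload);
  if (!verifier.verify(appPublicKeyPem, body.signature, 'base64')) {
    return false;  // drop before queueing or delivering
  }
  enqueueForDelivery(body);
  return true;
}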


> 
> I know you guys don't think that the extra crypto requirements on the
> application side are a big deal.  But I'd encourage you to pick your battles
> here carefully, because every point that you contend here decreases the
> chance
> that we'll get anything done anytime soon.

We are not saying that. We really care about the crypto requirements, and we are willing to change them if it's for the benefit of the user.

> 
> The crypto vulnerabilities I found in the UUID-generating code hopefully
> won't
> matter, because I hope we'll just remove all that code.  But they do bring up
> an important point, which is that given the large crypto problems we
> discovered
> from a cursory glance at the notification server code, Mozilla will likely
> want
> to do a full audit of the notification server code to look for other security
> issues.  I'd probably insist on that as part of my review.

We don't have any Node.js expert inside Telefónica, AFAIK, so I'd love to get the server reviewed, if possible.

> 
> >   5. we should add a mechanism for recovering from server failure.
> > 
> > As Thinker said:
> > a) If you use the websocket connection, and there is a server failure, it's probably that you will be disconnected, because we want you to re-register again.
> 
> I think you misunderstand what we're talking about here.
> 
> The issue isn't "we need a mechanism for /notifying/ Gecko when the
> application
> server fails."  It is that we need a mechanism for recovering from server
> failure -- in particular, we need a mechanism for rebuilding the server's
> state, perhaps as part of server registration.

That means the client could re-register every webapp when it does a registerUA. This would add a large overhead, since it's not necessary all the time, only when our database fails and we cannot recover from a crash.

So, is that what you mean?
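
If so, the overhead could at least be limited to the failure case, roughly like this (a sketch only; reRegisterApp and the localRegistrations store are hypothetical, and registerUA is the handshake mentioned above):

function onRegisterUADone(serverLostState) {
  if (!serverLostState) {
    return;  // normal reconnect, nothing extra to do
  }
  // Only when the server signals it lost its database do we pay the cost
  // of re-sending every per-app registration.
  localRegistrations.forEach(function (reg) {
    reRegisterApp(reg.manifestURL, reg.token);
  });
}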

> 
> > a) It needs to poll servers to get the content. Yes or yes. In our system it's up to the application developers.
> 
> As I explained in comment 88, having Gecko poll is much better than having
> the
> applications poll.
> 
> I don't think arguing this point further will be productive; I'd like us to
> stop mentioning Thialfi's polling as an explicit downside, because it's an
> explicit upside compared to the alternative.
> 
> > b) It does need the third party server to store the messages and have more bandwidth to spread the messages. Also: what happens if the third party server crashes? They can lose messages, like us.
> 
> As I explained to Thinker in the third section of comment 116, the
> application
> does not need to buffer messages.
> 
> The application does need to keep state that it wants the client to be aware
> of.  This is a requirement with or without any form of push notifications.
> Thialfi adds no new requirements in this respect, except for the requirement
> that the applicaiton have an online presence (c).
> 
> > c) This system needs that developers MUST have an online presence: a hosting
> > + domain. Because their app installations need some way to download the
> > updated content from the servers.
> 
> Yes, we've been over this before, and I reject the notion that we should be
> designing a protocol for apps which don't have an online presence.  This is a
> /web/ API.
> 
> Moreover, since the current proposed system is a best-effort delivery
> service,
> the client still needs a way to poll the application server to retrieve any
> data it might have missed!  The application server requires an online
> presence
> to meet this use-case.
> 
> I also don't think arguing this point further will be productive.
> 
> > I think Thialfi is a good thing for spread information and syncronize clients
> > eventually, like it's said it's done with Contacts and Chrome, but in our
> > case, I don't think it fixes the problems that we have (polling and more open
> > connections to get the "diffs" between versions).
> 
> I don't understand how our case is any different than contacts and FF sync.
> Surely B2G's contacts and FF sync could be built upon our push notifications
> service.

Agree.

> 
> You're right that Thialfi requires apps to connect to the app servers to sync
> their data.  These will be very short-lived connections, so should not add a
> large connection overhead to the carrier's network.
> 
> But moreover, we do not want our notification service to be the one channel
> through which all application data is sent.  Doing so would require that the
> notification service allow arbitrarily-large messages and buffer
> arbitrarily-many of them.  This is not a realistic requirement, as prior art
> demonstrates.
> 
> Again, the advantage of the Thialfi protocol is that the operation "sync with
> the application server" is the same as the operation "you got a message
> indicating something changed."  That means implementation by app developers
> is
> greatly simplified.

Really? I think it is a little bit harder:

1) In our system, you listen for a message in your app, and then you have the message and can do whatever you want with it.

2) In Thialfi, you listen for a message, parse it to see what the version is, then poll the third-party server to download the "diff" between your state and the server's state, and then parse that diff as well. Only then can you do whatever you want.

I don't know how 2) simplifies 1).
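
In code, the two flows described above look roughly like this (everything here is a hypothetical sketch, not an existing API):

// 1) Current proposal: the push message carries the data itself.
function onPayloadMessage(msg) {
  applyUpdate(JSON.parse(msg.body));  // done
}

// 2) Thialfi-style: the push message only carries a version number.
function onInvalidation(objectId, version) {
  if (version <= localVersion(objectId)) {
    return;  // already up to date
  }
  // Extra round trip: fetch the diff from the application server, then
  // parse and apply it.
  fetchDiffFromAppServer(objectId, localVersion(objectId), version,
                         function (diff) {
                           applyUpdate(diff);
                           setLocalVersion(objectId, version);
                         });
}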

> 
> If the only points against Thialfi are ones we've discussed before and
> soundly
> rejected, can we agree now that the v2 protocol will be based on Thialfi? 
> Then
> we need to figure out if there will even be a v1 protocol.

We *can* discuss that after v1 ships, but we are not sure it would be a good solution. We need to measure the server, the network load, and other things to see whether something can be improved, or even whether a protocol change is needed. We cannot say "yes, we will change our server to be Thialfi-like in v2", simply because we don't know; we don't have enough data to say so.

And again, I don't have anything against Thialfi; I'm just thinking about all the caveats mobile networks have and how we can build the best server so as not to flood them (and, again, maybe Thialfi was not *designed* to be a mobile-network-friendly protocol, because the paper doesn't say anything about that).

> 
> (In response to comment 118)
> > For apps which don't want to keep messages, it can use the messaging service
> > only for signaling.  The app server send only version number or a time-stamp
> > of a logical clock in the message with a fixed message ID (with periodically
> > pushing).  Then, it is almost a signaling service like what Thialfi provided,
> > but without periodically polling at device side.  Once an application have
> > received the message, it connect back to the app server to sync states.
> 
> Yes, that's almost like Thialfi, except without the guaranteed eventual
> delivery, which of course makes it much less useful.  :)
> 
> Note that, unless the app server deletes old messages (which requires it to
> keep track of which unreceived messages are old, which requires the app to
> explicitly ack to the application server every message received)
> 
> > An application connects back to app server only for a message losing being detected.
> >
> > So, an app server does not need to remember client's state if it don't want
> > to remember.  It is not necessary.
> 
> In order for the client to detect that it's lost a message, the application
> server must keep per-client state in the form of a message sequence number,
> which is what allows the client to detect that it's missed a message.
> 
> Note also that if we periodically push and the notification server buffers
> messages, the application server needs to periodically delete the old
> messages,
> so the client doesn't get inundated with a bazillion messages upon
> re-connecting.
> 
> At the moment, I don't even know if delete is in the current protocol; I've
> asked twice (in comment 96 and comment 98) and didn't receive a response.

No, it's not in the protocol.

> 
> But supposing that message deletion is part of the protocol (which would add
> considerable complexity on the server-side, and means we need UUIDs for every
> message), this means that the app server would need to store more per-client
> state, in the form of a list of messages to delete.
> 
> This is much more complicated than the Thialfi solution, which requires none
> of
> these panegyrics on the application side.
> 
> > For broadcasting messages, I think it is also workable over this protocol.
> 
> You could have a per broadcast-topic sequence number, and then the client
> could
> notice when it misses one message in the sequence.  The problem is, if you're
> using re-push to ensure eventual delivery, then you must re-push to /all/
> clients just to get the message out to a few clients!

No, because clients that have ACKed the message won't be notified again, only the ones that haven't ACKed it.

> 
> =====
> 
> I think the question at hand isn't whether a sufficiently clever application
> developer can achieve something which resembles Thialfi's guarantees; you've
> convinced me that's probably possible.  The question is whether we should
> prefer the current system to Thialfi.  The more I argue about this, the more
> I'm convinced that there's no good reason to prefer the current system,
> except
> that it's what we have.
> 
> The main advantage of the current system seems to be that you can send
> messages
> and sometimes avoid connecting back to the app server.
> 
> This is a small advantage at best.

This is a big advantage. You are opening just a single connection, not two. Imagine you are in a crowded soccer stadium with 100,000 other people, and every single one of them receives a notification at the same moment (a broadcast, for example).

With our current implementation, the network will open 100,000 connections, and that's it. Maybe some clients won't receive the notification, OK.

With Thialfi, it will open 100,000 connections for the notification *and* another 100,000 requests for the polling part, which means 100,000 connections will be opened to the remote server. Even with that, we can still drop some notifications, and, remember, every 20 minutes clients will open another batch of 100,000 connections just to be sure they have the latest data!

If you think this is just a *small* advantage…

> In order to take advantage of this, apps
> will need to be designed to apply these "diffs" from messages and also to
> sync
> with the application server to recover any data they might have missed.  IOW,
> they'll need two data update mechanisms, which adds considerable complexity.
> (Or, alternatively, they'll need a system of repeated pushing to the
> application
> server, which is quite complex.)  
> 
> Moreover, persisting and sending large messages puts additional strain on the
> notification server.  One principal of distributed systems is that you should
> push as much work onto the endpoints as possible.  That is, the centralized
> part of the distributed system should do as little work as possible, because
> it's the bottleneck.  Thialfi is designed to allow the notification server
> to be
> extremely lazy; it can even forget messages, if it likes, without affecting
> the
> correctness of the protocol.  In contrast, the current system puts huge
> requirements on the notification server: Persistent queueing of
> arbitrarily-many
> arbitrarily-large messages.
> 
> If we relax those requirements, then the fact that we can send messages
> becomes
> a lot less useful, because we fall back to the "sync with application server"
> path more often.


And I'm thinking about this: why not mix the Thialfi idea with our current implementation?

We can have the message payload *and* a version number in the notification. App developers could have both: if they want to use just the Thialfi model (version numbers), they fill in a "versionNumber" attribute in the JSON POSTed to our server.
On the other hand, if they prefer to deliver the notification without version numbers, they just use the message payload.

Or even both! They would not need to poll the server for the notification payload, but they could keep track of their current version and, if they want, check with the application server whether the latest notification they have received (with a matching body) is in sync.
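
For example, the app server could POST something like this (all field names are illustrative only; the point is just that "message" and "versionNumber" would both be optional):

{
  "id": "8a9c2a6e-8b0f-4f8e-9a6c-1f2d3e4b5a67",
  "message": "{\"unreadMails\": 3}",
  "versionNumber": 42,
  "signature": "..."
}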
Here's Moxie Marlinspike laying out with examples why it's so important to encrypt-then-mac:

http://www.thoughtcrime.org/blog/the-cryptographic-doom-principle/
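
A minimal encrypt-then-MAC sketch with Node's crypto module (the keys and message structure are assumptions, not the notification server's actual code; encKey must be a 32-byte Buffer for aes-256-cbc):

var crypto = require('crypto');

// Encrypt-then-MAC: the MAC covers IV + ciphertext, never the plaintext.
function seal(plaintext, encKey, macKey) {
  var iv = crypto.randomBytes(16);
  var cipher = crypto.createCipheriv('aes-256-cbc', encKey, iv);
  var ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'),
                                  cipher.final()]);
  var mac = crypto.createHmac('sha256', macKey)
                  .update(Buffer.concat([iv, ciphertext]))
                  .digest();
  return { iv: iv, ciphertext: ciphertext, mac: mac };
}

// Verify the MAC *before* touching the ciphertext (the "doom principle").
function unseal(box, encKey, macKey) {
  var expected = crypto.createHmac('sha256', macKey)
                       .update(Buffer.concat([box.iv, box.ciphertext]))
                       .digest();
  if (!constantTimeEqual(box.mac, expected)) {
    throw new Error('bad MAC, rejecting message');
  }
  var decipher = crypto.createDecipheriv('aes-256-cbc', encKey, box.iv);
  return Buffer.concat([decipher.update(box.ciphertext),
                        decipher.final()]).toString('utf8');
}

// Naive constant-time comparison of two equal-length buffers.
function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;
  var diff = 0;
  for (var i = 0; i < a.length; i++) diff |= a[i] ^ b[i];
  return diff === 0;
}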
Priority: -- → P1
Whiteboard: [soft][WebAPI:P3] → [soft]
Attachment #658441 - Flags: review?(justin.lebar+bug)
Attachment #658442 - Flags: review?(justin.lebar+bug)
Attachment #660707 - Flags: review?(justin.lebar+bug)
Attachment #660708 - Flags: review?(justin.lebar+bug)
Work is ongoing on this but it won't land in v1 so I'm removing blocking-basecamp.
blocking-basecamp: + → -
Comment on attachment 660707 [details] [diff] [review]
Part 3: push-API, v2

Removing request here until we know what we're going to do.
Attachment #660707 - Flags: review?(jonas)
Removing sec-review request as per previous comment.
Flags: sec-review?(ptheriault)
Depends on: 839757
Competitive parity on major content use case.
Whiteboard: [soft] → [soft][tech-p1]
Filed bug 849364 to allow turning on websocket keepalive here.
Depends on: 849364
WONTFIX.

See bug 822712 for how we are going to address the use cases here.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
No longer blocks: 834033
No longer depends on: 849364