Closed Bug 749806 Opened 8 years ago Closed 7 years ago

[Security Review] Notifications Back End

Categories

(mozilla.org :: Security Assurance: Review Request, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jrconlin, Assigned: dchanm+bugzilla)

Details

(Whiteboard: [pending secreview][start yyyy-mm-dd][target yyyy-mm-dd])

We would like a security review for the Notifications Project back end. This portion of the product deals with collection, temporary storage, and routing of short notification messages from a third party site to a user. A separate security review for the front end service will be scheduled at a later date, once more of that code is finalized. 

Who is/are the point of contact(s) for this review?
 JR Conlin (jrconlin@mozilla.com)
 Jeff Balogh (jbalogh@mozilla.com)

Please provide a short description of the feature / application (e.g. problem solved, use cases, etc.):

 Description:
 Notifications provides a semi-anonymous method for a third party site or 
 service to communicate a short message with a customer without the user
 needing to keep a page open with the site. Messages can be received and 
 processed by any Notifications capable device or browser.

 Sites post to a URL that is then routed to a user and then picked up via
 whatever the user has as their active agent.

 Use Cases:
 At work, a user visits a surf report site that offers surf condition
 notifications. They click a link requesting a notification when surf is 
 perfect. Later, while at home, they receive a notification alert on their 
 browser informing them that Surf's Up. (No identifying information needs to
 be supplied to the site.)

 A user visits a mail site and requests notifications when high priority
 mail arrives. (Site associates notification URL with a user.)

 A user has requested to be notified of pending moves in a Game App.

Please provide links to additional information (e.g. feature page, wiki) if available and not yet included in feature description:

 https://wiki.mozilla.org/Services/Notifications 
 https://wiki.mozilla.org/Services/Notifications/Push
 

Does this request block another bug? If so, please indicate the bug number

 N/A

This review will be scheduled amongst other requested reviews. What is the urgency or needed completion date of this review?

 This product is not scheduled for production until end of Q3.
 Urgency is medium.

To help prioritize this work request, does this project support a goal specifically listed on this quarter's goal list? If so, which goal?

 N/A

Please answer the following few questions: (Note: If you are asked to describe anything, 1-2 sentences shall suffice.)

    Does this feature or code change affect Firefox, Thunderbird or any product or service that Mozilla ships to end users?

      No.

    Are there any portions of the project that interact with 3rd party services?

      Yes. (Third parties provide notification content.)

    Will your application/service collect user data? If so, please describe 

      Unread/Undelivered notifications may persist on our servers for up to three days. We are providing an encryption option for messages. 

If you feel something is missing here or you would like to provide other kind of feedback, feel free to do so here (no limits on size):

  None.

Desired Date of review (if known) and list of invitees:

 Date: When possible
 Invitees: ally@
           dchen@
           jbalogh@
           jrconlin@
           mconnor@
:jrconlin - we will triage the bug this Wed and get a lead assigned to gather background information. Reviews happen on M/W at 13:00 PST and Th/F at 10:00 PST, per our calendar:

https://mail.mozilla.com/home/ckoenig@mozilla.com/Security%20Review.html

If there is an available date that works well for your team and their time zones it helps if we know that.
Whiteboard: [pending secreview] → [pending secreview][triage needed 2012.05.02]
Assignee: nobody → dchan+bugzilla
Status: NEW → ASSIGNED
Whiteboard: [pending secreview][triage needed 2012.05.02] → [pending secreview][start yyyy-mm-dd][target yyyy-mm-dd]
Item to be reviewed:
    Notifications Server (back-end)
Link to calendar entry: (curtis will add once scheduled)
SecReview Bug: (not the bug itself but the bug requesting secreview of said item per https://wiki.mozilla.org/Security/Reviews/Review_Request_Form)
    https://bugzilla.mozilla.org/show_bug.cgi?id=749806
Security Lead:
Required Reading List:
* https://wiki.mozilla.org/Services/Notifications
Source: https://github.com/jbalogh/push
API docs: http://push.rtfd.org/
(If possible prefill this area for copying to the notes section of the review)
Introduce Feature (5-10 minutes) 
UserAgent - client e.g. browser
Site - website which uses notifications
Channel - server that proxies notifications from Site to UserAgent
- Goal of Feature, what is trying to be achieved (problem solved, use cases, etc)

    Provide a semi-anonymous method for a site to send a brief message to an interested user via any registered Agent acting on behalf of the user.

- What solutions/approaches were considered other than the proposed solution?

    There are several ways this could be achieved, including a permanent websocket, an IM protocol (e.g. XMPP), a hidden iframe, etc.

- Why was this solution chosen?

    This method was the easiest for 3rd party sites to implement and provided the most control and privacy to the user.

- Any security threats already considered in the design and why?

     - Spam: remote site could attempt to send spam messages to randomly chosen URLs 

    + URL namespace is 256-bit random, making it very large, so a guessed URL has a low chance of success

     - Site could send malicious or annoying content to user

    + messages are format limited to plain text with separate elements for action url and img

    + User can disable overly chatty or annoying sites easily.

    - Transmit channel could alter or inspect messages:

    + Notifications can be optionally encoded via AES where the UserAgent generates and shares the per site keypair. The Channel is unable to decrypt or decipher the message.

    UserAgent / Site keypair negotiation happens outside of Notification channel

* Threat Brainstorming (30-40 minutes)

    Denial of service

    Site sends a malformed encrypted message to use resource on client.

    This matters more for slower devices

    Attempt to exhaust keyspace

    unlikely due to size of keyspace

    what happens in the event of a collision?

    rate limiting of notifications?

    a leaked URL could result in mass spam to a particular user

    Abuse of service

    Are we concerned with malicious parties using notifications as a medium for illegal activity?

    Are there plans for bidirectional notifications?

    related to bipostal for browserid

* Conclusions / Action Items (10-20 minutes)
=============================
////////////////////////////////////////////////////////////
=============================
Notifications let websites send small messages (<1024 bytes) to users without
the user having that website open. Websites ask for push permission when a user
has the website open; the javascript API returns a URL like
https://notifications.mozilla.org/long-string-of-random-characters that is
specific to that user and website. The website backend can then POST messages to
that URL, and our notification server will forward the messages to the user's
Firefox/B2G/etc. User devices try to maintain a persistent connection to the
notification server so that messages are delivered immediately.
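A minimal sketch of the site-side POST described above. The JSON body and its field names ("title", "body") are assumptions for illustration, not the documented wire format; only the POST-to-URL flow and the size limit come from the text.

```python
import json
import urllib.request

MAX_MESSAGE_BYTES = 1024  # the "<1024 bytes" limit described above


def build_push_request(push_url, title, body):
    """Build (but do not send) a POST request for one notification.

    The JSON field names "title" and "body" are illustrative assumptions,
    not the documented wire format.
    """
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    if len(payload) >= MAX_MESSAGE_BYTES:
        raise ValueError("notification payload must be under 1024 bytes")
    return urllib.request.Request(
        push_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A site backend would pass the returned object to urllib.request.urlopen() to actually send it.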
* A user may have multiple devices/clients receiving notifications.
* A user may receive notifications from multiple websites/apps.
* Websites send messages to users by POSTing to a URL created by the
  notification server.
* Each user+website pair has a unique notification URL.
* Clients use long-term socket connections (e.g. WebSockets) to receive
  messages.
* All other API interactions use HTTP.
* Clients get a list of socket server addresses through an HTTP call. They
  attempt to connect to each of the addresses and back off if they all fail.
* Socket servers can't be behind Zeus due to licensing restrictions.
http://research.google.com/pubs/archive/37474.pdf describes the architecture of
a similar system built by Google that is used to notify their applications of
data changes.
== Data Stored ==
* Recent Notifications
Notifications are stored in Cassandra, through Queuey, with a three-day TTL.
Each user has a single queue which stores messages from multiple domains.
* Message state
When a client reads a message it sends an update marker to the notification
server, which is stored in the same queue. Other clients (from the same user)
can read this message and treat the message as read. These messages will expire
with the same three-day TTL.
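The queue behaviour described above (messages and read markers sharing one per-user queue, both expiring after three days) could be sketched roughly as follows. This is an in-memory stand-in, not Queuey or Cassandra; the class and method names are hypothetical.

```python
import time

TTL_SECONDS = 3 * 24 * 60 * 60  # three-day TTL from the text above


class UserQueue:
    """In-memory stand-in for one user's queue (the real store is Queuey/Cassandra)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = []  # (timestamp, kind, message_id, payload)

    def add_message(self, message_id, payload):
        self._entries.append((self._clock(), "message", message_id, payload))

    def mark_read(self, message_id):
        # The read marker is just another entry in the same queue.
        self._entries.append((self._clock(), "read", message_id, None))

    def unread(self):
        """Return payloads of unexpired messages that have no read marker."""
        cutoff = self._clock() - TTL_SECONDS
        live = [e for e in self._entries if e[0] >= cutoff]
        read_ids = {e[2] for e in live if e[1] == "read"}
        return [e[3] for e in live if e[1] == "message" and e[2] not in read_ids]
```

Because a marker written by one client is visible to every other client reading the same queue, each of the user's devices can treat the message as read.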
* Registration Data:
To deliver messages, the server maps notification URLs to users:
  {QUEUE_TOKEN: {"user": USER_TOKEN, "domain": DOMAIN}}
The website POSTs a message to a URL like
https://notifications.mozilla.org/QUEUE_TOKEN. The notification server looks up
the user mapping in its data store, writes the message to Cassandra, and
publishes the message to the websocket servers for immediate delivery.
The mapping is currently stored in MySQL.
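A rough sketch of the registration mapping and lookup, assuming the queue token is 256 random bits (the size mentioned in the review notes) rendered URL-safe. The dictionary stands in for the MySQL table, and the function names are hypothetical.

```python
import secrets

# Stand-in for the MySQL mapping table: QUEUE_TOKEN -> {"user": ..., "domain": ...}
registrations = {}


def create_push_url(user_token, domain, base="https://notifications.mozilla.org/"):
    """Register a user+domain pair and return its unique notification URL."""
    queue_token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    registrations[queue_token] = {"user": user_token, "domain": domain}
    return base + queue_token


def resolve(push_url, base="https://notifications.mozilla.org/"):
    """Return the registration for a push URL, or None if the token is unknown."""
    if not push_url.startswith(base):
        return None
    return registrations.get(push_url[len(base):])
```

With 2^256 possible tokens, a POST to a guessed URL will almost certainly hit the None branch, which is the basis for the "low chance of success" argument about spam.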
== Write Traffic ==
* Creating a new push URL. Only happens once per site per user.
* New push notifications from websites. This is the largest traffic source.
  Urban Airship is ramping up their architecture to handle 100,000
  notifications per second.
* Marking messages as read. Occurs (at most) once per device per user.
== Read Traffic ==
* Clients starting up, syncing registered notification URLs.
* Clients starting up, doing initial message state sync.
* Clients starting up, getting list of socket server addresses.
== Push Traffic ==
* Messages coming from websites.
* State updates: mark messages as read, add new notification URL.
== Internal Traffic ==
* Database lookups for queue => user mapping.
* Cassandra writes for new messages.
* Publishing messages from API servers to socket servers.
The current plan is to use zeromq to broadcast messages to the socket servers,
tagged with the user token.  Each server knows which clients it has connected,
so a socket server can pick out and deliver the messages for its connected
users.
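The filtering step in that plan can be illustrated with an in-process stand-in. This sketch does not use zeromq; it only models the idea that every socket server sees every tagged message and delivers the ones for users it has connected. All names are hypothetical.

```python
class SocketServer:
    """Stand-in for one socket server; it only tracks its own connected users."""

    def __init__(self):
        self.connected = {}  # user_token -> list of delivered messages

    def connect(self, user_token):
        self.connected.setdefault(user_token, [])

    def on_broadcast(self, user_token, message):
        """Deliver the message if the tagged user is connected here; else drop it."""
        inbox = self.connected.get(user_token)
        if inbox is None:
            return False
        inbox.append(message)
        return True


def broadcast(servers, user_token, message):
    """Send one tagged message to every socket server; return how many delivered it."""
    return sum(server.on_broadcast(user_token, message) for server in servers)
```

The cost of this design is that every server receives every message, which is the overhead the direct-RPC approach avoids.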
Urban Airship was using Kafka for pubsub here, but they were moving to an
architecture with direct RPC calls from API servers to socket servers. Socket
servers tell a central registration server which clients are connected, and the
API servers look up that state to send direct RPC calls.
Here are some potential sharding plans, with a look at scalability,
performance, and fault tolerance.
== No Sharding ==
* Everything in a single data center.
* All data in a single MySQL cluster and Cassandra cluster.
* Internal pubsub stays in the data center.
PRO:
* easy to develop
* performant as long as it scales
* lets us rely on the power of positive thinking
CON:
* running into the scalability limits of a single data center
* having a meteor hit the data center
== Full Sharding ==
* Multiple data centers.
* Clients are assigned to a cluster like notifications17.mozilla.org when they
  first start up, and stick there forever. Clusters are completely independent.
* The notification server provides push URLs with shard-aware domains like
  notifications17.mozilla.org.
* Internal pubsub stays in the data center.
PRO:
* fault tolerant
* almost infinitely scalable
* performant
* easy to develop
CON:
* sharding is locked in forever since external websites have the shards
  in their push URLs.
* Websites are sending messages to many domains, so they can't optimize HTTP
  connections as well with something like SPDY.
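To make the lock-in concrete, here is a small sketch of shard-aware URL construction. The shard count and the random assignment policy are assumptions; the text only says clients are assigned at first start and stay put.

```python
import random

NUM_SHARDS = 32  # hypothetical cluster count


def assign_shard(rng=random):
    """Pick a cluster for a new client; the choice is made once and then kept."""
    return rng.randrange(1, NUM_SHARDS + 1)


def shard_push_url(shard, queue_token):
    """Build a push URL whose hostname names the client's cluster."""
    return "https://notifications%d.mozilla.org/%s" % (shard, queue_token)
```

Once a website has stored a URL such as https://notifications17.mozilla.org/QUEUE_TOKEN, the shard number is outside the server's control, so the client cannot be moved without invalidating that URL.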
== Limited Sharding ==
* Multiple data centers.
* The notification server provides push URLs pointing to the canonical
  notifications.mozilla.org domain.
* Clients are sticky to a cluster, but their data can be migrated.
* Notifications come into notifications.mozilla.org and may need to be
  propagated across data centers to reach the right user.
PRO:
* fault tolerant, scalable
* websites can optimize connections to notifications.mozilla.org
CON:
* more moving parts, harder to develop.
* cross-data center communication could be slow
== Optimizations ==
* If all clients have read a message, delete it early.
* If there's only one client, don't mark messages as read.
* Randomize reconnect backoff to avoid thundering herd.
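One common way to randomize reconnect backoff is exponential backoff with full jitter, sketched below. The base and cap values are assumptions for illustration and do not come from the design.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Return a randomized delay in seconds before reconnect attempt `attempt` (0-based).

    The upper bound doubles each attempt up to `cap`; the actual delay is drawn
    uniformly from [0, bound] so clients do not reconnect in lockstep.
    """
    bound = min(cap, base * (2 ** attempt))
    return rng.uniform(0, bound)
```

If a socket server restarts, its clients then spread their reconnects across the window instead of all retrying at the same instant.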
== Security/Stability Concerns ==
* Websites DOSing us with valid push notification traffic.
* Spammers DOSing us with invalid push traffic.
* Attackers trying to guess users' queue URLs.
* Attackers connecting as valid users and exhausting socket server resources.
* Attackers filling valid user queues and exhausting resources.
* Mozilla storing data telling what sites you have push notifications for.
* Mozilla storing your push notifications.
* Mozilla storing your IP as your device (re)connects to socket servers.
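The review notes raise rate limiting only as an open question. As one possible mitigation for the queue-filling and DOS concerns above, a per-queue token bucket could look like the following sketch. The design does not commit to this technique, and the rate and burst values would be chosen by the operator.

```python
import time


class TokenBucket:
    """Allow up to `rate` messages per second with bursts of up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A server using this would keep one bucket per queue token (or per sending address) and reject a POST when allow() returns False.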
Blocks: 774497
Duplicate of this bug: 788878
No longer blocks: 774497
Marking this resolved-fixed as the review occurred; when all dependent bugs are fixed we can make this verified-fixed.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Please note: It is not known if the B2G Notifications system (see bug 763198) uses this back-end. If not, this should not indicate that the B2G Notifications backend has been successfully reviewed.