Closed Bug 1247089 Opened 4 years ago Closed 4 years ago

Log decryption errors from Web Push

Categories

(Core :: DOM: Push Notifications, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- affected
firefox48 --- fixed

People

(Reporter: canuckistani, Assigned: Lina)

Details

Attachments

(2 files, 3 obsolete files)

Push notifications are encrypted and it may be common for developers to accidentally introduce bugs that cause messages to fail decryption. We should log these errors to the console with an informative message and a link to help docs.
Jen, not sure if the console team does triage soooo obnoxiously needinfo'ing you to get this bug looked at.
Flags: needinfo?(jfong)
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #1)
> Jen, not sure if the console team does triage soooo obnoxiously needinfo'ing
> you to get this bug looked at.

Added to https://public.etherpad-mozilla.org/p/console+debugger and we'll discuss next week in that meeting. Thanks!
Flags: needinfo?(jfong)
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #0)
> Push notifications are encrypted and it may be common for developers to
> accidentally introduce bugs that cause messages to fail decryption. We
> should log these errors to the console with an informative message and a
> link to help docs.

Is there some kind of a notification when this happens?  I'm not sure if this is a web-exposed thing or if we need platform APIs to detect this kind of error.

Jeff, do you have a STR for a notification that has a decryption failure?
Flags: needinfo?(jgriffiths)
(In reply to Brian Grinstead [:bgrins] from comment #3)
...
> Is there some kind of a notification when this happens?  I'm not sure if
> this is a web-exposed thing or if we need platform APIs to detect this kind
> of error.

In last week's push dashboard meeting we considered exposing this sort of error to the developer dashboard, and it occurred to me that if the platform supports this we should also provide help in the console. I don't know if we do anything currently, needinfo'ing Kit. 

> Jeff, do you have a STR for a notification that has a decryption failure?

Not currently but that sounds like a good idea for a demo.
Flags: needinfo?(jgriffiths) → needinfo?(kcambridge)
We don't. :-/ The Push code can log a warning to the browser console, if `dom.push.loglevel` is set. Unfortunately, it's not really helpful for figuring out which message failed decryption.

We can add a chrome-only hook for this. Maybe a callback or `nsIObserverService` notification that's triggered with the worker scope and message ID?
Flags: needinfo?(kcambridge)
It's also possible to report error messages to the a pages console's directly using ReportToConsole, and I think there's a helper for service workers to report to all controlled pages here: https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerEvents.cpp#835 / https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerManager.cpp#2620
(In reply to Brian Grinstead [:bgrins] from comment #6)
> It's also possible to report error messages to the a pages console's
> directly using ReportToConsole, and I think there's a helper for service
> workers to report to all controlled pages here:

Ooh, nice! Looks like that would do the trick. Do we need to consider localizing the error?
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #4)
> (In reply to Brian Grinstead [:bgrins] from comment #3)
> > Jeff, do you have a STR for a notification that has a decryption failure?
> 
> Not currently but that sounds like a good idea for a demo.

For now, probably the easiest way is to send a message with valid crypto headers, and a bogus payload:

    registration.pushManager.subscribe().then(subscription =>
      fetch(subscription.endpoint, {
        method: 'POST',
        headers: {
          'content-encoding': 'aesgcm128',
          'encryption-key': `keyid=p256dh;dh=${btoa(String.fromCharCode(...new Uint8Array(subscription.getKey('p256dh')))).replace(/\+/g, '-').replace(/\//g, '_').replace(/\=/g, '')}`,
          encryption: `keyid=p256dh;salt=${btoa(String.fromCharCode(...crypto.getRandomValues(new Uint8Array(16)))).replace(/\+/g, '-').replace(/\//g, '_').replace(/\=/g, '')}`,
        },
        body: new Uint8Array([0]),
      })
    );

Invalid padding or truncation would also cause a decryption failure, but I don't have examples offhand.
(In reply to Kit Cambridge [:kitcambridge] from comment #7)
> (In reply to Brian Grinstead [:bgrins] from comment #6)
> > It's also possible to report error messages to the a pages console's
> > directly using ReportToConsole, and I think there's a helper for service
> > workers to report to all controlled pages here:
> 
> Ooh, nice! Looks like that would do the trick. Do we need to consider
> localizing the error?

Possibly - I don't remember exactly why the current messages from service workers aren't but maybe there is a technical reason for it.  Ben, do you know about that?
Flags: needinfo?(bkelly)
Comment on attachment 8718944 [details]
MozReview Request: Bug 1247089 - Log Web Push decryption errors. r?bkelly

Here's a quick sketch, hastily assembled on the train. Some thoughts:

* Localization would be lovely. Bug 1222720 looks relevant.
* We can remove the IPDL acrobatics once bug 1182117 lands.
* I'm not sure what the correct line and column numbers are, or even if they make sense in this case. The closest I can think of is the line that adds the `onpush` listener.
* Is there a better way to find and pass the service worker script URL, instead of overloading `ReportToAllClients`?

Does this seem like it's on the right track?
Attachment #8718944 - Flags: feedback?(bkelly)
Attachment #8718944 - Flags: feedback?(bgrinstead)
https://reviewboard.mozilla.org/r/34809/#review31427

::: dom/push/PushService.jsm:862
(Diff revision 1)
> +          "Message ID: " + messageID + " Original error: " + error,

In most cases, the original error will be something vague, like "The operation failed for an operation-specific reason."

Also, we should probably link to an MDN page instead, and add :jrconlin's encryption test page to that.
https://reviewboard.mozilla.org/r/34809/#review31429

I do think it would be nice to localize these.  Basically you just have to make a version of ReportToAllClients() that passes arguments through to nsContentUtils::ReportToConsole():

  https://dxr.mozilla.org/mozilla-central/source/dom/base/nsContentUtils.h#883
  
And then add your localized messages to dom.properties.

I didn't previously localize the messages mentioned in bug 1222720 because I needed to uplift them at the time.

::: dom/push/PushNotifier.cpp:208
(Diff revision 1)
> +      swm->ReportToAllClients(aPrincipal, scope, message, EmptyString(), 0, 0,

What thread are you on here?  I believe ReportToAllClients() only works on the main thread.

::: dom/workers/ServiceWorkerManager.cpp:2650
(Diff revision 1)
> +  ReportToAllClients(aScope, aMessage, scriptSpec, aLine, aLineNumber,

It would be better just to get a handle to the ServiceWorkerInfo associated with the push and get the script spec.  Then you can just call the existing ReportToAllClients().
Flags: needinfo?(bkelly)
Attachment #8718944 - Flags: feedback?(bkelly) → feedback-
Or pass empty string for the file location.  You can put the service worker scope in the message itself.
https://reviewboard.mozilla.org/r/34809/#review31429

Sounds good.

> What thread are you on here?  I believe ReportToAllClients() only works on the main thread.

This is on the main thread. I'll add an `AssertIsOnMainThread()` for clarity.

> It would be better just to get a handle to the ServiceWorkerInfo associated with the push and get the script spec.  Then you can just call the existing ReportToAllClients().

Sorry, do you mean using `GetActiveWorkerInfoForScope`, like `Send{Push, PushSubscriptionChange}Event`?
(In reply to Kit Cambridge [:kitcambridge] from comment #15)
> Sorry, do you mean using `GetActiveWorkerInfoForScope`, like `Send{Push,
> PushSubscriptionChange}Event`?

Well, after thinking more I came to the conclusion you probably shouldn't use the service worker script at all.  Its only useful if you can tie it back to a specific action the script took.  If you just want to say "hey, this subscription associated with this service worker did a thing", then just use the scope.  Its the true identity of the service worker.  The script can change over time.
Oh, I see. Thanks for clarifying! I thought I needed the script URL so that the console could link to the worker file. Without a line or column number, though, maybe it doesn't matter.

I'll wait for feedback from :ednapiranha and :bgrins about whether this approach is sufficient before requesting a proper review.
Right, you won't get a link to the service worker script, but I think thats ok.  If the script did not directly cause the error, then I don't think it should link to the script.  Referencing the scope should be enough.

Maybe in the future we could somehow link to the ServiceWorkerRegistration object such that you get a clickable link that takes you to about:debugging.  That seems far off, though.
https://reviewboard.mozilla.org/r/34809/#review31663

::: dom/push/PushService.jsm:862
(Diff revision 1)
> +          "Message ID: " + messageID + " Original error: " + error,

Agreed about linking to an MDN page, and also we should put this message in a .properties file so it can be localized.
Comment on attachment 8718944 [details]
MozReview Request: Bug 1247089 - Log Web Push decryption errors. r?bkelly

I don't know enough about the code to review it, but I think it's OK to not link directly to the script as in Comment 18.

Also, is it possible to put together some kind of test case that triggers this error?  We have worked on some mochitest cases for console/service workers here: https://dxr.mozilla.org/mozilla-central/source/devtools/shared/webconsole/test/test_console_serviceworker.html.  If the push part doesn't require a special test harness or infrastructure then maybe it could be based off of that or added into one of those existing tests.
Attachment #8718944 - Flags: feedback?(bgrinstead)
Assignee: nobody → kcambridge
Status: NEW → ASSIGNED
Component: Developer Tools: Console → DOM: Push Notifications
Product: Firefox → Core
`PushService.jsm` omits the script name and line number when reporting
decryption failures. This causes `ReportToConsoleNonLocalized` to query
the current JS context for location info. With e10s disabled, the
reported error links to `PushService.jsm` instead of `<unknown>`.

Review commit: https://reviewboard.mozilla.org/r/35557/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35557/
Attachment #8721041 - Flags: review?(bkelly)
Comment on attachment 8718944 [details]
MozReview Request: Bug 1247089 - Log Web Push decryption errors. r?bkelly

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34809/diff/1-2/
Attachment #8718944 - Attachment description: MozReview Request: Bug 1247089 - Log Web Push decryption errors. → MozReview Request: Bug 1247089 - Log Web Push decryption errors. r?bkelly
Attachment #8718944 - Flags: feedback- → review?(bkelly)
https://reviewboard.mozilla.org/r/34807/#review35123

::: dom/base/nsContentUtils.h:849
(Diff revision 2)
> +   * information from the current JS context if |aLineNumber| is omitted.

Why do we need a version that does not automatically use location information if the line number is zero?  I find it really confusing to have two methods with the identical method name, but differing semantics based where an argument is a string vs an nsIURI.

::: dom/base/nsContentUtils.cpp:3492
(Diff revision 2)
> +    aURI->GetSpec(spec);

Please use curly braces on single-line blocks.
Comment on attachment 8718944 [details]
MozReview Request: Bug 1247089 - Log Web Push decryption errors. r?bkelly

Can you find a way to avoid the change of semantics in the nsContentUtils method with the same name?

Also, regular bugzilla patch review is appreciated instead of mozreview.

Thanks.
Attachment #8718944 - Flags: review?(bkelly)
Attachment #8721041 - Flags: review?(bkelly)
(In reply to Ben Kelly [:bkelly] from comment #23)
> Why do we need a version that does not automatically use location
> information if the line number is zero?  I find it really confusing to have
> two methods with the identical method name, but differing semantics based
> where an argument is a string vs an nsIURI.

I can rename this method, but I'm wondering if there's a better way. The problem is, if you specify a line number of 0, `ReportToConsoleNonLocalized` extracts the calling location from the current JS context. So, in non-e10s, this will show the log as coming from PushService.jsm. In e10s, it will correctly show `<unknown>`.

Thanks for the heads-up on MozReview; I'll upload a patch for next time.
I guess you should probably break the nsContentUtils stuff out as a separate patch and then ask Boris or someone similar for review.

You could add an additional argument indicating automatic location detection or a sentinal value like line=0xffffffff meaning ignore the information.
Attached patch 1.patchSplinter Review
Is this what you had in mind, Ben?
Attachment #8721041 - Attachment is obsolete: true
Attachment #8729341 - Flags: review?(bkelly)
Attached patch 2.patch (obsolete) — Splinter Review
Part 2, rebased.
Attachment #8718944 - Attachment is obsolete: true
Attachment #8729343 - Flags: review?(bkelly)
Attachment #8729341 - Flags: review?(bkelly) → review+
Comment on attachment 8729343 [details] [diff] [review]
2.patch

Review of attachment 8729343 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  Overall looks good.  Just a few comments.  r=me with those comments addressed.

Is it possible to write a test for this?

::: dom/push/PushNotifier.cpp
@@ +87,5 @@
> +PushNotifier::NotifyError(const nsACString& aScope, nsIPrincipal* aPrincipal,
> +                          const nsAString& aMessage, uint32_t aFlags)
> +{
> +  if (XRE_IsContentProcess()) {
> +    return NS_ERROR_NOT_IMPLEMENTED;

Can we assert that this is not called in debug builds at least?

@@ +214,5 @@
> +{
> +  AssertIsOnMainThread();
> +
> +  const nsPromiseFlatCString& scope = PromiseFlatCString(aScope);
> +  const nsPromiseFlatString& message = PromiseFlatString(aMessage);

Why do you need these PromiseFlat*String() calls?  You should be able to just pass aScope and aMessage below.

@@ +234,5 @@
> +
> +  // Parent process, e10s enabled.
> +  nsTArray<ContentParent*> contentActors;
> +  ContentParent::GetAll(contentActors);
> +  for (uint32_t i = 0; i < contentActors.Length(); ++i) {

If contentActors.Length() is zero, do you want to log this to the system console?  Otherwise these messages will go nowhere when there is no content process running.
Attachment #8729343 - Flags: review?(bkelly) → review+
Thanks, Ben! I'll land this shortly. It was straightforward to add a test, thanks to wchen's refactoring.

(In reply to Ben Kelly [:bkelly] from comment #29)
> Can we assert that this is not called in debug builds at least?

Removing the check made it a lot easier to write the test, so I just did that instead.

> Why do you need these PromiseFlat*String() calls?  You should be able to
> just pass aScope and aMessage below.

Because IPDL. :-/ Per your suggestion on IRC, I moved the PromiseFlat*String() calls closer to where they were used.

> If contentActors.Length() is zero, do you want to log this to the system
> console?  Otherwise these messages will go nowhere when there is no content
> process running.

Indeed, I do. Added.
Attached patch 2.patchSplinter Review
Rebased, incorporated review feedback; carrying forward r+.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b0fc9456251
Attachment #8729343 - Attachment is obsolete: true
Attachment #8735644 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ca8b229e4f20
https://hg.mozilla.org/mozilla-central/rev/ba673d44b388
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.