1404332 - [Shield] Pioneer Enrollment - failure code: 0x80004005 on show pop-up notification

Reporter

Description

•

7 years ago

[Prerequisites:]
Ensure the following preferences are set :
        extensions.pioneer-enrollment-study.updateTimerInterval = 1
        extensions.pioneer-enrollment-study.firstPromptDelay = 1
        extensions.pioneer-enrollment-study.secondPromptDelay = 180000
        extensions.pioneer-enrollment-study.studyEndDelay = 180000
        extensions.pioneer-enrollment-study.studyEnrolledEndDelay = 180000
        extensions.pioneer-enrollment-study.enrollmentState = ‘’ (empty string)

[Steps:]
1. Open FF and set the prerequisites.
2. Open browser console.
3. Install add-on https://net-mozaws-prod-addons-signxpi-output.s3.amazonaws.com/pioneer-enrollment-study-2.0.1.xpi.


[Actual Result:]
When the pop-up notification is displayed, the browser console states:

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIURI.hostPort]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: resource://gre/modules/PopupNotifications.jsm :: PopupNotifications_refreshPanel/< :: line 816"  data: no]

[Note1:] There is no visible failure in the expected workflow.
[Note2:] There are two additional errors in the console, but not sure if there are also pioneer related:
       a. Object { error: "connectionClosed", message: "'getTab' active request packet to '…" }
        b. NotFoundError: Node was not found  notification.xml:219 in view-source:chrome://global/content/bindings/notification.xml

Osmose [:osmose, :mkelly]

Assignee

Comment 1

•

7 years ago

After talking with Gijs, the PopupNotifications error does not appear to be related to Pioneer (PopunderNotifications has to do with doorhanger notifications, which we don't use at all). I also cannot replicate on MacOS or Windows 10.

The getTab error also looks unrelated, and I cannot replicate. However, the "Node was not found" error does look like it's our fault, and I've filed https://github.com/mozilla/pioneer-enrollment-study/pull/1 to deal with it.

Osmose [:osmose, :mkelly]

Assignee

Updated

•

7 years ago

Assignee: nobody → mkelly

[github robot]

Comment 2

•

7 years ago

Commits pushed to master at https://github.com/mozilla/pioneer-enrollment-study

https://github.com/mozilla/pioneer-enrollment-study/commit/6472143c35fc5d515357f084b7c22c43d46ba984
Bug 1404332: Do not throw if failing to clean up notifications.

https://github.com/mozilla/pioneer-enrollment-study/commit/09a3934341dae127c166cf3024d476524039c5cc
Merge pull request #1 from Osmose/node-not-found

Bug 1404332: Do not throw if failing to clean up notifications.

Osmose [:osmose, :mkelly]

Assignee

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Adrian Florinescu [:aflorinescu]

Reporter

Comment 3

•

7 years ago

Comment 2 fix resolves the node issue. I cannot spot the node error anymore. Mike, any idea what the other two console errors might be? (resolve this and log separate issues for those?)

Also related to browser console, I'm not sure if this is actually related to the Pioneer study, but I do get a bunch of telemetry errors - I wonder if these are expected: https://pastebin.mozilla.org/9069025

Status: RESOLVED → VERIFIED

Flags: needinfo?(mkelly)

Osmose [:osmose, :mkelly]

Assignee

Comment 4

•

7 years ago

(In reply to Adrian Florinescu [:AdrianSV] from comment #3)
> Comment 2 fix resolves the node issue. I cannot spot the node error anymore.
> Mike, any idea what the other two console errors might be? (resolve this and
> log separate issues for those?)

Yeah, I think logging them under Firefox::General with STR is our best bet. The popup notification one in particular could probably use a look from someone more familiar with that code.
 
> Also related to browser console, I'm not sure if this is actually related to
> the Pioneer study, but I do get a bunch of telemetry errors - I wonder if
> these are expected: https://pastebin.mozilla.org/9069025

gfritzsche: chutten: We're getting the errors linked above when sending the persisted custom ping data from a Shield study, but I can't trace them to anything in particular that we're doing. What do you think about that error log?

Flags: needinfo?(mkelly) → needinfo?(chutten)

Chris H-C :chutten

Comment 5

•

7 years ago

`TelemetryStopwatch: key "TELEMETRY_PENDING_LOAD_MS" was already initialized` means that TelemetryStopwatch.start() was called twice without an interleaving TelemetryStopwatch.stop or .cancel for the name "TELEMETRY_PENDING_LOAD_MS

This isn't a good sign as this is a TELEMETRY_* probe which measures internal work. If the internals are misbehaving, something is likely wrong.

`TelemetrySend::sendPersistedPings - failed to send ping b684490f-806e-4b7f-a249-003aac449b99: SyntaxError: The URI is malformed.` as you can imagine this isn't a particularly nice error to receive either.

This means the URI we're trying to send pings to is malformed.

When is this happening? During tests?

Flags: needinfo?(chutten)

Adrian Florinescu [:aflorinescu]

Reporter

Comment 6

•

7 years ago

(In reply to Chris H-C :chutten from comment #5)
> `TelemetryStopwatch: key "TELEMETRY_PENDING_LOAD_MS" was already
> initialized` means that TelemetryStopwatch.start() was called twice without
> an interleaving TelemetryStopwatch.stop or .cancel for the name
> "TELEMETRY_PENDING_LOAD_MS
> 
> This isn't a good sign as this is a TELEMETRY_* probe which measures
> internal work. If the internals are misbehaving, something is likely wrong.
> 
> `TelemetrySend::sendPersistedPings - failed to send ping
> b684490f-806e-4b7f-a249-003aac449b99: SyntaxError: The URI is malformed.` as
> you can imagine this isn't a particularly nice error to receive either.
> 
> This means the URI we're trying to send pings to is malformed.
> 
> When is this happening? During tests?

Yes, but as far as I can imagine, the tests are as relevant as the real thing, since we are using production environment. (production normandy with Shield opt-out study which delivers the add-on from  https://bugzilla.mozilla.org/show_bug.cgi?id=1382761#c10).

I could try to see if I get the same telemetry result having the add-on installed manually (I don't remember seeing these errors during the first tests w/o the Shield Study).

Flags: needinfo?(adrian.florinescu)

Osmose [:osmose, :mkelly]

Assignee

Comment 7

•

7 years ago

(In reply to Chris H-C :chutten from comment #5)
> When is this happening? During tests?

I'm able to replicate this by following the steps from bug 1382761 comment 11 on a brand new profile in Firefox 56 on release. Those steps install a Shield study add-on, and a few minutes after the study is installed and shows its prompts, the errors show up in the console, presumably when Telemetry gets around to sending the custom pings that the study generates.

The add-on sends custom telemetry pings here: https://github.com/mozilla/pioneer-enrollment-study/blob/master/extension/StudyUtils.jsm#L901

Are there any particular usage patterns of submitExternalPing that might cause this? I can provide more context about how we send these pings if there's specific things to look out for.

Chris H-C :chutten

Comment 8

•

7 years ago

tests have sometimes been known to not clean up after themselves, which is why I've gotten used to ignoring TelemetryStopwatch warnings in tests.

To debug this we'd benefit from trace logging (Services.prefs.setCharPref("toolkit.telemetry.log.level", "Trace");)

That ought to give us the information we'd need to get this sorted. I'm especially interested in the server of the _doPing call. The paths ought to be straightforward, so it's only the server portion that remains unknown.

Osmose [:osmose, :mkelly]

Assignee

Comment 9

•

7 years ago

Here's a gist of my logs with tracing enabled: https://gist.github.com/Osmose/20f955fdbb26cdd12641db4ab5b9e9c6

Note that I'm only able to replicate the stopwatch errors, and not the invalid URL errors. I'll leave it to Adrian to provide a trace log for that.

Adrian Florinescu [:aflorinescu]

Reporter

Comment 10

•

7 years ago

I can see the malformed URI lines for Ubuntu and Mac alike. Adding the browser console with Trace enabled:
https://pastebin.mozilla.org/9069143

The steps to reproduce are:
1. Open FF with a new profile.
2. Set extensions.shield-recipe-client.user_id pref to the string "pioneer-quick-timer-2".
3. Set toolkit.telemetry.log.level - Trace
4. Set "extensions.shield-recipe-client.dev_mode" pref to true
5. Restart FF.
6. Open Browser console.


The Pioneer notification is shown but not interacted with. Might this be the part we are missing?

Flags: needinfo?(adrian.florinescu)

Chris H-C :chutten

Comment 11

•

7 years ago

Well, the "URL is malformed" part is easy to explain:

1507013861888   Toolkit.Telemetry  TRACE     TelemetrySend::_doPing - server: undefined, persisted: true, id: 89e1cc3f-a761-4343-bf90-ad37ac87cda7

We're trying to send a ping before TelemetrySend::setup (which is when the server host is grabbed from the prefs).

Now, this should just be a temporary failure. We'll store that ping and resend in a scheduled interval. In fact, it should (and is) caught much earlier than _doPing:

1507013857014   Toolkit.Telemetry  TRACE     TelemetrySend::submitPing - can't send ping now, persisting to disk - canSendNow: false

However, accepting the data submission policy seems to kick off triggerSendingPings (via _userNotified) which doesn't bother to check if it can actually send before trying.

My guess is that this triggers an unlikely error flow that breaks our net of async functions allowing crazy things like re-entrant loadPendingPings:

1507013861855   Toolkit.Telemetry  TRACE     TelemetryStorage::loadPendingPing - id: 89e1cc3f-a761-4343-bf90-ad37ac87cda7
1507013861856   Toolkit.Telemetry  TRACE     TelemetryStorage::loadPendingPing - id: 2d8c1919-b9bb-408e-a058-8c0513108225

Which in turn causes the TelemetryStopwatch traces.

tl;dr - It seems as though Telemetry doesn't handle (and doesn't test) the user accepting the data submission policy before TelemetrySend::setup, which causes a lot of noise in the logs. I'm not sure if this eventually results in the error as originally reported, so I'll file a bug specific to the Telemetry parts of this.

Chris H-C :chutten

Updated

•

7 years ago

Depends on: 1405299

Bugzilla

Quick Search

[Shield] Pioneer Enrollment - failure code: 0x80004005 on show pop-up notification

Categories

(Shield :: Shield Study, defect)

Tracking

(Not tracked)

People

(Reporter: aflorinescu, Assigned: osmose)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Updated