Failure to get a Hello call URL at browser startup, never recovers

RESOLVED WORKSFORME

Status

P1
normal
RESOLVED WORKSFORME
4 years ago
4 years ago

People

(Reporter: jesup, Assigned: mikedeboer)

Tracking

unspecified
Points:
5
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
Sometimes (perhaps related to some server problems over the weekend) starting the browser and opening the Hello panel doesn't show a URL, it just spins forever.

There are console errors (with debugs turned on) about websocket problems.  However, it never appears to attempt to recover/try-again.

Browser console:
Use of getPreventDefault() is deprecated.  Use defaultPrevented instead. jquery.js:2:0
ReferenceError: reference to undefined property json._dataType nsSearchService.js:3384:49
ReferenceError: reference to undefined property param.purpose nsSearchService.js:926:8
ReferenceError: reference to undefined property window.mozInnerScreenX browser.js:16304:4
"delayedInitialize" MozLoopService.jsm:1019
"assigning to deferredRegistrations for sessionType:" 1 MozLoopService.jsm:395
"registerForNotification" "801f754b-686b-43ec-bd83-1419bbf58388" MozLoopService.jsm:334
"registerForNotification" "19d3f799-a8f3-4328-9822-b7cd02765832" MozLoopService.jsm:334
push.services.mozilla.com : server does not support RFC 5746, see CVE-2009-3555 <unknown>
"MozLoopPushHandler::_registerChannel websocket.sendMsg() failure" MozLoopPushHandler.jsm:356
Loop Push server web socket closed (server)! Code: 1000 MozLoopPushHandler.jsm:178:0
Loop Push server web socket closed! Code: 2152136706 MozLoopPushHandler.jsm:164:0
1416154203163	Services.HealthReport.HealthReporter	WARN	No prefs data found.
"assigning to deferredRegistrations for sessionType:" 2 MozLoopService.jsm:395
"registerForNotification" "25389583-921f-4169-a426-a4673658944b" MozLoopService.jsm:334
"registerForNotification" "6add272a-d316-477c-8335-f00f73dfde71" MozLoopService.jsm:334
"promiseRegisteredWithServers: registration already completed or in progress:" 2 MozLoopService.jsm:389
"promiseRegisteredWithServers: registration already completed or in progress:" 2 MozLoopService.jsm:389
I'm pondering if what happens here, is that because the push handler fails to register, it backs off for a retry later on.

However, as registration is still "in progress" any attempts to re-register cause a failure of the registration process, rather than a retry of the registration attempt straight away.

Mike/Paul, any ideas here?
backlog: --- → Fx34?
Flags: needinfo?(pkerr)
Flags: needinfo?(mdeboer)
i.e. it could recover, but only after about 30 mins.
I just happened on this whilst testing some other code. Here's the log:

"assigning to deferredRegistrations for sessionType:" 2 MozLoopService.jsm:395
"registerForNotification" "25389583-921f-4169-a426-a4673658944b" MozLoopService.jsm:334
"registerForNotification" "6add272a-d316-477c-8335-f00f73dfde71" MozLoopService.jsm:334
"delayedInitialize" MozLoopService.jsm:1019
"assigning to deferredRegistrations for sessionType:" 1 MozLoopService.jsm:395
"registerForNotification" "801f754b-686b-43ec-bd83-1419bbf58388" MozLoopService.jsm:334
"registerForNotification" "19d3f799-a8f3-4328-9822-b7cd02765832" MozLoopService.jsm:334
"[Dispatcher] Dispatching action" Object { name: "getAllRooms" } dispatcher.js:71:8
GET http://localhost:5000/v0/push-server-config [HTTP/1.1 304 Not Modified 1ms]
SyntaxError: test for equality (==) mistyped as assignment (=)? VariablesView.jsm:2058:41
SyntaxError: test for equality (==) mistyped as assignment (=)? ViewHelpers.jsm:810:54
GET https://push.services.mozilla.com/ [HTTP/1.1 101 Switching Protocols 937ms]
POST http://ocsp.digicert.com/ [HTTP/1.1 200 OK 50ms]
push.services.mozilla.com : server does not support RFC 5746, see CVE-2009-3555 <unknown>
"delayedInitialize" MozLoopService.jsm:1019
"promiseRegisteredWithServers: registration already completed or in progress:" 1 MozLoopService.jsm:389
POST https://safebrowsing.google.com/safebrowsing/downloads [HTTP/2.0 200 OK 190ms]
POST http://clients1.google.com/ocsp [HTTP/1.1 200 OK 69ms]
"MozLoopPushHandler::_registerChannel websocket.sendMsg() failure" MozLoopPushHandler.jsm:356
Loop Push server web socket closed (server)! Code: 1000 MozLoopPushHandler.jsm:178:0
Loop Push server web socket closed! Code: 2152136706 MozLoopPushHandler.jsm:164:0
1416173175747	Services.HealthReport.HealthReporter	WARN	No prefs data found.
"promiseRegisteredWithServers: registration already completed or in progress:" 2 MozLoopService.jsm:389
GET https://push.services.mozilla.com/ [HTTP/1.1 101 Switching Protocols 1258ms]
"promiseRegisteredWithServers: registration already completed or in progress:" 2 MozLoopService.jsm:389

What's interesting here, is that the initial registration fails. I then try and open the panel a couple more times, and that gives the "registration already completed or in progress" messages.

The retry then kicks in after a minute which gives the GET on https://push.services.mozilla.com/ successful, but this obviously doesn't complete the registration process.

I think I'd expect the initial registration to be returning a failure result, and then either allowing register on panel open, or retrying every so often.

Comment 4

4 years ago
(In reply to Mark Banner (:standard8) from comment #1)
> I'm pondering if what happens here, is that because the push handler fails
> to register, it backs off for a retry later on.
> 
> However, as registration is still "in progress" any attempts to re-register
> cause a failure of the registration process, rather than a retry of the
> registration attempt straight away.
> 
> Mike/Paul, any ideas here?

The retry logic is only started when a hard failure during registration is encountered. That is, an OnStop, OnServerClose, or a timeout when waiting for the Hello response, in addition to a 500 returned during registration. For the first three, the websocket is abandoned and a new one is opened as per the SimplePushServer specification.

What I suggest is that MozPushHandler not respond to the user action to perform a registration when the registration is "In Progress". The user action can either clear the back-off time back to the minimum or set a flag to kick off an immediate retry, but defer when registration is in progress.

The 30 minute ping will only run when a complete registration is in effect.
Flags: needinfo?(pkerr)
(Assignee)

Updated

4 years ago
Flags: needinfo?(mdeboer)
backlog: Fx34? → Fx35?

Updated

4 years ago
Flags: needinfo?(mreavy)
backlog: Fx35? → Fx35+
Flags: needinfo?(mreavy)
Priority: -- → P1
[Tracking Requested - why for this release]:
We need this to improve Hello service reliability
tracking-firefox35: --- → ?
tracking-firefox36: --- → ?

Comment 6

4 years ago
needs owner
Flags: needinfo?(mreavy)
Created attachment 8534413 [details]
Screen Shot 2014-12-09 at 10.41.12 PM.png

Not sure if it is the same thing, but my Loop client in 34 looks like this since last night.
When double checking last night with another browser Nightly took for ever (like 30s or more) to make the "Start a conversation" button clickable.

For me this translates into two issue:
1) How do we notice server side issues, before end users of the client open bug reports?
2) As this has happened now the second time to me I'm pretty sure that the Loop client does not properly recover from certain failures.
(Assignee)

Comment 8

4 years ago
I'll take a look at this, hoping to find the cause.
Assignee: nobody → mdeboer
Status: NEW → ASSIGNED
Iteration: --- → 37.1
Points: --- → 5
Flags: needinfo?(mreavy)
(Assignee)

Updated

4 years ago
Flags: needinfo?(mmucci)
Added to IT 37.2
Iteration: 37.1 → 37.2
Flags: qe-verify?
Flags: needinfo?(mmucci)
Flags: firefox-backlog+
Potential important issue, tracking.
tracking-firefox35: ? → +
tracking-firefox36: ? → +
(Assignee)

Comment 11

4 years ago
After several days of trying to reproduce - on Firefox 35/ 36! - , setting up all kinds of custom servers, I wasn't able to.

The work :pkerr has been doing to improve connection handling in various scenarios seems to be paying of here. Please re-open this bug if you can reproduce on Firefox 35/ 36, preferably with STR. Thanks!
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME
tracking-firefox35: + → ---
tracking-firefox36: + → ---
Flags: qe-verify?
You need to log in before you can comment on or make changes to this bug.