Closed Bug 1043994 Opened 11 years ago Closed 11 years ago

TPS tests are failing to sign in because of "Error: signIn() failed with: null"

Categories

(Testing Graveyard :: TPS, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cosmin-malutan, Assigned: whimboo)

Attachments

(2 files)

Attached file exception.txt
This fails only via Jenkins so far, so my first guess is that the failure is caused by an issue with the account that has been created via the fxa-python-client package.
>03:49:50 CROSSWEAVE INFO: starting action: Login
>03:49:50 CROSSWEAVE INFO: Setting client credentials and login.
>03:49:50 CROSSWEAVE INFO: Login user: coversheet-499adc4720bc@restmail.net
>03:52:50 1406199169574 FirefoxAccounts ERROR error POSTing /account/login?keys=true: {"error":{},"message":null,"code":null,"errno":null}
>03:52:50 CROSSWEAVE ERROR: [phase1] Exception caught: Error: signIn() failed with: null (resource://tps/auth/fxaccounts.jsm:93:12) JS Stack trace: signIn@fxaccounts.jsm:93:13 < Login@tps.jsm:862:5 < TPS.RunNextTestAction@tps.jsm:590:1
>03:52:50 CROSSWEAVE INFO: ----------event observed: quit-application-requested
>03:52:50 CROSSWEAVE INFO: Wiping data from server.
>03:52:50 CROSSWEAVE INFO: Setting client credentials and login.
>03:52:50 CROSSWEAVE INFO: Login user: coversheet-499adc4720bc@restmail.net
>03:53:49
>03:53:49 TEST-UNEXPECTED-FAIL | test_prefs.js | [phase1] Exception caught: Error: signIn() failed with: null (resource://tps/auth/fxaccounts.jsm:93:12) JS Stack trace: signIn@fxaccounts.jsm:93:13 < Login@tps.jsm:862:5 < TPS.RunNextTestAction@tps.jsm:590:1
Attached is the full log.
Attached file exception-debug.txt
I couldn't reproduce the issue locally, but luckily I did in a retrigger with the --debug argument, so here is a more detailed log.
http://mxr.mozilla.org/mozilla-central/source/services/sync/tps/extensions/tps/resource/auth/fxaccounts.jsm#93:
[phase1] Exception caught: Error: signIn() failed with: null (resource://tps/auth/fxaccounts.jsm:93:12) JS Stack trace: signIn@fxaccounts.jsm:93:13 < Login@tps.jsm:862:5 < TPS.RunNextTestAction@tps.jsm:590:1
So it looks like we get an error for the login REST call: http://mxr.mozilla.org/mozilla-central/source/services/common/rest.js#444
FirefoxAccounts ERROR error POSTing /account/login?keys=true: {"error":{},"message":null,"code":null,"errno":null}
04:33:59 1406547238359 Sync.RESTResponse DEBUG Caught exception fetching HTTP status code:[Exception... "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIHttpChannel.responseStatus]" nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)" location: "JS frame :: resource://services-common/rest.js :: RESTResponse.prototype.status :: line 620" data: no]
Stack trace: RESTResponse.prototype.status()@resource://services-common/rest.js:620 < _onComplete()@resource://services-common/hawkclient.js:193 < onComplete()@resource://services-common/hawkclient.js:250 < onStopRequest()@resource://services-common/rest.js:444 < waitForSyncCallback()@resource://services-common/async.js:102 < makeSpinningCallback/callback.wait()@resource://services-common/async.js:145 < signIn()@resource://tps/auth/fxaccounts.jsm:82 < Login()@resource://tps/tps.jsm:862 < TPS.RunNextTestAction()@resource://tps/tps.jsm:590 < <file:unknown>
Richard, do you have any idea? We don't have this problem with a Firefox Account created manually, but always see it when using the fxa-python-client, even though in the latter case the account has indeed been verified!
Flags: needinfo?(rnewman)
OS: Linux → All
Hardware: x86_64 → All
Summary: TPS failure: test_sync.js | [phase1] Exception caught: Error: signIn() failed with: null → TPS tests are failing to sign in because of "Error: signIn() failed with: null"
Component: Firefox Sync: Backend → TPS
Product: Mozilla Services → Testing
Whiteboard: [lang=js]
One thing to mention is that the new CI is running on a puppetized host, which is controlled via PuppetAgain. While following the console output of an active job (e.g. http://tps-ci-production.qa.scl3.mozilla.com:8080/job/mozilla-aurora_fx-account/174/console) I can see that the test is stalled during the login call. That means we most likely don't get any response back from the auth server. I'm not sure yet whether that is a proxy issue or a problem with the local settings. I will do some experiments in parallel.
Ok, so I logged into the machine via VNC. While Firefox was open and running one of the tests, I tried to load google.com and got a hang there as well. So this is indeed a problem with the system proxy settings, which are managed via PuppetAgain. I think fixing that will make the test work.
Flags: needinfo?(rnewman)
So on July 16th tests were actually working for release builds of Firefox 31.0: http://tps-ci-production.qa.scl3.mozilla.com:8080/job/release-mozilla-release_fx-account/2/console
I retriggered one of those now, and it fails the same way as listed above: http://tps-ci-production.qa.scl3.mozilla.com:8080/job/release-mozilla-release_fx-account/4/console
$ uptime
14:58:46 up 19 days, 8:30, 2 users, load average: 1.46, 1.07, 0.71
That means someone has restarted the box after July 16th, and maybe a change applied by puppet at that point broke the network connections through the proxy.
So this box got changes on July 9th and July 18th:
Jul 9 02:05:09 puppetmaster1 puppet-master[10243]: Compiled catalog for tps-ub-1404-64-2.qa.scl3.mozilla.com in environment hskupin in 2.32 seconds
Jul 18 06:28:26 puppetmaster1 puppet-master[19289]: Compiled catalog for tps-ub-1404-64-2.qa.scl3.mozilla.com in environment production in 1.15 seconds
Actually nothing new landed during those days: https://hg.mozilla.org/qa/puppet/pushloghtml?startdate=2014-07-07&enddate=2014-07-22
The changes from July 8th were picked up on July 9th, but I wonder what happened on July 18th.
A restart of the box actually didn't help. But when I start Firefox manually through the terminal, network connections work fine. So something might be broken here in combination with Jenkins and Java.
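To make the uptime reasoning above easier to check, here is a rough sketch that turns an `uptime` line into an approximate boot time. The helper names (`parse_uptime`, `boot_time`) are made up for illustration, the regex only handles the "N days, H:MM" form quoted in this bug, and the capture date is NOT stated anywhere in the thread, so the date used in the usage note is purely a placeholder.

```python
import re
from datetime import datetime, timedelta

# Matches only the "up 19 days, 8:30" style seen in this bug (the day part is optional).
# Other uptime formats such as "up 5 min" are deliberately not handled.
UPTIME_RE = re.compile(r"up\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+)")


def parse_uptime(text):
    """Return the uptime found in an `uptime` output line as a timedelta."""
    match = UPTIME_RE.search(text)
    if match is None:
        raise ValueError("unsupported uptime format: %r" % text)
    days = int(match.group(1) or 0)
    return timedelta(days=days, hours=int(match.group(2)), minutes=int(match.group(3)))


def boot_time(captured_at, uptime_text):
    """Subtract the parsed uptime from the moment the output was captured."""
    return captured_at - parse_uptime(uptime_text)
```

For example, `boot_time(datetime(2014, 8, 6, 14, 58, 46), "14:58:46 up 19 days, 8:30, 2 users, load average: 1.46, 1.07, 0.71")` gives `2014-07-18 06:28:46`, where August 6th is only an assumed capture date. Independent of the date, the time of day works out to about 06:28, which is close to the 06:28:26 catalog compile logged for July 18th.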
I think I found the problem: http://tps-ci-production.qa.scl3.mozilla.com:8080/job/release-mozilla-release_fx-account/4/injectedEnvVars/ Please scroll down to the proxy env variables as passed through by Jenkins. All except for the HTTPS one are lowercase, and Firefox needs the uppercase ones. On our current systems we never set the uppercase env variables, to work around this Jenkins bug. But now we might not be able to do so, given that releng needs them. So what we may have to do is unset them before starting the Jenkins client via the terminal.
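The "unset them before starting" idea could look roughly like the sketch below: build a copy of the environment with the proxy variables removed, whatever their case, and launch the child process with that copy. The list in `PROXY_VARS` and both function names are assumptions for illustration; the bug does not say exactly which variables would have to be unset, and this is not how the Jenkins client is actually started on the box.

```python
import os
import subprocess
import sys

# Assumed set of proxy variable names, compared case-insensitively.
# Which of these really matter on the CI box is not stated in this bug.
PROXY_VARS = {"http_proxy", "https_proxy", "ftp_proxy", "all_proxy", "no_proxy"}


def env_without_proxy(env=None):
    """Return a copy of env (default: os.environ) without any proxy variables."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key.lower() not in PROXY_VARS}


def run_without_proxy(cmd, env=None):
    """Run cmd with the filtered environment and return the CompletedProcess."""
    return subprocess.run(cmd, env=env_without_proxy(env), capture_output=True, text=True)
```

Something like `run_without_proxy([sys.executable, "-c", "import os; print(os.environ.get('http_proxy'))"])` should then print `None` in the child, while the parent environment stays untouched.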
Depends on: 1050268
It's indeed a proxy bug, and I might have unset one of the proxy env variables at some point for testing. With the reboot of the machine all env variables are back at their original values. See the newly filed bug for details.
Depends on: 1050313
Another problem I'm facing here is that restmail.net is down. So the verification of the dynamically created fx account cannot be done, and we bail out early.
Tests depend on external resources? :(
That's why those do not run on buildbot, exactly! So with the fix of restmail.net all is working fine again. Closing as fixed.
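Given the dependency on restmail.net described above, a harness could check its external services up front and bail out with a clear message instead of a login failure. The sketch below is only one way to do that, with hypothetical helper names; it opens a direct TCP connection, so on a host that only reaches the network through a proxy it would not reflect what Firefox actually sees.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts alike.
        return False


def missing_dependencies(endpoints, timeout=5.0):
    """Return the (host, port) pairs from endpoints that could not be reached."""
    return [(host, port) for host, port in endpoints if not is_reachable(host, port, timeout)]
```

A call such as `missing_dependencies([("restmail.net", 443)])` (the port is an assumption) returns an empty list when everything is reachable, so a non-empty result could be reported before any test phase starts.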
Assignee: nobody → hskupin
Status: NEW → RESOLVED
Closed: 11 years ago
No longer depends on: 1050268
Resolution: --- → FIXED
Product: Testing → Testing Graveyard