Closed Bug 1275125 Opened 4 years ago Closed 4 years ago

TPS failing: test_client_wipe.js: ASSERTION FAILED! password not found

Categories

(Testing Graveyard :: TPS, defect)

Version 3
defect
Not set

Tracking

(firefox50 fixed)

VERIFIED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: markh, Assigned: tcsc)

Details

(Whiteboard: [data-integrity])

Attachments

(1 file)

TPS running on jenkins is reliably failing with:

TEST-UNEXPECTED-FAIL | test_client_wipe.js | [phase 3] RunNextTestAction failed - Error: ASSERTION FAILED! password not found (resource://tps/logger.jsm:77:11) JS Stack trace: Logger.AssertTrue@logger.jsm:77:11 < TPS.HandlePasswords@tps.jsm:436:13 < Passwords__verify@tps.jsm:1021:5 < TPS.RunNextTestAction@tps.jsm:608:7 < TPS.RunNextTestAction@tps.jsm:628:5 < TPS__FinishAsyncOperation/<@tps.jsm:274:9

We should diagnose and fix that.
Flags: firefox-backlog+
QA Contact: kthiessen
Assignee: nobody → tchiovoloni
Can't replicate (at all) on mac or linux, unfortunately. Any ideas how the jenkins build machine might be set up differently?
Flags: needinfo?(markh)
I should have added a link to a failure log :) but it doesn't really help :( (I'll send the details in a chat - I'm not sure they should be public)

Just before that failure every engine is failing with:

> 1464032913946	Sync.ErrorHandler	DEBUG	prefs failed: Error (resource://services-sync/resource.js:405:19) JS Stack trace: waitForSyncCallback@async.js:98:7 ...

But what I find odd is the lack of a good error there - it's just "Error". If I simulate a network error locally by disconnecting my network, I get a more informative:

> 1464749900344   Sync.ErrorHandler       DEBUG   tabs failed: Error: NS_ERROR_UNKNOWN_HOST (resource://services-sync/resource.js:406:19) ...

This error is thrown by onStopRequest in the |if (!statusSuccess) {| block, and I can't see when that wouldn't give an error string (even though it's convoluted)

Note I've seem similar messages in user submitted errors which is frustrating - it's impossible to know what the error was. Note also that this is *not* a non-200 case - this is the "request failed" case.

Additional logging might help solve this, but I'm not sure how to go about that for TPS.

Karl/Richard, is there some way we can arrange for "experimental" changes to be run on TPS? Eg, if Thom pushed a build to try with a Linux host, would be be possible to get some runs from that build?
Flags: needinfo?(rpappalardo)
Flags: needinfo?(markh)
Flags: needinfo?(kthiessen)
(In reply to Mark Hammond [:markh] from comment #2)
> I should have added a link to a failure log :) but it doesn't really help :(
> (I'll send the details in a chat - I'm not sure they should be public)
> 
> Just before that failure every engine is failing with:
> 
> > 1464032913946	Sync.ErrorHandler	DEBUG	prefs failed: Error (resource://services-sync/resource.js:405:19) JS Stack trace: waitForSyncCallback@async.js:98:7 ...
> 
> But what I find odd is the lack of a good error there - it's just "Error".
> If I simulate a network error locally by disconnecting my network, I get a
> more informative:
> 
> > 1464749900344   Sync.ErrorHandler       DEBUG   tabs failed: Error: NS_ERROR_UNKNOWN_HOST (resource://services-sync/resource.js:406:19) ...
> 
> This error is thrown by onStopRequest in the |if (!statusSuccess) {| block,
> and I can't see when that wouldn't give an error string (even though it's
> convoluted)
> 
> Note I've seem similar messages in user submitted errors which is
> frustrating - it's impossible to know what the error was. Note also that
> this is *not* a non-200 case - this is the "request failed" case.
> 
> Additional logging might help solve this, but I'm not sure how to go about
> that for TPS.
> 
> Karl/Richard, is there some way we can arrange for "experimental" changes to
> be run on TPS? Eg, if Thom pushed a build to try with a Linux host, would be
> be possible to get some runs from that build?

Yes, that should be relatively easy once we get our infrastructure back online.  
:kthiessen, do you have some time to work on this next week?
Flags: needinfo?(rpappalardo)
Yeah, I'll have time for this next week.
Flags: needinfo?(kthiessen)
Do you still have time for this :kthiessen?
Flags: needinfo?(kthiessen)
Sure.  Just give me an http or ftp URL for the try build, and I can run TPS against it.
Flags: needinfo?(kthiessen)
This got pushed aside in favour of the FxA device-detaching work (see bug 1281095.)  But whenever you're ready, Thom, I'll make time to run TPS against a try URL.
I've reported an issue for FxA on their github: https://github.com/mozilla/fxa-auth-server/issues/1312, but it doesn't seem like we really care about this -- AFAICT it was just a pref that was chosen to test syncing, and not chosen because it would test some part of the server configuration (but I could be wrong).
(In reply to Thom Chiovoloni [:tcsc] from comment #9)
> I've reported an issue for FxA on their github:
> https://github.com/mozilla/fxa-auth-server/issues/1312, but it doesn't seem
> like we really care about this -- AFAICT it was just a pref that was chosen
> to test syncing, and not chosen because it would test some part of the
> server configuration (but I could be wrong).

IIUC, that's only partially true - I'm not sure how that pref was chosen (it may have been arbitrary) but it looks like it might be highlighting a real bug in that a user flipping that pref will stop working with Sync.

Is it possible for you to verify if that's the case - just flipping that pref on a profile before attempting to signin is probably enough. If it is true I think we need to escalate that githib issue you opened and leave the pref in the test.
Flags: needinfo?(tchiovoloni)
I can't replicate this outside of TPS. That said, as mentioned in comment 2, I wasn't always able to replicate this even in TPS (it started failing last week while I was working on bug 1281095).

I did some experimentation, and I'm not sure I think it's an FxA bug (may have jumped the gun with the github issue, but I didn't want to upload that patch avoiding a problem without an issue on file for it...).

For example, if add code to TPS to login at the start of every phase, and logout at the end, the bug becomes something that only happens intermittently (and rarely, at that), instead of every time. Interestingly, if I take that code out again, it sometimes still works for one run of the test before stopping.

The only thing I can think of is that something is being cached across profiles (and even possibly test runs), and it's only a problem if we're being thorough about our OCSP checks, and that for some reason, changing tps.jsm causes this cache to be cleared. 

Worth noting that with the code for us logging out/in every time, when it errors, the error is a slightly different security error.  Specifically, 0x805a1f70 -- which is SEC_ERROR_OCSP_INVALID_SIGNING_CERT.
Flags: needinfo?(tchiovoloni)
Whiteboard: [data-integrity]
Comment on attachment 8766069 [details]
Bug 1275125 - Replace 'security.OCSP.require' with a less problematic synced preference in TPS tests

https://reviewboard.mozilla.org/r/61114/#review58344

As discussed, we can only reproduce a problem with this pref in TPS on AWS and not when run locally - so we should just stop using this pref. However, instead of just deleting it, can you try and find another benign boolean pref we can use?
Attachment #8766069 - Flags: review?(markh)
Comment on attachment 8766069 [details]
Bug 1275125 - Replace 'security.OCSP.require' with a less problematic synced preference in TPS tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61114/diff/1-2/
Attachment #8766069 - Attachment description: Bug 1275125 - Remove the 'security.OCSP.require' pref from TPS tests → Bug 1275125 - Replace 'security.OCSP.require' with a less problematic synced preference in TPS tests
Attachment #8766069 - Flags: review?(markh)
Comment on attachment 8766069 [details]
Bug 1275125 - Replace 'security.OCSP.require' with a less problematic synced preference in TPS tests

https://reviewboard.mozilla.org/r/61114/#review58528

cool, thanks.
Attachment #8766069 - Flags: review?(markh) → review+
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cff140d78443
Replace 'security.OCSP.require' with a less problematic synced preference in TPS tests. r=markh
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cff140d78443
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Verified.
Status: RESOLVED → VERIFIED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.