Firefox 31.0b2 update tests are failing on releasetest

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: ashughes, Assigned: whimboo)

Tracking

Version 2

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

4 years ago
We kicked off the 31b2 update tests on betatest today without any problems.
We kicked off the 31b2 update tests on releasetest and everything fell apart.
I have been unable to reproduce this manually and the AUS URLs we are getting appear to be valid. 

Here's a sample failure:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/75654/console

This currently blocks release of Firefox 31.0b2.
Created attachment 8441704 [details]
diff of betatest and releasetest logs

The equivalent test on betatest:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/75379/console

Comparing that console output results in the attached diff. There appears to be some sort of crash or unexpected teardown, and things go south from there. 

I checked the histories of the slaves (eg at http://mm-ci-master.qa.scl3.mozilla.com:8080/computer/mm-win-7-64-2/builds) but it doesn't look like there is a rogue (ie nothing failing everything you throw at it). Don't know enough about mozmill to say any more.
(Reporter)

Comment 2

4 years ago
Jonathan, could you help out here while Henrik is out?
Flags: needinfo?(jgriffin)
The problem appears to be that test2.js is designed to complete a download of an update file, but in this case it's being aborted:

14:03:18 TEST-START | test2.js | testDirectUpdate_Download
14:06:19 Parent process 2084 exited with children alive:
14:06:19 PIDS: 904
14:06:19 Attempting to kill them...
14:06:20 *** AUS:SVC Creating UpdateService
14:06:20 *** AUS:SVC gCanCheckForUpdates - able to check for updates
14:06:20 *** AUS:SVC isServiceInstalled = true
14:06:20 *** AUS:SVC gCanApplyUpdates - bypass the write checks because we'll use the service
14:06:20 *** AUS:SVC gCanApplyUpdates - able to apply updates
14:06:20 *** AUS:SVC readStatusFile - status: downloading, path: C:\Users\mozauto\AppData\Local\Mozilla\updates\76194DC1F198F7F0\updates\0\update.status
14:06:20 *** AUS:SVC readStatusFile - status: downloading, path: C:\Users\mozauto\AppData\Local\Mozilla\updates\76194DC1F198F7F0\updates\0\update.status
14:06:20 *** AUS:SVC UpdateService:_postUpdateProcessing - patch found in downloading state
14:06:20 *** AUS:SVC Creating Downloader
14:06:20 *** AUS:SVC UpdateService:_downloadUpdate
14:06:20 *** AUS:SVC readStatusFile - status: downloading, path: C:\Users\mozauto\AppData\Local\Mozilla\updates\76194DC1F198F7F0\updates\0\update.status
14:06:20 *** AUS:SVC Downloader:_selectPatch - found existing patch with state: downloading
14:06:20 *** AUS:SVC Downloader:_selectPatch - resuming download
14:06:20 *** AUS:SVC Downloader:downloadUpdate - downloading from http://download.mozilla.org/?product=firefox-31.0b2-complete&os=win&lang=en-US&force=1 to C:\Users\mozauto\AppData\Local\Mozilla\updates\76194DC1F198F7F0\updates\0\update.mar
14:06:21 *** UTM:SVC TimerManager:registerTimer - id: browser-cleanup-thumbnails
14:06:21 *** AUS:SVC Downloader:onStartRequest - original URI spec: http://download.mozilla.org/?product=firefox-31.0b2-complete&os=win&lang=en-US&force=1, final URI spec: http://download.cdn.mozilla.net/pub/firefox/releases/31.0b2/update/win32/en-US/firefox-31.0b2.complete.mar
14:06:21 *** AUS:SVC Downloader:onProgress - progress: 146101/39412455
14:06:21 *** AUS:SVC Downloader:onProgress - progress: 300000/39412455

I can't tell why this is happening, but the timing suggests it may be hitting a 3-minute timeout before it actually starts the download, and then the download is getting stopped immeidatley.  Is there such a timeout somewhere?  Has the UI changed such that the "start and wait for a download" code should change?
Flags: needinfo?(jgriffin)
(Reporter)

Comment 4

4 years ago
My personal observation in manual testing is that there is no difference in the updater UI since previous Firefox versions. Even if there were UI changes it wouldn't explain why betatest worked and releasetest isn't.
Tracking for now while under investigation.
tracking-firefox31: ? → +
(Reporter)

Comment 6

4 years ago
Created attachment 8441744 [details]
releastest.cfg

Just in case it's useful, I've attached the config file we were using.
(Assignee)

Comment 7

4 years ago
> 14:06:20 *** AUS:SVC UpdateService:_postUpdateProcessing - patch found in downloading state
[..]
> 14:06:23 *** AUS:SVC Downloader: cancel
[..]
> 14:06:23 *** AUS:SVC Downloader:onStopRequest - original URI spec: http://download.mozilla.org/?product=firefox-31.0b2-complete&os=win&lang=en-US&force=1, final URI spec: http://download.cdn.mozilla.net/pub/firefox/releases/31.0b2/update/win32/en-US/firefox-31.0b2.complete.mar, status: 2152398850

So what's strange here is that we find a patch, which has already been started to download. We continue but cancel it. Not sure why yet. I will kick off another testrun for this failed one. Lets see if there were some network glitches.
(Assignee)

Comment 8

4 years ago
All fine with the previously triggered update:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/75713/console

Nick, can you see something strange in the HTTP log of the testrun Anthony initially mentioned?
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/75654/artifact/http.log
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
(Assignee)

Comment 9

4 years ago
Given that this was working for Anthony when he tried manually, I wonder if we had proxy issues or other networking glitches in SCL3.
(Assignee)

Comment 10

4 years ago
Based on the given update config file, I re-triggered a full ondemand update testrun on releasetest now.
(Assignee)

Comment 11

4 years ago
I really suspect a network glitch here. All updates were working fine and passed, except a single one:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/75724/console

I assume that bug 797033 has nothing to do with it here. But anyway all is working fine now. No reason to keep this a blocker.
Severity: blocker → normal
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-firefox31: affected → ---
tracking-firefox31: + → ---
Resolution: --- → FIXED
Bug 797033 is on aus4.m.o, and we're not using that for the beta channel (yet!).
(Reporter)

Comment 13

4 years ago
Thanks for looking into this Henrik. I haven't seen any new failures with the rebuilt ondemand testruns on releasetest. Assuming this is fixed based on that.
Status: RESOLVED → VERIFIED
(Assignee)

Comment 14

4 years ago
Hm, actually lets reopening this bug for further investigation what was wrong here.

Nick, can you please have a look at comment 8? Especially at the http log? Is there anything you can see which is suspicious? If not we may have to ask Michael if he is aware of some outages in SCL3 yesterday evening.
Status: VERIFIED → REOPENED
Flags: needinfo?(nthomas)
Resolution: FIXED → ---
(Assignee)

Comment 15

4 years ago
Interesting are those lines:

14:03:17 TEST-START | test2.js | testDirectUpdate_Download
14:06:19 Parent process 2084 exited with children alive:
14:06:19 PIDS: 904
14:06:19 Attempting to kill them...

It's something Nick already mentioned earlier. There are 3 minutes without any activity. Absolutely not sure what happened here. 

I used the EventViewer on that Windows machine, but was not able to detect any problems around that time. Same with the syslog for OS X and Ubuntu.
I looked a several hours ago and the NSPR log was already a 404. Would have found it pretty hard to tell anything anyway, because of all the traffic for OSCP, firstrun page, and so on.
Flags: needinfo?(nthomas)
(Assignee)

Comment 17

4 years ago
Oh, damn. We missed to attach the HTTP log here. But yeah, you are right. With all the OSCP its kinda hard to read now.

Michael, are you aware of any network or proxy glitches in qa.scl3.mozilla.com around 2014-06-17 14:16:18 PDT ?
Flags: needinfo?(mhenry)
I was not aware.  Alas, I'm currently transitioning out of Mozilla and not part of any operations group (I only helped build the proxies).   Someone in #netops might be able to answer better than I ever could.
Flags: needinfo?(mhenry)
(Assignee)

Comment 19

4 years ago
Half a month passed by, so I don't think that the logs are still around. Lets observe and ask for feedback from operations, if it should happen once more.

Otherwise thanks a lot Michael for all the help in the past.
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.