Closed Bug 797898 Opened 12 years ago Closed 7 years ago

Test Failure "Update successfully downloaded. - 'errors' should not equal 'errors' " in /update/testFallbackUpdate

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect, P2)

x86
Windows 7
defect

Tracking

(firefox17 affected)

RESOLVED INVALID
Tracking Status
firefox17 --- affected

People

(Reporter: AndreeaMatei, Unassigned)

References

()

Details

(Whiteboard: [needs fix by RelEng])

Attachments

(1 file)

This is failing on Windows, with Aurora branch, fr locale. 
I wasn't able to reproduce it manually, tried on Vista and Win 7.
Errors:
* Update successfully downloaded. - 'errors' should not equal 'errors' 
* The post buildid is equal to the buildid of the update. - '20121002042010' should equal '20121003042009'
There is nothing we can do for investigation anymore. Once it happens please directly check the jenkins console for this testrun what happened when downloading the update.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Happened again today. We're getting an transfer error: Failed (unknown reason), default code: 2152398849.
Here is the console log:
http://10.250.73.243:8080/job/ondemand_update/5233/console
So this is because of bug 817688. We need RelEng to fix the issue.
Status: RESOLVED → REOPENED
Depends on: 817688
Priority: -- → P2
Resolution: INCOMPLETE → ---
Whiteboard: [needs fix by RelEng]
Status: REOPENED → NEW
Didn't happen again in the last month. Lets close it as WFM.
Status: NEW → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → WORKSFORME
Reopening due to:
http://mozmill-ci.blargon7.com/#/update/report/740ab23e062758ae8a240b263d2eed85

This bug might become a tracker for RelEng issues. I will file a bug appropriately.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(In reply to Daniela Petrovici from comment #6)
> Happened on Windows 8 x86 with Nightly FR
> http://mozmill-ci.blargon7.com/#/update/report/
> cb9e33213cf3d2f37d7b0f8dcc1d078f

What does the log say about the reason?
Sorry about that, I just looked at the log (http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-central_update/2470/) and it seems this is related to bug 816470 which was already updated yesterday with another failure log. The error is here:
*** AUS:SVC Downloader:onProgress - progress: 29400000 is higher than patch size: 28839096
Happened again today on Linux Ubuntu 12.04 (x86_64) with Firefox 23.0 en-US (Beta update):
http://mozmill-release.blargon7.com/#/update/report/59e7437f35dec9e49daab388d300e0df
This failed again with Beta en-US on Windows 8.1 mm-win-81-64-4
http://mozmill-release.blargon7.com/#/update/report/44e0169acb64b68a44c1ad880b63e333 
I can't tell if this is caused by the same issue from comment 8 because we don't have complete log on windows nodes (bug 1008135)
It may be related but we indeed need logs. So adding bug 1008135 to the dep list.
Status: REOPENED → NEW
Depends on: 1008135
I found a log on a mac testrun:
>08:49:55 *** AUS:SVC Downloader:onStartRequest - original URI spec: http://download.mozilla.org/?product=firefox-30.0b4-complete&os=osx&lang=en-GB&force=1, final URI spec: http://download.mozilla.org/?product=firefox-30.0b4-complete&os=osx&lang=en-GB&force=1
>08:49:55 *** AUS:SVC Downloader:onStopRequest - original URI spec: http://download.mozilla.org/?product=firefox-30.0b4-complete&os=osx&lang=en-GB&force=1, final URI spec: http://download.mozilla.org/?product=firefox-30.0b4-complete&os=osx&lang=en-GB&force=1, status: 2147549183
>08:49:55 *** AUS:SVC Downloader:onStopRequest - status: 2147549183, current fail: 0, max fail: 10, retryTimeout: 2000
>08:49:55 *** AUS:SVC Downloader:onStopRequest - non-verification failure
>08:49:55 *** AUS:SVC getStatusTextFromCode - transfer error: Failed (unknown reason), default code: 2152398849
>08:49:55 *** AUS:SVC Downloader:onStopRequest - setting state to: download-failed
>08:49:55 *** AUS:SVC Downloader:onStopRequest - all update patch downloads failed
>08:49:55 ERROR | Test Failure | {
>08:49:55   "exception": {
>08:49:55     "message": "Update successfully downloaded. - 'errors' should not equal 'errors'", 
>08:49:55     "lineNumber": 595, 
>08:49:55     "name": "Error", 
>08:49:55     "fileName": "file:///Users/mozauto/jenkins/workspace/mozilla-esr24_addons/data/addons/jenkins/workspace/ondemand_update/data/mozmill-tests/firefox/lib/software-update.js"
>08:49:55   }
>08:49:55 }
>08:49:55 TEST-UNEXPECTED-FAIL | test2.js | testDirectUpdate_Download

I couldn't reproduce it in testruns with the affected build.
Those lines are not that helpful. Why do you still not attach full logs? Also have you checked the http.log, which gets created and contains some more information? Which Mac testrun was it?
Flags: needinfo?(cosmin.malutan)
Attached file jenkins_log.txt
This is the jenkins log, I haven't seen anything in http.log, all responses are successful or found (200 || 302) 
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/68647/artifact/http.log
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/68647
Flags: needinfo?(cosmin.malutan)
Ben or Nick, can you see something? I wonder what the status code 2147549183 actually is.
Flags: needinfo?(nthomas)
Flags: needinfo?(bhearsum)
I have no idea, I thought those errors came from https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/common/errors.h.
Flags: needinfo?(bhearsum)
I was pointed to http://james-ross.co.uk/mozilla/misc/nserror on IRC, which fingers 2147549183 as NS_ERROR_UNEXPECTED. There are a several of those in 
  http://dxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsIncrementalDownload.cpp
which is what is downloading files for the update service.

One use of NS_ERROR_UNEXPECTED is at line 576, which we reach if the HTTP response code is not 206, 416, or 200. And sure enough in the http log there's a 500 response from download.m.o:

2014-05-14 15:56:03.933415 UTC - 2044965648[1003342d0]: http request [
2014-05-14 15:56:03.933432 UTC - 2044965648[1003342d0]:   GET http://download.mozilla.org/?product=firefox-30.0b4-complete&os=osx&lang=en-GB&force=1 HTTP/1.1
2014-05-14 15:56:03.933439 UTC - 2044965648[1003342d0]:   Host: download.mozilla.org
2014-05-14 15:56:03.933445 UTC - 2044965648[1003342d0]:   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0
2014-05-14 15:56:03.933451 UTC - 2044965648[1003342d0]:   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-14 15:56:03.933457 UTC - 2044965648[1003342d0]:   Accept-Language: en-gb,en;q=0.5
2014-05-14 15:56:03.933465 UTC - 2044965648[1003342d0]:   Cookie: optimizelySegments=%7B%22245617832%22%3A%22none%22%2C%22245875585%22%3A%22direct%22%2C%22245677587%22%3A%22ff%22%2C%22246048108%22%3A%22false%22%7D; optimizelyEndUserId=oeu1400082954637r0.13259389026856827; optimizelyBuckets=%7B%7D; optimizelyPendingLogEvents=%5B%5D; __utma=150903082.247467372.1400082955.1400082955.1400082955.1; __utmb=150903082.4.9.1400082959094; __utmz=150903082.1400082955.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=150903082
2014-05-14 15:56:03.933481 UTC - 2044965648[1003342d0]:   Connection: keep-alive
2014-05-14 15:56:03.933488 UTC - 2044965648[1003342d0]: ]
2014-05-14 15:56:03.933571 UTC - 83562496[100334b20]: Resolving host [proxy.dmz.scl3.mozilla.com].
2014-05-14 15:56:03.933585 UTC - 83562496[100334b20]:   Using cached record for host [proxy.dmz.scl3.mozilla.com].
2014-05-14 15:56:03.933623 UTC - 83562496[100334b20]: Checking blacklist for host [proxy.dmz.scl3.mozilla.com], host record [118e8ca20].
2014-05-14 15:56:04.758400 UTC - 83562496[100334b20]: http response [
2014-05-14 15:56:04.758451 UTC - 83562496[100334b20]:   HTTP/1.0 500 Internal Server Error
2014-05-14 15:56:04.758463 UTC - 83562496[100334b20]:   Server: Apache
2014-05-14 15:56:04.758473 UTC - 83562496[100334b20]:   X-Backend-Server: bouncer1.webapp.phx1.mozilla.com
2014-05-14 15:56:04.758483 UTC - 83562496[100334b20]:   Content-Type: text/html; charset=UTF-8
2014-05-14 15:56:04.758492 UTC - 83562496[100334b20]:   Date: Wed, 14 May 2014 15:55:07 GMT
2014-05-14 15:56:04.758503 UTC - 83562496[100334b20]:   Set-Cookie: dmo=10.8.81.219.1400082907153856; path=/; expires=Thu, 14-May-15 15:55:07 GMT
2014-05-14 15:56:04.758513 UTC - 83562496[100334b20]:   X-Cache-Info: not cacheable; response code not cacheable
2014-05-14 15:56:04.758522 UTC - 83562496[100334b20]:   Content-Length: 3666
2014-05-14 15:56:04.758531 UTC - 83562496[100334b20]:   X-Cache: MISS from proxy2.dmz.scl3.mozilla.com
2014-05-14 15:56:04.758541 UTC - 83562496[100334b20]:   X-Cache-Lookup: MISS from proxy2.dmz.scl3.mozilla.com:3128
2014-05-14 15:56:04.758551 UTC - 83562496[100334b20]:   Via: 1.0 proxy2.dmz.scl3.mozilla.com (squid/3.1.10)
2014-05-14 15:56:04.758560 UTC - 83562496[100334b20]:   Connection: keep-alive
2014-05-14 15:56:04.758569 UTC - 83562496[100334b20]: ]

There are later requests for the same url that succeed, so this looks like an intermittent error (see also coming and going in this bug). I looked in new relic (IT's monitoring tool) and it didn't catch any errors at this time.
Flags: needinfo?(nthomas)
Usually we are seeing this when QA is running update tests for beta and release channels. During that duration it happens a couple of times. I was never able to see the problem when I retriggered some of the jobs a couple of hours later. Not sure if that only happens closely when we open up the update channel for a new release.
(In reply to Nick Thomas [:nthomas] from comment #17)
> 2014-05-14 15:56:03.933415 UTC - 2044965648[1003342d0]: http request [
[..]
> 2014-05-14 15:56:04.758400 UTC - 83562496[100334b20]: http response [
> 2014-05-14 15:56:04.758451 UTC - 83562496[100334b20]:   HTTP/1.0 500
> Internal Server Error

Hm, I'm not an expert in networking but shouldn't the response use the same HTTP channel as the request was using? That would mean 2044965648, but here we have 83562496. I thought that this was something different. Looks like I was wrong. So something new I have learned here.
It happened again today for

releasetest:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/69085/

betatest:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/69016/

We haven't seen any problem with updates on the beta channel.
(In reply to Henrik Skupin (:whimboo) from comment #19)
> Hm, I'm not an expert in networking but shouldn't the response use the same
> HTTP channel as the request was using? 

Me neither, I don't know. Possible the 500 could have come from one of the oscp or firstrun page requests. ondemand_update/68647 is gone so I can't go back to check. Are the latest two the same error ? The console output looks different from attachment 8423082 [details].
Hm, maybe not. I will continue to watch when it happens the next time.
No longer depends on: 1008135
Mozmill tests have been superseded by Marionette tests.
Status: NEW → RESOLVED
Closed: 11 years ago7 years ago
QA Contact: hskupin
Resolution: --- → INVALID
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: