Closed
Bug 1231938
Opened 10 years ago
Closed 6 years ago
Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849)
Categories: Testing :: Firefox UI Tests (defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: whimboo, Unassigned)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 obsolete file)
I can see a couple of transfer issues in our update tests when downloading the MAR files:
05:04:22 INFO - *** AUS:SVC Downloader:onProgress - progress: 1500000/4221828
05:04:22 INFO - *** AUS:SVC Downloader:onStopRequest - original URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-aurora/20151210004006/Firefox-mozilla-aurora-44.0a2-macosx64-ru-20151206004008-20151210004006.partial.mar?versionId=46vPz_RkI1vMdmWLhWSdh3E3VKotWk3I, final URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-aurora/20151210004006/Firefox-mozilla-aurora-44.0a2-macosx64-ru-20151206004008-20151210004006.partial.mar?versionId=46vPz_RkI1vMdmWLhWSdh3E3VKotWk3I, status: 2152398924
05:04:22 INFO - *** AUS:SVC Downloader:onStopRequest - status: 2152398924, current fail: 0, max fail: 10, retryTimeout: 2000
05:04:22 INFO - *** AUS:SVC Downloader:onStopRequest - non-verification failure
05:04:22 INFO - *** AUS:SVC getStatusTextFromCode - transfer error: Неудача (причина неизвестна) [Russian: "Failed (unknown reason)"], default code: 2152398849
This looks very similar to bug 1219934 which is about downloading the installer of Firefox.
[Reporter] Updated • 10 years ago
Keywords: intermittent-failure
[Reporter] Comment 1 • 10 years ago
I missed an important line in my previous comment:
05:03:06 INFO - *** AUS:SVC Downloader:onProgress - progress: 1496661/4221828
05:04:02 INFO - *** AUS:SVC Downloader:onProgress - progress: 1550693/4221828
05:04:22 INFO - *** AUS:SVC Downloader:onProgress - progress: 1500000/4221828
So the progress meter decreases, maybe due to the networking problems we still have in SCL3. As a result, Firefox cancels the download.
Jeremy, can you confirm that this is because of bug 1219934 and bug 1170832?
Flags: needinfo?(oremj)
[Reporter] Comment 2 • 10 years ago
Here is a result as pushed to Treeherder:
https://treeherder.mozilla.org/logviewer.html#?job_id=1603261&repo=mozilla-aurora
Comment 3 • 10 years ago
This bucket isn't owned by Cloud Services, and it also doesn't go through CloudFront, so it is likely unrelated to the two bugs listed.
Flags: needinfo?(oremj)
[Reporter] Updated • 9 years ago
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded → TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (status: 2152398924)
[Reporter] Comment 4 • 9 years ago
The status codes seem to be different. Got another one here which reports as 2152398849:
https://treeherder.mozilla.org/logviewer.html#?job_id=2887237&repo=mozilla-central
05:41:35 INFO - *** AUS:SVC Downloader:onProgress - progress: 1067745/7894430
05:41:55 INFO - *** AUS:SVC Downloader:onProgress - progress: 900000/7894430
05:41:55 INFO - *** AUS:SVC Downloader:onStopRequest - original URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20151213030241/Firefox-mozilla-central-45.0a1-macosx64-en-US-20151211030207-20151213030241.partial.mar?versionId=.4DCj2N7fEHuAxy5.34H.kK.74aa.rLw, final URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20151213030241/Firefox-mozilla-central-45.0a1-macosx64-en-US-20151211030207-20151213030241.partial.mar?versionId=.4DCj2N7fEHuAxy5.34H.kK.74aa.rLw, status: 2152398924
05:41:55 INFO - *** AUS:SVC Downloader:onStopRequest - status: 2152398924, current fail: 0, max fail: 10, retryTimeout: 2000
05:41:55 INFO - *** AUS:SVC Downloader:onStopRequest - non-verification failure
05:41:55 INFO - *** AUS:SVC getStatusTextFromCode - transfer error: Failed (unknown reason), default code: 2152398849
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (status: 2152398924) → TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed)
[Reporter] Comment 5 • 9 years ago
Nick or Rail, I assume this bucket is owned by RelEng then? Do we have any logs which could be checked for the above reported issue?
Flags: needinfo?(rail)
Flags: needinfo?(nthomas)
Comment 6 • 9 years ago
I do have access to that bucket and logging is enabled for it. I can try to dig into the logs and extract some information, but I doubt it would help, because we have no way to fix S3. I've seen a bunch of different errors coming from S3: socket, ssl, 5xx, 419, etc. So far the best solution is to retry with exponential backoff. Not sure if it would be useful for tests though.
Flags: needinfo?(rail)
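The retry-with-exponential-backoff approach mentioned above can be sketched as follows. This is a hypothetical helper for illustration only, not part of the actual test suite; the `fetch` callable and the parameter names are assumptions.

```python
import time

def download_with_backoff(fetch, max_retries=5, base_delay=2.0):
    """Retry a flaky download, doubling the wait after each failure.

    `fetch` is any callable that performs one download attempt and
    raises IOError on a transient (socket/ssl/5xx-style) failure.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except IOError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the failure
            # 2s, 4s, 8s, ... between attempts
            time.sleep(base_delay * (2 ** attempt))
```

As discussed in the next comment, whether this belongs in the update tests at all is debatable, since retrying can mask the very failures the tests are meant to surface.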
[Reporter] Comment 7 • 9 years ago
(In reply to Rail Aliiev [:rail] from comment #6)
> I do have access to that bucket and logging is enabled for it. I can try to
> dig into the logs and extract some information, but I doubt it would help,
> because we have no way to fix S3. I've seen a bunch of different errors
So why can't we get those failures investigated and reported to Amazon? It's clearly on their side to get those issues fixed. And I think they would need some logs which showcase the exact failure.
> coming from S3: socket, ssl, 5xx, 419, etc. So far the best solution is to
> retry with exponential backoff. Not sure if it would be useful for tests
> though.
I don't think this is something we should do for our update tests, because it would mean we no longer test what we actually want to verify. A failure like that could be a potential problem for lots of people when it comes to a release and Firefox fails to update to it.
Comment 8 • 9 years ago
FTR, to make any progress with tickets against Amazon you need the header that identifies the request, x-amz-request-id I think. Even then it's not a high priority to them.
It would be helpful to know what code 2152398849 translates to, and why this appears to go backwards:
05:41:35 INFO - *** AUS:SVC Downloader:onProgress - progress: 1067745/7894430
05:41:55 INFO - *** AUS:SVC Downloader:onProgress - progress: 900000/7894430
Flags: needinfo?(nthomas)
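The backwards jump is easy to spot mechanically in the AUS log output. A minimal sketch (hypothetical helper, not part of the test suite) that scans `Downloader:onProgress` lines and reports any decrease in the byte count:

```python
import re

# Matches the byte counts in lines like:
# "... AUS:SVC Downloader:onProgress - progress: 1067745/7894430"
PROGRESS_RE = re.compile(r"Downloader:onProgress - progress: (\d+)/(\d+)")

def find_regressions(log_lines):
    """Return (previous, current) pairs where progress went backwards."""
    regressions, last = [], None
    for line in log_lines:
        m = PROGRESS_RE.search(line)
        if not m:
            continue
        done = int(m.group(1))
        if last is not None and done < last:
            regressions.append((last, done))
        last = done
    return regressions
```

Run against the two log lines quoted above, this flags the 1067745 → 900000 drop.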
[Reporter] Comment 9 • 9 years ago
Based on the message printed to the log it should be one of those:
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#4130
} else if (status != Cr.NS_BINDING_ABORTED &&
status != Cr.NS_ERROR_ABORT &&
status != Cr.NS_ERROR_DOCUMENT_NOT_CACHED) {
LOG("Downloader:onStopRequest - non-verification failure");
I believe that Brian will know what 2152398924 means.
Flags: needinfo?(netzen)
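For reference, an nsresult value can be decoded mechanically. This is a sketch based on the standard nsresult layout from ErrorList.h (failure flag in bit 31, module in bits 16-30 offset by 0x45, error code in the low 16 bits); treat the layout details as an assumption to be checked against the current source.

```python
# Module numbers are stored offset by this base (see xpcom ErrorList.h).
NS_ERROR_MODULE_BASE_OFFSET = 0x45

def decode_nsresult(value):
    """Split an nsresult into (is_failure, module, code)."""
    is_failure = bool(value & 0x80000000)
    module = ((value >> 16) & 0x1FFF) - NS_ERROR_MODULE_BASE_OFFSET
    code = value & 0xFFFF
    return is_failure, module, code
```

For 2152398924 (0x804B004C) this yields module 6 (the network module) and code 76, which matches the neighborhood of the error list entries discussed in comment 11; the default code 2152398849 (0x804B0001) decodes to network code 1.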
Comment 10 • 9 years ago
I believe it means: http://james-ross.co.uk/mozilla/misc/nserror?0x804B004C
Flags: needinfo?(netzen)
[Reporter] Comment 11 • 9 years ago
Hm, it doesn't give an exact failure. When I check the list (http://james-ross.co.uk/mozilla/misc/nserror_list) for 0x804B004C I do not find any entry.
The entry before is NS_ERROR_REMOTE_XUL (0x804B004B) (75). Currently it can be found here:
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/ErrorList.h#233
The next error in that list is NS_ERROR_LOAD_SHOWED_ERRORPAGE (77), but I'm not sure that it maps, given the difference of 2.
[Reporter] Comment 12 • 9 years ago
The HTTP log can be found here:
https://firefox-ui-tests.s3.amazonaws.com/170c2332-621b-4ff2-babd-21ac77484a2c/http.log
First HTTP response:
2015-12-13 13:41:55.387702 UTC - 245157888[1003299f0]: http response [
2015-12-13 13:41:55.387732 UTC - 245157888[1003299f0]: HTTP/1.1 200 OK
2015-12-13 13:41:55.387739 UTC - 245157888[1003299f0]: x-amz-id-2: xskWWw8hDXMSY+R9qYWtVkxb1tVd7UJzTkijdJ77uNhaJEYFN1z/B/6MiwLiexQl1rNdI/qJRQk=
2015-12-13 13:41:55.387745 UTC - 245157888[1003299f0]: x-amz-request-id: 492E91CDF7C23A9B
2015-12-13 13:41:55.387751 UTC - 245157888[1003299f0]: Date: Sun, 13 Dec 2015 13:41:52 GMT
2015-12-13 13:41:55.387756 UTC - 245157888[1003299f0]: x-amz-version-id: x7Uq3NKdMuO_Kxg0fkczOFR_fyfLgZqG
2015-12-13 13:41:55.387762 UTC - 245157888[1003299f0]: Last-Modified: Sun, 13 Dec 2015 13:38:34 GMT
2015-12-13 13:41:55.387769 UTC - 245157888[1003299f0]: Etag: "18a3d8154c66fff54c71c832370851b3"
2015-12-13 13:41:55.387778 UTC - 245157888[1003299f0]: Accept-Ranges: bytes
2015-12-13 13:41:55.387787 UTC - 245157888[1003299f0]: Content-Type: application/octet-stream
2015-12-13 13:41:55.387795 UTC - 245157888[1003299f0]: Content-Length: 108719352
2015-12-13 13:41:55.387804 UTC - 245157888[1003299f0]: Server: AmazonS3
2015-12-13 13:41:55.387814 UTC - 245157888[1003299f0]: ]
Here is the interesting part. Have a look at the Content-Length value:
2015-12-13 13:42:04.959719 UTC - 196444160[1003299f0]: http response [
2015-12-13 13:42:04.959760 UTC - 196444160[1003299f0]: HTTP/1.1 206 Partial Content
2015-12-13 13:42:04.959773 UTC - 196444160[1003299f0]: x-amz-id-2: gbMaffi7IdRY7vJQzRUOalsz3M0XKIwTyLMLN062eSXsMI96UNuzwBPIyQLQBzD+lBC9EMDEym0=
2015-12-13 13:42:04.959783 UTC - 196444160[1003299f0]: x-amz-request-id: CCE0A7F52ED37B45
2015-12-13 13:42:04.959793 UTC - 196444160[1003299f0]: Date: Sun, 13 Dec 2015 13:42:02 GMT
2015-12-13 13:42:04.959802 UTC - 196444160[1003299f0]: x-amz-version-id: x7Uq3NKdMuO_Kxg0fkczOFR_fyfLgZqG
2015-12-13 13:42:04.959811 UTC - 196444160[1003299f0]: Last-Modified: Sun, 13 Dec 2015 13:38:34 GMT
2015-12-13 13:42:04.959820 UTC - 196444160[1003299f0]: Etag: "18a3d8154c66fff54c71c832370851b3"
2015-12-13 13:42:04.959829 UTC - 196444160[1003299f0]: Accept-Ranges: bytes
2015-12-13 13:42:04.959838 UTC - 196444160[1003299f0]: Content-Range: bytes 900000-108719351/108719352
2015-12-13 13:42:04.959847 UTC - 196444160[1003299f0]: Content-Type: application/octet-stream
2015-12-13 13:42:04.959856 UTC - 196444160[1003299f0]: Content-Length: 107819352
2015-12-13 13:42:04.959864 UTC - 196444160[1003299f0]: Server: AmazonS3
2015-12-13 13:42:04.959873 UTC - 196444160[1003299f0]: ]
As you can see, we have 108719352 vs. 107819352. Two digits have been transposed. Brian, could this be the reason for the failure?
Flags: needinfo?(netzen)
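As a side note, the two values also happen to be arithmetically consistent with the requested range: the 206 response resumes at byte 900000 of a 108719352-byte file, and 108719352 - 900000 = 107819352, which is exactly the Content-Length reported. A small sketch (hypothetical helper, for illustration) that cross-checks a partial response's headers:

```python
def check_partial_response(content_range, content_length):
    """Check that a 206 Content-Length matches its Content-Range span.

    Content-Range has the form "bytes <start>-<end>/<total>".
    """
    span, _total = content_range.split()[1].split("/")
    start, end = (int(x) for x in span.split("-"))
    # The range is inclusive, so the payload is end - start + 1 bytes.
    return content_length == end - start + 1
```

Applied to the headers above (`bytes 900000-108719351/108719352` with Content-Length 107819352), the response checks out, so the mismatch against the first response's 108719352 is expected for a resumed range request.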
Comment 13 • 9 years ago
Go to http://james-ross.co.uk/mozilla/misc/nserror
Enter the error and you'll get
2152398924 = NS_BINDING_FAILED
Flags: needinfo?(netzen)
[Assignee] Updated • 9 years ago
Product: Mozilla QA → Testing
[Reporter] Comment 15 • 9 years ago
So what does NS_BINDING_FAILED in this case mean? We hit the same issue 3 times today for the ru Nightly build on OS X:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=a1843cf99df1&filter-searchStr=fxup%20os&filter-tier=1&filter-tier=2&filter-tier=3
Comment 16 • 9 years ago
(In reply to Henrik Skupin (:whimboo) from comment #15)
> So what does NS_BINDING_FAILED in this case mean? We hit the same issue 3
> times today for the ru Nightly build on OS X:
>
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=a1843cf99df1&filter-searchStr=fxup%20os&filter-tier=1&filter-tier=2&filter-tier=3
That comes from the networking code and someone on the networking team should be able to answer that.
[Reporter] Comment 17 • 9 years ago
Patrick, would you be able to have a look at comment 8 and comment 12? We see some strange behavior in our update tests when downloading the MAR files from S3. In case you need HTTP logs, you can find those attached on Treeherder (see the given links in this bug). Thanks.
Flags: needinfo?(mcmanus)
[Reporter] Updated • 9 years ago
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) → TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849)
[Reporter] Updated • 9 years ago
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849) → TEST-UNEXPECTED-FAIL | test_direct_update.py, test_fallback_update.py | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849)
[Reporter] Comment 21 • 9 years ago
Rail, interestingly this test failure happened a lot on Saturday, January 30th, for nearly all de-2 funsize updates across platforms. It is not visible for de-3 on Sunday, so maybe something was wrong with the MAR file creation? Neither en-US nor the other 3 locales we test were affected.
Flags: needinfo?(rail)
Comment 23 • 9 years ago
Sorry it took so long to reply... TBH I'm having trouble debugging the issue here. It is definitely somehow related to the way S3 works (or doesn't! :) ). I know we see similar issues when we try to run our update verification tests for releases and hit a lot of errors just after we upload the bits. Sometimes it takes hours before we stop seeing errors, sometimes minutes...
Flags: needinfo?(rail)
[Reporter] Comment 24 • 9 years ago
So given that this seems to be an issue with S3, can we get a ticket filed with them? I think we have the necessary information for them to take a look at this. Having dozens of test failures for releases is something I'm a bit concerned about, because it would always require additional checks and, if necessary, retriggering of test jobs.
[Reporter] Updated • 9 years ago
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py, test_fallback_update.py | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849) → TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849)
[Reporter] Comment 32 • 9 years ago
I think I hit this problem today while trying to upgrade my Surface tablet from an old Nightly build. Attached is the log, which ends in that known error message. It looks like access to the file AUS suggests is denied.
Flags: needinfo?(mcmanus) → needinfo?(rail)
[Reporter] Comment 33 • 9 years ago
So the last comment was actually a different issue, which is now on file as bug 1273899 and is related to the current expiration settings of S3 for old MAR files.
Flags: needinfo?(rail)
[Reporter] Updated • 9 years ago
Attachment #8753846 - Attachment is obsolete: true
[Reporter] Updated • 9 years ago
Summary: TEST-UNEXPECTED-FAIL | test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849) → Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: Update has been downloaded (transfer error: Failed) (2152398849)
[Reporter] Updated • 6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE