Closed
Bug 1058073
Opened 10 years ago
Closed 10 years ago
Frequent timeouts downloading from pvtbuilds and usw2 proxxy
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: RyanVM, Assigned: coop)
References
Details
Attachments
(1 file)
2.35 KB, patch | catlee: review+ | coop: checked-in+
Trunk trees closed.
https://tbpl.mozilla.org/php/getParsedLog.php?id=46691626&tree=Mozilla-Inbound
07:58:43 ERROR - Can't download from http://ftp.mozilla.org.proxxy.srv.releng.usw2.mozilla.com/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1408975067/firefox-34.0a1.en-US.linux-x86_64-asan.tests.zip to /builds/slave/test/build/firefox-34.0a1.en-US.linux-x86_64-asan.tests.zip!
https://tbpl.mozilla.org/php/getParsedLog.php?id=46691660&tree=Mozilla-Inbound
07:49:31 INFO - Downloading http://tooltool.pvt.build.mozilla.org/build/sha512/7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a9196042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b to /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz
command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_emulator_unittest.py', '--cfg', 'android/androidarm.py', '--test-suite', 'mochitest-1', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
Assignee | ||
Comment 1•10 years ago
|
||
The tooltool download part seems fine now, but I'm unsure whether I should be able to get to ftp.mozilla.org.proxxy.srv.releng.usw2.mozilla.com manually or not.
Reporter | ||
Comment 2•10 years ago
|
||
Android download corruption:
https://tbpl.mozilla.org/php/getParsedLog.php?id=46697858&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=46698014&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=46698733&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=46697919&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=46697898&tree=Mozilla-Central
Download timeouts:
https://tbpl.mozilla.org/php/getParsedLog.php?id=46700571&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=46700948&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=46701332&tree=B2g-Inbound
Reporter | ||
Comment 3•10 years ago
|
||
Reporter | ||
Comment 4•10 years ago
|
||
We're starting to run low on jobs to fail given the length of the tree closure, so I'm reopening the trees for now so we can see how things go with more volume.
Assignee | ||
Comment 5•10 years ago
|
||
Looking at https://tbpl.mozilla.org/php/getParsedLog.php?id=46706670&tree=Mozilla-Inbound, the sha512sum for the AVD file does *not* match what we see in tooltool:
[cltbld@tst-linux64-spot-745.test.releng.usw2.mozilla.com ~]$ sha512sum /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz
3721d31b60b501b77805eaa16eebecd54d8f860d98c6346a82f0f090afe0da4aa2b8db6c6c20f6f7be40d2c2c1208c773f24fe1eb033d89cdaeca85f252441e8 /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz
Should be:
12:03:54 INFO - 'tooltool_cacheable_artifacts': {'avd_tar_ball': ('AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz',
12:03:54 INFO - '7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a9196042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b')},
Assignee | ||
Comment 6•10 years ago
|
||
https://hg.mozilla.org/build/mozharness/file/4148c2a93b1b/scripts/android_emulator_unittest.py#l472
We're assuming a previously downloaded file is correct if it's simply present without checking that the sha512sum matches what we expect.
Comment 7•10 years ago
|
||
It looks like in some cases files are fetched from the tooltool servers without using the tooltool fetch command (e.g. https://hg.mozilla.org/build/mozharness/file/4148c2a93b1b/scripts/android_emulator_unittest.py#l476). In addition, a custom cache mechanism has been implemented for emulator tests, and it does not verify the SHA sum when a cached artifact is used either.
This is why tooltool_cache_path, tooltool_cacheable_artifacts and tooltool_url have been added to the task configuration.
I would consider using the standard tooltool fetch command to retrieve files, together with the native tooltool caching mechanism, since both perform SHA validation and so allow early detection of corruption issues.
See also https://wiki.mozilla.org/ReleaseEngineering/Tooltool.
Assignee | ||
Comment 8•10 years ago
|
||
Simone's not wrong -- we *should* be using tooltool here because it provides built-in verification -- but that's a more invasive patch.
This patch should get us unblocked, but it's completely untested.
Attachment #8478552 - Flags: review?(catlee)
Updated•10 years ago
|
Attachment #8478552 - Flags: review?(catlee) → review+
Assignee | ||
Comment 9•10 years ago
|
||
Comment on attachment 8478552 [details] [diff] [review]
Check hash of existing, cached file.
Review of attachment 8478552 [details] [diff] [review]:
-----------------------------------------------------------------
https://hg.mozilla.org/build/mozharness/rev/916fd25c7692
::: scripts/android_emulator_unittest.py
@@ +469,5 @@
> for artifact_name in artifacts.keys():
> file_name = artifacts[artifact_name][0]
> file_path = os.path.join(c["tooltool_cache_path"], file_name)
> + if not os.path.exists(file_path) or self.file_sha512sum(file_path) != file_shasum:
> + os.remove(filepath)
Var name should be file_path.
Should also check if the file exists before removing it, because we could have entered this conditional either way.
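The two review points above can be sketched together as a standalone function. This is only an illustration of the intended logic, not the code that landed: `sha512_of`, `ensure_cached_artifact`, and the `download` callable are hypothetical names, and the real script uses its own `self.file_sha512sum` helper and download machinery.

```python
import hashlib
import os


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_cached_artifact(cache_dir, file_name, expected_sha512, download):
    """Return the path of a verified artifact, re-downloading if needed.

    `download` is a hypothetical callable that takes a destination path
    and writes the artifact there.
    """
    file_path = os.path.join(cache_dir, file_name)
    if os.path.exists(file_path):
        if sha512_of(file_path) == expected_sha512:
            return file_path  # cached copy is present and verified
        # Only remove when the file exists; we can also reach the
        # download step because the file was never there at all.
        os.remove(file_path)
    download(file_path)
    if sha512_of(file_path) != expected_sha512:
        raise ValueError("sha512 mismatch after download: %s" % file_path)
    return file_path
```

Checking the hash again after the download means a corrupt transfer fails immediately rather than being cached for the next job.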
Attachment #8478552 - Flags: checked-in+
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → coop
Assignee | ||
Comment 10•10 years ago
|
||
The tooltool part has been mitigated, but the better fix will happen in bug 1058286.
According to Tomcat, there was one proxxy-related failure again last night: https://tbpl.mozilla.org/php/getParsedLog.php?id=46749860&tree=Mozilla-Inbound
Let's re-open if we see another rash of failures today.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 11•10 years ago
|
||
Rolled out to prod with reconfig on 2014-08-26 08:21 PT
Comment 12•10 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #10)
> The tooltool part has been mitigated, but the better fix will happen in bug 1058286.
>
> According to Tomcat, there was one proxxy-related failure again last night:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=46749860&tree=Mozilla-Inbound
We're always going to have intermittent failures downloading from proxxy. In this case the machine fell back to downloading from the original URL and succeeded. I'd call this successful error handling, not a proxxy failure :)
The test ended up failing because of an overall timeout.
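The fallback behaviour described in this comment (try the proxxy URL first, then the original URL) could look roughly like the sketch below. It is not the mozharness implementation: `download_with_fallback` and the injected `fetch` callable are hypothetical names used only for illustration.

```python
def download_with_fallback(urls, fetch):
    """Try each URL in order and return (url, data) for the first success.

    `fetch` is a hypothetical callable taking a URL and returning bytes;
    it is expected to raise OSError (or a subclass) on failure.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Record the failure and move on to the next candidate URL.
            errors.append((url, exc))
    raise OSError("all download attempts failed: %r" % (errors,))
```

A `fetch` built on `urllib.request.urlopen(url, timeout=...)` would fit here, since `urllib.error.URLError` and socket timeouts are both `OSError` subclasses. A failed proxxy attempt that is followed by a successful direct download then counts as handled, which matches the reading of the log given above.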
Comment 13•10 years ago
|
||
Comment 14•10 years ago
|
||
Comment 15•10 years ago
|
||
Comment 16•10 years ago
|
||
This could be related to the usw2 issues we are having, so I'm adding that bug as a dependency.
Depends on: 1060407
Comment 17•10 years ago
|
||
Updated•6 years ago
|
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard