Closed Bug 1058073 Opened 10 years ago Closed 10 years ago

Frequent timeouts downloading from pvtbuilds and usw2 proxxy

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Assigned: coop)

Attachments

(1 file)

Trunk trees closed.

https://tbpl.mozilla.org/php/getParsedLog.php?id=46691626&tree=Mozilla-Inbound
07:58:43 ERROR - Can't download from http://ftp.mozilla.org.proxxy.srv.releng.usw2.mozilla.com/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1408975067/firefox-34.0a1.en-US.linux-x86_64-asan.tests.zip to /builds/slave/test/build/firefox-34.0a1.en-US.linux-x86_64-asan.tests.zip!

https://tbpl.mozilla.org/php/getParsedLog.php?id=46691660&tree=Mozilla-Inbound
07:49:31 INFO - Downloading http://tooltool.pvt.build.mozilla.org/build/sha512/7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a9196042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b to /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz
command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_emulator_unittest.py', '--cfg', 'android/androidarm.py', '--test-suite', 'mochitest-1', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
The tooltool download part seems fine now, but I'm unsure whether I should be able to get to ftp.mozilla.org.proxxy.srv.releng.usw2.mozilla.com manually or not.
We're starting to run low on jobs to fail given the length of the tree closure, so I'm reopening for now so we can see how things go with more volume.
Looking at https://tbpl.mozilla.org/php/getParsedLog.php?id=46706670&tree=Mozilla-Inbound, the sha512sum for the AVD file does *not* match what we see in tooltool:

[cltbld@tst-linux64-spot-745.test.releng.usw2.mozilla.com ~]$ sha512sum /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz
3721d31b60b501b77805eaa16eebecd54d8f860d98c6346a82f0f090afe0da4aa2b8db6c6c20f6f7be40d2c2c1208c773f24fe1eb033d89cdaeca85f252441e8  /builds/slave/talos-slave/cached/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz

Should be:

12:03:54 INFO - 'tooltool_cacheable_artifacts': {'avd_tar_ball': ('AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz',
12:03:54 INFO - '7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a9196042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b')},
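For reference, the same comparison can be done from Python. The sketch below is a minimal illustration (Python 3, standard library only), not mozharness code; `sha512_of_file` and `matches_expected` are hypothetical helper names. It hashes the file in chunks, which yields the same lowercase hex digest that `sha512sum` prints.

```python
import hashlib


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the lowercase hex SHA-512 of a file, as `sha512sum` prints it."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large artifacts such as the AVD
        # tarball are never held in memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha512):
    """True if the file's SHA-512 equals the expected hex digest."""
    return sha512_of_file(path) == expected_sha512.strip().lower()
```

Run against the cached AVD tarball above with the digest from the config, `matches_expected` would return False, since the file's digest starts with 3721d31b rather than 7140e026.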
https://hg.mozilla.org/build/mozharness/file/4148c2a93b1b/scripts/android_emulator_unittest.py#l472
We assume a previously downloaded file is correct simply because it is present, without checking that its sha512sum matches what we expect.
It looks like in some cases files are not fetched from the tooltool servers with the tooltool fetch command (e.g. https://hg.mozilla.org/build/mozharness/file/4148c2a93b1b/scripts/android_emulator_unittest.py#l476). In addition, a custom cache mechanism has been implemented for the emulator tests, and it does not verify the SHA-512 checksum when a cached artifact is reused. This is why tooltool_cache_path, tooltool_cacheable_artifacts and tooltool_url have been added to the task configuration. I would consider using the standard tooltool fetch command to retrieve files, together with tooltool's native caching mechanism, since both perform sha validation and so detect corruption early. See also https://wiki.mozilla.org/ReleaseEngineering/Tooltool.
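A minimal sketch of the verify-on-fetch, verify-on-reuse pattern described above follows. It is not tooltool's implementation or API: `fetch_verified` and the `fetch(dest_path)` callable are hypothetical stand-ins (Python 3, standard library only). The download goes to a temporary file and is only renamed into the cache after its SHA-512 matches, so a truncated or corrupt download never looks like a valid cache entry.

```python
import hashlib
import os
import tempfile


def _sha512(path):
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(cache_dir, file_name, expected_sha512, fetch):
    """Return the path of a verified copy of file_name in cache_dir.

    `fetch(dest_path)` is a hypothetical callable that writes the artifact
    to dest_path (e.g. an HTTP download from a tooltool server).
    """
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, file_name)

    # A cache hit only counts if the contents still hash correctly.
    if os.path.exists(cached):
        if _sha512(cached) == expected_sha512:
            return cached
        os.remove(cached)  # stale or corrupt: discard and refetch

    # Download into a temp file in the same directory so the final rename
    # is atomic and a partial download never appears under the real name.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        fetch(tmp_path)
        actual = _sha512(tmp_path)
        if actual != expected_sha512:
            raise ValueError("sha512 mismatch for %s: expected %s, got %s"
                             % (file_name, expected_sha512, actual))
        os.replace(tmp_path, cached)
    finally:
        # Clean up the temp file if the fetch or the check failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return cached
```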
Simone's not wrong -- we *should* be using tooltool here because it provides built-in verification -- but that's a more invasive patch. This patch should get us unblocked, but it's completely untested.
Attachment #8478552 - Flags: review?(catlee)
Attachment #8478552 - Flags: review?(catlee) → review+
Comment on attachment 8478552
Check hash of existing, cached file.

Review of attachment 8478552:
-----------------------------------------------------------------

https://hg.mozilla.org/build/mozharness/rev/916fd25c7692

::: scripts/android_emulator_unittest.py
@@ +469,5 @@
>      for artifact_name in artifacts.keys():
>          file_name = artifacts[artifact_name][0]
>          file_path = os.path.join(c["tooltool_cache_path"], file_name)
> +        if not os.path.exists(file_path) or self.file_sha512sum(file_path) != file_shasum:
> +            os.remove(filepath)

Var name should be file_path. Should also check if the file exists before removing it, because we could have entered this conditional either way.
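Folding in both review comments, a standalone version of that cache check might look like the sketch below. It approximates the logic only and is not the code that landed in 916fd25c7692; `prune_stale_artifacts` and `sha512_of_file` are hypothetical names, and `artifacts` is assumed to follow the `tooltool_cacheable_artifacts` shape from the log (artifact name mapped to a `(file_name, sha512)` tuple).

```python
import hashlib
import os


def sha512_of_file(path):
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prune_stale_artifacts(cache_path, artifacts):
    """Return the names of artifacts that still need to be downloaded.

    `artifacts` maps artifact name -> (file_name, expected_sha512),
    mirroring the tooltool_cacheable_artifacts config seen in the log.
    """
    to_download = []
    for artifact_name, (file_name, file_shasum) in artifacts.items():
        file_path = os.path.join(cache_path, file_name)
        if not os.path.exists(file_path):
            # Nothing cached: nothing to remove, just fetch it.
            to_download.append(artifact_name)
        elif sha512_of_file(file_path) != file_shasum:
            # Cached copy is corrupt or stale: remove it, then refetch.
            os.remove(file_path)
            to_download.append(artifact_name)
    return to_download
```

Splitting the original `or` condition into two branches means `os.remove` only ever runs on a file that exists, which is the reviewer's second point.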
Attachment #8478552 - Flags: checked-in+
Assignee: nobody → coop
The tooltool part has been mitigated, but the better fix will happen in bug 1058286.

According to Tomcat, there was one proxxy-related failure again last night: https://tbpl.mozilla.org/php/getParsedLog.php?id=46749860&tree=Mozilla-Inbound

Let's re-open if we see another rash of failures today.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Rolled out to prod with reconfig on 2014-08-26 08:21 PT
(In reply to Chris Cooper [:coop] from comment #10)
> The tooltool part has been mitigated, but the better fix will happen in bug
> 1058286.
>
> According to Tomcat, there was one proxxy-related failure again last night:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=46749860&tree=Mozilla-Inbound

We're always going to have intermittent failures downloading from proxxy. In this case the machine fell back to downloading from the original URL and succeeded. I'd call this successful error handling, not a proxxy failure :) The test ended up failing because of an overall timeout.
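The fallback behaviour described above (try the proxxy mirror first, then the original URL) can be sketched as follows. The host-rewriting rule is an assumption inferred only from the URL in the first log excerpt (`ftp.mozilla.org` becoming `ftp.mozilla.org.proxxy.srv.releng.usw2.mozilla.com`), and `proxxy_url`, `download_with_fallback` and the `download(url)` callable are hypothetical; this is not the mozharness implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed suffix, inferred from the proxxy URL in the log above.
PROXXY_SUFFIX = ".proxxy.srv.releng.usw2.mozilla.com"


def proxxy_url(url, suffix=PROXXY_SUFFIX):
    """Rewrite http://host/path to http://host<suffix>/path.

    Assumes the netloc is a bare hostname (no port or credentials).
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc + suffix))


def download_with_fallback(url, download):
    """Try the proxxy mirror first, then fall back to the original URL.

    `download(url)` is a hypothetical callable that raises OSError on
    failure. Returns the URL that succeeded; re-raises the last error
    if every candidate fails.
    """
    last_error = None
    for candidate in (proxxy_url(url), url):
        try:
            download(candidate)
            return candidate
        except OSError as err:
            last_error = err  # remember the failure, try the next source
    raise last_error
```

In Python 3, `urllib.error.URLError` and socket timeouts are both subclasses of `OSError`, so catching `OSError` covers the common download failures.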
This could be related to the usw2 issues we are having. Marking this bug as depending on that one.
Depends on: 1060407
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard