Closed Bug 1300048 Opened 9 years ago Closed 9 years ago

Intermittent No extraction method found for: /home/worker/workspace/build/target.common.tests.zip

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: whimboo, Unassigned)

References

Details

Seen this today:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=4c0092cbaf251d6253a07f103cabf8b2b1fbb9fc&filter-searchStr=fxfn&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

19:42:12 INFO - Downloading packages: [u'target.common.tests.zip'] for test suite category: common
19:42:12 INFO - proxxy config: {'regions': ['.use1.', '.usw2.', '.scl3'], 'instances': ['proxxy1.srv.releng.use1.mozilla.com', 'proxxy1.srv.releng.usw2.mozilla.com', 'proxxy1.srv.releng.scl3.mozilla.com'], 'urls': [('http://ftp.mozilla.org', 'ftp.mozilla.org'), ('https://ftp.mozilla.org', 'ftp.mozilla.org'), ('https://ftp-ssl.mozilla.org', 'ftp.mozilla.org'), ('http://pypi.pvt.build.mozilla.org', 'pypi.pvt.build.mozilla.org'), ('http://pypi.pub.build.mozilla.org', 'pypi.pub.build.mozilla.org')]}
19:42:12 INFO - trying https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip
19:42:12 INFO - Downloading https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip to /home/worker/workspace/build/target.common.tests.zip
19:42:12 INFO - retry: Calling _download_file with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip', 'file_name': u'/home/worker/workspace/build/target.common.tests.zip'}, attempt #1
19:42:12 INFO - Downloaded 1792586 bytes.
19:42:12 FATAL - No extraction method found for: /home/worker/workspace/build/target.common.tests.zip
19:42:12 FATAL - Running post_fatal callback...
19:42:12 FATAL - Exiting 2

We cannot extract the zip file because only 1792586 bytes were downloaded. I feel this is related to bug 1266624, whereby taskcluster reports the failure right away. But we fail later when trying to extract.
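For illustration only: a truncated zip is generally not recognizable as a zip at all, because the archive's directory record sits at the end of the file. The sketch below is not mozharness code, and it is only an assumption that the "No extraction method found" message comes from a comparable format check. The helper names build_zip and looks_like_zip are made up for this example.

```python
import io
import zipfile


def build_zip(payload: bytes) -> bytes:
    """Return the bytes of an in-memory zip archive holding one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("test.txt", payload)
    return buf.getvalue()


def looks_like_zip(data: bytes) -> bool:
    """True if the bytes still contain a zip end-of-central-directory record."""
    return zipfile.is_zipfile(io.BytesIO(data))


full = build_zip(b"x" * 10000)
# Simulate a download that stopped half way without raising an error.
truncated = full[: len(full) // 2]

print(looks_like_zip(full))
print(looks_like_zip(truncated))
```

The complete archive is detected as a zip, while the truncated copy is not, even though its first bytes are a valid zip local file header.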
Flags: needinfo?(aki)
It looks like download_and_extract is getting a rewrite in bug 1272083.
Flags: needinfo?(aki)
(In reply to Aki Sasaki [:aki] from comment #1)
> It looks like download_and_extract is getting a rewrite in bug 1272083.

We will see if that helped now that it got landed over the weekend.
See Also: → 1300345
We now have a different signature.
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #4)
> We now have a different signature.

Only for those cases when we download the binary or test archives. The old download method is still around and could fail similarly for test suite specific downloads. Given that this bug was filed first, we may want to mark the others as depending on it.

Btw...

> 19:42:12 INFO - Downloaded 1792586 bytes.

This clearly indicates the download finished without an error. But 1.7MB is nothing for the target.common.tests.zip file. So the server we downloaded it from has dropped a lot of chunks.
Flags: needinfo?(garndt)
Background for garndt: we're seeing a spike in failures when downloading files from TaskCluster. garndt: is there someone who could help us look into this? The original issue started in this bug; however, after we landed bug 1272083 the new signature is being reported in bug 1300345. A spike is also noticeable in https://bugzilla.mozilla.org/show_bug.cgi?id=1266624#c29
Looking at the original bug description, it appears that mozharness tries to extract a file that came from an incomplete download. We see this elsewhere within taskcluster as well, and there we compare the expected file size to what was downloaded to ensure we received all the data before trying to extract. Are the expected and actual file sizes being compared first here?
Flags: needinfo?(garndt)
Those sizes aren't getting compared yet as far as I know. But in case of early aborts an exception is usually thrown. The question here is why that does not happen. This behavior reminds me of bug 1219934, where we had the same for Firefox binaries. Fixing some parts in AWS or routing finally stopped this problem. I really wonder if we are seeing the same here. And you are right, the _download method should check that the correct size has been downloaded.
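A minimal sketch of such a size check, assuming the expected size is known before extraction (for example from the server's response). This is not the mozharness implementation; verify_download_size and IncompleteDownloadError are hypothetical names.

```python
import os


class IncompleteDownloadError(Exception):
    """Raised when a downloaded file is not the expected size (hypothetical)."""


def verify_download_size(file_name, expected_size):
    """Compare the on-disk size of file_name with expected_size.

    Returns the actual size on success and raises IncompleteDownloadError
    on a mismatch, so the caller can retry instead of trying to extract.
    """
    actual_size = os.path.getsize(file_name)
    if actual_size != expected_size:
        raise IncompleteDownloadError(
            "%s: expected %d bytes, got %d bytes"
            % (file_name, expected_size, actual_size)
        )
    return actual_size
```

Raising an exception rather than returning False would let an existing retry wrapper treat a short download like any other failed attempt.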
I will look into fixing this.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
We also found that there were a couple of scenarios where an exception in our workers wasn't caught correctly and once that was fixed, we noticed timeout errors being thrown. I'm not sure if this applies to the mozharness logic or not, just letting you know of something we encountered.
Depends on: 1300812
I will investigate this on bug 1300812.
Assignee: armenzg → nobody
Status: ASSIGNED → NEW
Do the responses to these requests not contain a "Content-Length" header? Without that header, a client can't distinguish a partial from a full download (the transaction is terminated by EOF). Comment 8 suggests that this was the issue before? Whimboo, do you know more than was written in that bug?
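To make the point about the header concrete, here is a small sketch of the decision a client can make. The function name download_is_complete is hypothetical, and the headers are assumed to be available as a plain mapping of names to string values. Note that when the response is compressed, Content-Length counts the encoded bytes on the wire, not the decoded size.

```python
def download_is_complete(headers, bytes_received):
    """Check a transfer against the Content-Length header.

    Returns True or False when the header is present, and None when it is
    missing, in which case a partial download cannot be distinguished from
    a full one.
    """
    declared = None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            declared = value
            break
    if declared is None:
        return None
    return int(declared) == bytes_received
```

A caller would treat False as a reason to retry and None as "cannot verify", falling back to some other integrity check.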
Sadly not. I never got a full description of the issue from IT. Maybe they didn't even know themselves, given that this is all EC2 magic and Amazon was not that responsive. Sorry.
philor: are we getting any more instances of this? Or should we close since we got a new signature?
Flags: needinfo?(philringnalda)
That's a new signature in-tree, so I'd assume we would still get the old one on release branches. But I don't watch release branches and this doesn't have the intermittent-failure keyword, so I'd assume these are just getting starred as "infra" there and either retriggered or ignored.
Flags: needinfo?(philringnalda)
Thanks philor! I believe we're fine.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME