Bug 1300048
Opened 9 years ago
Closed 9 years ago
Intermittent No extraction method found for: /home/worker/workspace/build/target.common.tests.zip
Categories: Release Engineering :: Applications: MozharnessCore, defect
Tracking: Not tracked
Status: RESOLVED WORKSFORME
People: Reporter: whimboo, Unassigned
Seen this today:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=4c0092cbaf251d6253a07f103cabf8b2b1fbb9fc&filter-searchStr=fxfn&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable
19:42:12 INFO - Downloading packages: [u'target.common.tests.zip'] for test suite category: common
19:42:12 INFO - proxxy config: {'regions': ['.use1.', '.usw2.', '.scl3'], 'instances': ['proxxy1.srv.releng.use1.mozilla.com', 'proxxy1.srv.releng.usw2.mozilla.com', 'proxxy1.srv.releng.scl3.mozilla.com'], 'urls': [('http://ftp.mozilla.org', 'ftp.mozilla.org'), ('https://ftp.mozilla.org', 'ftp.mozilla.org'), ('https://ftp-ssl.mozilla.org', 'ftp.mozilla.org'), ('http://pypi.pvt.build.mozilla.org', 'pypi.pvt.build.mozilla.org'), ('http://pypi.pub.build.mozilla.org', 'pypi.pub.build.mozilla.org')]}
19:42:12 INFO - trying https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip
19:42:12 INFO - Downloading https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip to /home/worker/workspace/build/target.common.tests.zip
19:42:12 INFO - retry: Calling _download_file with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/LuO5hf2XTMm__FaRfZKKgA/artifacts/public/build/target.common.tests.zip', 'file_name': u'/home/worker/workspace/build/target.common.tests.zip'}, attempt #1
19:42:12 INFO - Downloaded 1792586 bytes.
19:42:12 FATAL - No extraction method found for: /home/worker/workspace/build/target.common.tests.zip
19:42:12 FATAL - Running post_fatal callback...
19:42:12 FATAL - Exiting 2
We cannot extract the zip file because only 1792586 bytes were downloaded. I suspect this is related to bug 1266624, where TaskCluster reports the failure right away, whereas here we only fail later, when trying to extract.
Flags: needinfo?(aki)
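The failure mode in the description can be reproduced in miniature. This sketch assumes, without having checked the mozharness source, that the extraction step picks its method by inspecting the file, for example with Python's `zipfile.is_zipfile`. A zip archive keeps its central directory at the end of the file, so a truncated download still starts like a zip but is no longer recognized as one. `make_test_zip` and `pick_extraction_method` are hypothetical helpers for illustration only.

```python
import io
import zipfile


def make_test_zip():
    """Build a small stored (uncompressed) zip archive in memory."""
    buf = io.BytesIO()
    # A fixed timestamp keeps the archive bytes deterministic.
    info = zipfile.ZipInfo("tests/readme.txt", date_time=(2016, 9, 2, 19, 42, 12))
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(info, "hello " * 1000)
    return buf.getvalue()


def pick_extraction_method(fileobj):
    """Hypothetical dispatcher: return an extractor name, or None if unknown."""
    # is_zipfile looks for the end-of-central-directory record, which is
    # the last thing written to a zip file and is lost when it is truncated.
    if zipfile.is_zipfile(fileobj):
        return "zip"
    return None


archive = make_test_zip()
truncated = archive[: len(archive) // 2]
print(pick_extraction_method(io.BytesIO(archive)))    # zip
print(pick_extraction_method(io.BytesIO(truncated)))  # None
print(truncated[:4] == b"PK\x03\x04")                 # True: the local header survives
```

Under that assumption, a dispatcher like this falls through to its "no method found" branch, which matches the FATAL line in the log.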
Comment 1•9 years ago
It looks like download_and_extract is getting a rewrite in bug 1272083.
Flags: needinfo?(aki)
Comment hidden (Intermittent Failures Robot)
Comment 3•9 years ago (Reporter)
(In reply to Aki Sasaki [:aki] from comment #1)
> It looks like download_and_extract is getting a rewrite in bug 1272083.
We'll see whether that helped, now that it landed over the weekend.
Comment 4•9 years ago
We now have a different signature.
Comment 5•9 years ago (Reporter)
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #4)
> We now have a different signature.
Only in cases where we download the binary or the test archives. The old download method is still in use for test-suite-specific downloads and could fail in the same way there.
Since this bug was filed first, we may want to mark the others as depending on it.
Btw...
> 19:42:12 INFO - Downloaded 1792586 bytes.
This indicates that the download completed as far as the client could tell. But 1.7 MB is far too small for target.common.tests.zip, so the server we downloaded it from must have dropped a large part of the data.
Flags: needinfo?(garndt)
Comment 6•9 years ago
Background for garndt.
We're seeing a spike in failures when downloading files from TaskCluster.
garndt: is there someone who could help us look into this?
The original issue was reported in this bug; however, since we landed bug 1272083, the new signature is being reported in bug 1300345.
A spike is also noticeable in https://bugzilla.mozilla.org/show_bug.cgi?id=1266624#c29
Comment 7•9 years ago
Looking at the original bug description, it appears that mozharness tries to extract a file from an incomplete download. We see this elsewhere within TaskCluster as well, and there we compare the expected file size with the downloaded size to make sure we received all the data before trying to extract. Are the expected and actual file sizes being compared first here?
Flags: needinfo?(garndt)
Comment 8•9 years ago (Reporter)
As far as I know, those sizes aren't compared yet. But when a download aborts early, an exception is usually thrown, so the question is why that doesn't happen here. This behavior reminds me of bug 1219934, where we saw the same thing with Firefox binaries. That problem finally stopped after some fixes in AWS or in routing.
I really wonder whether we're seeing the same thing here.
And you are right, the _download method should check that the correct size has been downloaded.
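A size check of the kind discussed in comments 7 and 8 could look like the sketch below. It is not mozharness's `_download_file`; `save_response` and `IncompleteDownloadError` are hypothetical names, and the check only works if the server sends a `Content-Length` header. It also hints at why an early abort need not raise: in Python 3's `http.client`, a sized `read(n)` that hits EOF before the announced length returns a short result instead of raising, so copying in fixed-size pieces simply stops.

```python
import os
import shutil


class IncompleteDownloadError(Exception):
    """Raised when the file on disk does not match the announced size."""


def save_response(response, file_name, chunk_size=64 * 1024):
    """Stream an HTTP response body to file_name and verify its size.

    `response` can be anything with `.headers.get()` and `.read(n)`, such as
    the object returned by urllib.request.urlopen(). Returns the byte count.
    """
    expected = response.headers.get("Content-Length")
    with open(file_name, "wb") as out:
        # copyfileobj calls response.read(chunk_size) until it gets b"".
        # A connection dropped mid-body can end this loop without an error.
        shutil.copyfileobj(response, out, chunk_size)
    actual = os.path.getsize(file_name)
    if expected is None:
        # No Content-Length: nothing to compare against.
        return actual
    if actual != int(expected):
        raise IncompleteDownloadError(
            "%s: expected %s bytes, downloaded %d" % (file_name, expected, actual))
    return actual
```

A caller would then retry or fail on `IncompleteDownloadError` before ever attempting extraction, instead of failing later with a misleading "No extraction method found".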
Comment 9•9 years ago
I will look into fixing this.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Comment 10•9 years ago
We also found a couple of scenarios where an exception in our workers wasn't caught correctly; once that was fixed, we started seeing timeout errors being thrown. I'm not sure whether this applies to the mozharness logic; just letting you know about something we encountered.
Comment 11•9 years ago
I will investigate this on bug 1300812.
Assignee: armenzg → nobody
Status: ASSIGNED → NEW
Comment 12•9 years ago
Do the responses to these requests not contain a "Content-Length" header? Without that header, a client can't distinguish a partial download from a full one (the transaction is terminated by EOF). Was that the issue before, as comment 8 suggests? Whimboo, do you know more than what was written in that bug?
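The framing point can be checked with Python's standard library alone. The sketch below is a hypothetical local demo, not TaskCluster or mozharness code: a toy HTTP/1.0 server sends 10 body bytes and then closes the connection, once while announcing `Content-Length: 100` and once without the header. With the header, `http.client` raises `IncompleteRead` on an unbounded `read()`; without it, the truncated body comes back as if it were complete.

```python
import http.client
import http.server
import threading


class TruncatingHandler(http.server.BaseHTTPRequestHandler):
    """Sends only 10 body bytes; /with-length also claims Content-Length: 100."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/with-length":
            self.send_header("Content-Length", "100")
        self.end_headers()
        self.wfile.write(b"x" * 10)
        # The default protocol_version is HTTP/1.0, so the server closes
        # the connection after this handler returns.

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port, path):
    """GET path and read the whole body, reporting a detected short read."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    except http.client.IncompleteRead as exc:
        return "truncated after %d bytes" % len(exc.partial)
    finally:
        conn.close()


def run_demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), TruncatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch(port, "/with-length"), fetch(port, "/without-length")
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())  # ('truncated after 10 bytes', b'xxxxxxxxxx')
```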
Comment 13•9 years ago (Reporter)
Sadly not. I never got a full description of the issue from IT. Maybe they didn't even know themselves, given that this is all EC2 magic and Amazon was not very responsive. Sorry.
Comment 14•9 years ago
philor: are we getting any more instances of this?
Or should we close since we got a new signature?
Flags: needinfo?(philringnalda)
Comment 15•9 years ago
The new signature is only in-tree, so I'd assume we still get the old one on release branches. But I don't watch release branches, and this bug doesn't have the intermittent-failure keyword, so I'd assume those failures are just being starred as "infra" and either retriggered or ignored.
Flags: needinfo?(philringnalda)
Comment 16•9 years ago
Thanks philor!
I believe we're fine.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME