Closed Bug 1300345 Opened 9 years ago Closed 8 years ago

Intermittent-infra BadZipfile: File is not a zip file

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell infra])

Attachments

(1 obsolete file)

Component: General → Mozharness
Product: Taskcluster → Release Engineering
QA Contact: jlund
I think this is just the new signature for bug 1300048 with the patch on bug 1272083 landed.
Depends on: 1272083
Just offhand, I'd say the frequency of this is somewhere around 20 times the frequency of that.
(In reply to Phil Ringnalda (:philor) from comment #4)
> Just offhand, I'd say the frequency of this is somewhere around 20 times the
> frequency of that.

Sure, because the build and tests are now downloaded via the new method. That is why this bug is now so much more frequent than bug 1300048.
Depends on: 1300812
Summary: Intermittent BadZipfile: File is not a zip file → Intermittent-infra BadZipfile: File is not a zip file
Use this link instead: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1300345&startday=2016-09-01&endday=2016-09-15&tree=all
I see 52 for yesterday. I expect today to be fewer than 10 and tomorrow close to 0.
This got fixed by bug 1300812.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 1305501
Well... the spike from last week comes from someone pushing code that was a couple of weeks old to inbound. See the comments on bug 1305501 for details.
It seems that a lot of code was retriggered or backfilled and then starred, which caused this spike again.

3:49 PM <armenzg> now I'm even more confused
3:49 PM this log
3:49 PM https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=36399206#L612
3:49 PM says September 23rd
3:50 PM which takes me to inbound
3:50 PM https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=72c8ea085d7227204666b528d001f34c1cf4b113
3:50 PM now that says Sep. 13th
3:50 PM <philor> talk to jmaher and his heavy retrigger finger
3:51 PM he wants to know when his Linux talos broke, so he ran around fifty billion retriggers, and various pushes to try
3:52 PM <armenzg> philor: I think I should make pulse_actions takes Paypal payments after a certain threshold
<philor> armenzg: yes! though charge RyanVM for that one, it's mostly chrome that he starred so he probably also retriggered
3:53 PM since nobody else would have seen it that much later
3:54 PM <armenzg> I wonder if we need to modify OF
3:54 PM hrmm
3:54 PM this is tough
3:54 PM <philor> ITYM "rewrite", that being what we do instead of fixing
3:55 PM <armenzg> philor: we should track this bug on the rewrite
3:55 PM it seems that we would want to ignore retriggers
3:55 PM wlach: do you have a document to track requirements for the OF rewrite?
3:55 PM maybe we should start one
3:56 PM for whoever ends up working on it
3:56 PM philor: but who starred the retries from old pushes?
3:56 PM <armenzg> or did auto-starring do that?
3:56 PM <philor> armenzg: same person who triggered them, no doubt
3:56 PM no, there is no such thing as auto-starring
3:57 PM <philor> there is only autoclassify, which does a thing distinct from starring
3:57 PM <armenzg> philor: but that would have meant that somebody backfilled over few weeks of pushes and starred those oranges?
3:57 PM I'm scared of the thought of taking such painful path
3:58 PM <philor> armenzg: yes, that is far far far less surprising than they they would retrigger the piss out of things and then leave them unstarred
3:58 PM <ekyle> armenzg: https://docs.google.com/document/d/1LB4Ppj55rw9IFC-ggQr6Vy0yx8Gq-c6rcD7aNt_0Ib4/edit
3:58 PM <philor> well, jmaher may well have, because he believes in "green or it was my failure", but whatever RyanVM was looking for, he would have been looking for a particular failure, so he would have wanted the things which were not it to go away
3:59 PM <armenzg> jmaher|afk: RyanVM|mtg did you guys retrigger a lot on the 22nd and starred the failures?
Blocks: 1305752
Attachment #8795777 - Attachment is obsolete: true
Attachment #8795777 - Flags: review?(dustin)
Whiteboard: [stockwell infra]
This is not fixed and has been happening quite often recently.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: needinfo?(jmaher)
Flags: needinfo?(catlee)
Well, there is something clearly broken in the content delivery on the TaskCluster (AWS) side:

> [task 2018-04-04T22:40:17.473Z] 22:40:17 INFO - Downloading packages: [u'target.common.tests.zip', u'target.mochitest.tests.zip'] for test suite categories: ['mochitest']
> [task 2018-04-04T22:40:17.474Z] 22:40:17 INFO - Downloading and extracting to /builds/worker/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/VdeVvSVLQQep9yV4J5x3Lw/artifacts/public/build/target.common.tests.zip
> [task 2018-04-04T22:40:17.475Z] 22:40:17 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/VdeVvSVLQQep9yV4J5x3Lw/artifacts/public/build/target.common.tests.zip'}, attempt #1
> [task 2018-04-04T22:40:17.475Z] 22:40:17 INFO - Fetch https://queue.taskcluster.net/v1/task/VdeVvSVLQQep9yV4J5x3Lw/artifacts/public/build/target.common.tests.zip into memory
> [task 2018-04-04T22:40:18.986Z] compiz (core) - Warn: Attempted to restack relative to 0x1400006 which is not a child of the root window or a window compiz owns
> [task 2018-04-04T22:40:22.233Z] 22:40:22 INFO - Content-Length response header: 282
> [task 2018-04-04T22:40:22.233Z] 22:40:22 INFO - Bytes received: 282

The file target.common.tests.zip cannot be 282 bytes in size!

John, who from the Taskcluster team could have a look at this? Do we have some AWS logs which might show what was going wrong at this time?
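For anyone trying to reproduce this locally: the log shows that the Content-Length header and the bytes received agree (282 each), so a length comparison alone would not flag this download. A minimal sketch of a check on the in-memory payload before extraction is below. It is not the Mozharness implementation; `check_zip_payload` and the `min_size` threshold are hypothetical names and values chosen for illustration, and only the standard-library `zipfile.is_zipfile` call is real.

```python
import io
import zipfile


def check_zip_payload(data, min_size=1024):
    """Return a list of problems found in a downloaded payload.

    Hypothetical helper for illustration only. `min_size` is an arbitrary
    threshold, not a value taken from Mozharness or Taskcluster.
    An empty list means no problem was detected.
    """
    problems = []
    # A test package of a few hundred bytes is almost certainly an error
    # body or a truncated object rather than the real archive.
    if len(data) < min_size:
        problems.append("payload is only %d bytes" % len(data))
    # is_zipfile accepts a file-like object and looks for the zip
    # end-of-central-directory record.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        problems.append("payload is not a zip file")
    return problems
```

A caller could treat a non-empty result as a retryable download failure instead of handing the bytes to `zipfile.ZipFile`, which is where `BadZipfile` is raised. Retrying only helps if the stored object itself is intact, which may not have been the case here.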
Flags: needinfo?(jmaher)
Flags: needinfo?(jhford)
Flags: needinfo?(catlee)
Sounds like bug 1364463 again.
Until we've moved to the new Artifact API, we cannot really do anything here other than delete bad objects as they are found. We'll be able to move things to the new API as soon as a few Queue PRs land. I will communicate broadly when this happens.
Flags: needinfo?(jhford)
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → WORKSFORME