Closed Bug 1197950 Opened 9 years ago Closed 8 years ago

Intermittent "[taskcluster-vcs:error] Error when extracting archive" after an unrecoverable tar error

Categories

(Taskcluster :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

No description provided.
Summary: Intermittent AssertionError: /home/worker/.tc-vcs-repo/sources/git.mozilla.org/external/caf/platform/external/bsdiff/master.tar.gz must exist to extract... → Intermittent "[taskcluster-vcs:error] Error when extracting archive" after an unrecoverable tar error
Hi Selena, is there anyone who could bring down this intermittent orange?
Flags: needinfo?(sdeckelmann)
(In reply to Armen Zambrano Gasparnian [:armenzg] from comment #85)
> Hi Selena, is there anyone who could bring down this intermittent orange?

The current proposal is to move all caching into our decision tasks, though some interim measures may be taken first. I don't have a firm timeline other than that we're aiming to fix this in Q4.
Flags: needinfo?(sdeckelmann)
If this isn't looked into by tomorrow when I return to work, I'll check it out. At least from what I can tell from the logs, the cached tarballs used for extraction have been shared by many tasks; some fail when extracting, some don't. So perhaps there are some filesystem issues at play.
(In reply to Greg Arndt [:garndt] from comment #93)
> If this isn't looked into by tomorrow when I return to work, I'll check it
> out. At least from what I can tell from the logs, the cached tarballs used
> for extraction have been shared by many tasks; some fail when extracting,
> some don't. So perhaps there are some filesystem issues at play.

I refreshed our archive of gaia-central, and then retriggered tasks. All retriggered tasks are now succeeding.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
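
For illustration, the failure mode described above is a cached .tar.gz that exists on disk but fails partway through extraction. A minimal sketch of the kind of pre-extraction integrity check that would surface a corrupted cache with a clear error instead of an opaque tar failure (a hypothetical Node/TypeScript helper, not the actual tc-vcs code; the error wording is illustrative):

    import * as fs from "fs";
    import * as zlib from "zlib";

    // Hypothetical guard: verify a cached .tar.gz is present and decompresses
    // cleanly before handing it to tar, so a corrupted or truncated cache
    // produces a clear "refresh the cache" error rather than a raw tar error.
    function verifyCachedArchive(archivePath: string): Promise<void> {
      return new Promise((resolve, reject) => {
        if (!fs.existsSync(archivePath)) {
          return reject(new Error(`${archivePath} must exist to extract`));
        }
        // Stream through gunzip without keeping the output; any CRC or
        // stream error means the cached file is damaged on disk.
        const gunzip = zlib.createGunzip();
        fs.createReadStream(archivePath)
          .on("error", reject)
          .pipe(gunzip)
          .on("data", () => {}) // discard decompressed bytes
          .on("error", (err: Error) =>
            reject(new Error(`cached archive ${archivePath} is corrupt: ${err.message}`)))
          .on("end", () => resolve());
      });
    }

A check like this would distinguish "the cached file is damaged, refresh it" from a genuine tar bug in the failing jobs above.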
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This one has certainly increased over the last week and is now our top orange.
I haven't looked at all the failures, but I believe this was an issue with the cache expiring for Aries, emulator-jb, and one other type of build that I can't remember. philor reported the issue, I fixed the caches, and the jobs turned green again. The last starred job was 2 days ago, when I was fixing this, and the failures seem to be concentrated on the 12th and 13th.
This is the #5 intermittent orange over the past 3 days.  It would be helpful to get this assigned appropriately (whether that's a fix in Taskcluster or better diagnostics that would lead to a fix elsewhere).
Flags: needinfo?(sdeckelmann)
FYI, Selena's on PTO at the moment.
:garndt, this is still showing up! Also, another git-vcs-related top intermittent is bug 1198092. Can you take a look?
Flags: needinfo?(garndt)
I will be taking a look at this when I'm back from PTO on Monday, and I'll keep the ni? here until then. I have a patch up for review that should at least reduce some of the errors related to the cache expiring, so we can get to the core issues.

Here is the try push for a new version of tc-vcs that gives better error messages when the cache expires, as seen in some of the previous comments.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b732ba12182
Keep in mind this is mostly on B2G-related builds (which the above try push covers). Thanks for looking into this; we'll see if we can actually reduce the errors!
Flags: needinfo?(sdeckelmann)
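
As a rough sketch of the shape the "better error messages on cache expiry" improvement might take (the function, URL handling, and messages are assumptions for illustration, not the actual tc-vcs change; it also assumes a runtime with a global fetch):

    // Hypothetical pre-flight check: before downloading a cached clone,
    // confirm the artifact still exists, so an expired cache yields an
    // actionable error naming the cache instead of a failed download
    // followed by an opaque tar extraction error.
    async function assertCacheAvailable(cacheUrl: string, cacheName: string): Promise<void> {
      const res = await fetch(cacheUrl, { method: "HEAD" });
      if (res.status === 404) {
        throw new Error(
          `tc-vcs cache '${cacheName}' has expired or was never created ` +
            `(HEAD ${cacheUrl} returned 404); re-run the cache creation task.`
        );
      }
      if (!res.ok) {
        throw new Error(`unexpected HTTP ${res.status} while checking cache '${cacheName}'`);
      }
    }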
This is now getting more frequent.
(In reply to Carsten Book [:Tomcat] from comment #117)
> This is now getting more frequent.

It's a permafail on all the try pushes I have sent recently:
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=96a4c7794e34
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=612953c7b3e0
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf8086251e83
Hm, it feels like a tc-vcs cache problem. I am running a cache update now. NI'ing Selena and Greg here so we don't forget to discuss why the caches weren't updated.
Flags: needinfo?(sdeckelmann)
The latest failures were caused by the emulator and flame-kk based caches not being updated. The hook that's configured covers only non-emulator tasks, so these caches are not part of the scheduled hook. We need to look into getting them into the hook scheduler. I'm not sure why they weren't included; it might be because of the disk space issues we were having when creating the caches.

I recreated the missing caches this morning and things turned green.
Flags: needinfo?(garndt)
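
For context, the scheduled refreshes discussed here run through the Taskcluster hooks service, which fires a task template on a cron-like schedule. A rough sketch of what including the missing caches in a scheduled hook could look like (the field names and cron format are assumptions about the hooks service, not the real configuration):

    // Illustrative hook definition: a single scheduled hook that also covers
    // the emulator and flame-kk cache tasks missed by the previous hook.
    // Firing every 4 hours gives the 6x-a-day cadence noted in the next comment.
    const cacheRefreshHook = {
      hookGroupId: "taskcluster",       // hypothetical group id
      hookId: "tc-vcs-cache-refresh",   // hypothetical hook id
      schedule: ["0 0 */4 * * *"],      // assumed 6-field cron (seconds first)
      task: {
        // ...task template that rebuilds all tc-vcs caches, including the
        // emulator and flame-kk variants
      },
    };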
The tc-vcs caches are up to date, green, and running 6x a day. b2g-device-image seems to still be having issues, so I'll file a bug to investigate that separately.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Flags: needinfo?(sdeckelmann)
Resolution: --- → FIXED
Also, recent items starred here were related more to repo sync issues (bug 1198092) than to untarring problems.