Closed Bug 1365219 Opened 2 years ago Closed 2 years ago

Windows Build fail with 403 pulling vs2015u3 zip

Categories

(Infrastructure & Operations :: CIDuty, task, blocker)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)

References

Details

(Whiteboard: [stockwell infra])

We currently have massive Windows build problems, and only retriggering several times helps.

It's always like this:

Downloading vs2015u3.zip
03:48:12     INFO -   0:07.39 attempt 1/5
03:48:12     INFO -   0:07.39 Downloading to temporary location c:\builds\tooltool_cache\babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7
03:48:12     INFO -   0:07.45 403 Client Error: FORBIDDEN for url: https://api.pub.build.mozilla.org/tooltool/sha512/babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7
03:48:12     INFO -   0:07.45 Failed to download vs2015u3.zip

https://treeherder.mozilla.org/logviewer.html#?job_id=99405429&repo=mozilla-inbound&lineNumber=1775

Might be related to https://bugzilla.mozilla.org/show_bug.cgi?id=1362356#c25
More and more failures in Windows builds, so closing inbound and autoland.
Severity: critical → blocker
I've reverted b9c1aed49990b4a7e7ad28a90e030219d2634f5f to see if that will help.
Depends on: 1362356
I believe vs2015u3.zip is the only INTERNAL download in that releng.manifest -- the rest are visibility=PUBLIC.

(The manifest entry for mozmake.exe does not specify a visibility, but api.pub.build.mozilla.org/tooltool reports that mozmake.exe is PUBLIC.)

An attempt to download an INTERNAL file without an appropriate token will result in a 403 -- consistent with the ideas in and around https://bugzilla.mozilla.org/show_bug.cgi?id=1362356#c25.
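
To make the visibility point concrete, here is a minimal sketch of a pre-flight check that lists which manifest entries would need a token. It assumes the manifest is a JSON list of objects with "filename" and an optional "visibility" key; the sample entries, the helper names, and the token path handling are hypothetical and are not taken from tooltool itself. Entries with no visibility (like mozmake.exe) are treated as not internal here, although the server has the final say.

```python
import json
import os

# Hypothetical, trimmed-down manifest. Real entries carry more fields
# (size, digest, algorithm) that are omitted here for brevity.
SAMPLE_MANIFEST = json.dumps([
    {"filename": "vs2015u3.zip", "visibility": "internal"},
    {"filename": "mozmake.exe"},  # no visibility given in the manifest
    {"filename": "example-public.tar.bz2", "visibility": "public"},  # made up
])


def internal_entries(manifest_text):
    """Return filenames whose manifest entry is explicitly marked internal."""
    entries = json.loads(manifest_text)
    return [
        entry["filename"]
        for entry in entries
        if entry.get("visibility", "").lower() == "internal"
    ]


def files_at_risk(manifest_text, token_path):
    """Return internal files that would be requested without a token.

    A token file that is missing or empty is treated as "no token", which
    is the situation expected to produce a 403 for internal downloads.
    """
    has_token = os.path.isfile(token_path) and os.path.getsize(token_path) > 0
    return [] if has_token else internal_entries(manifest_text)
```

Running files_at_risk before a job starts would flag vs2015u3.zip on an instance whose token has not been copied yet, instead of letting the download fail partway through the build.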
The latest theory on this is that runner is starting before one of the instantiation scripts has finished copying data over to c:\builds. The job fails because the tokens aren't there yet, then runner reboots the machine, so the copy of the keys never finishes. That instance is bad from the start and keeps burning jobs. Instances that do not pick up jobs before the copy is finished have the keys and operate normally.

Markco is going to try creating a semaphore file so that runner doesn't start until the token copy has finished.
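
The semaphore idea can be sketched roughly as follows. This is only an illustration of the pattern, not the actual change made to the AMIs: the function names, the timeout values, and any semaphore path passed in are hypothetical. The copy step creates the marker file as its very last action, and the consumer polls for it before starting any work.

```python
import os
import time


def mark_copy_finished(semaphore_path):
    """Create the marker file. Call this only after the token copy is done."""
    with open(semaphore_path, "w") as fh:
        fh.write("done\n")


def wait_for_semaphore(semaphore_path, timeout=600.0, poll_interval=1.0):
    """Block until the marker file exists or the timeout expires.

    Returns True if the file appeared in time, False otherwise. The caller
    should refuse to pick up jobs when this returns False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(semaphore_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The key design point is ordering: because the marker is written after the copy, its presence implies the tokens are in place, so an instance never takes a job in the half-initialized state described above.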
Because we're still having issues with AMI generation, we've hand-modified the AMI from May 2nd and taken a snapshot for b-2008 and y-2008. These have been copied over to usw2 now as well, so any new instantiations should happen from the AMIs modified to look for the semaphore.
We think this issue (and a host of others) may be solved now. There's a lengthy explanation in the blocking bug.
Duplicate of this bug: 1352456
Whiteboard: [stockwell infra]
tjr's issue was unrelated.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations