Closed Bug 1421114 Opened 7 years ago Closed 7 years ago

Windows tc-SM jobs are permafailing on mozilla-release

Categories

(Firefox Build System :: Task Configuration, task)

Version: Unspecified
Platform: Windows
Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Assigned: grenade)

Details

Looks like they started to fail about a week ago and retriggers on previously-green runs are also failing now. No issues on Beta, so presumably something changed with the workers that depends on something in the tree?

https://treeherder.mozilla.org/logviewer.html#?job_id=148038357&repo=mozilla-release

 0:06.36 403 Client Error: FORBIDDEN for url: https://tooltool.mozilla-releng.net/sha512/babc414ffc0457d27f5a1ed24a8e4873afbe2f1c1a4075469a27c005e1babc3b2a788f643f825efedff95b79686664c67ec4340ed535487168a3482e68559bc7
 0:06.36 Failed to download vs2015u3.zip
Traceback (most recent call last):
  File "./src/js/src/devtools/automation/autospider.py", line 95, in <module>
    UNAME_M = subprocess.check_output(['uname', '-m']).strip()
  File "c:\mozilla-build\python\lib\subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "c:\mozilla-build\python\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "c:\mozilla-build\python\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
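For context on the traceback: autospider.py shells out to `uname -m`, and no uname.exe exists on the Windows workers' PATH, hence the WindowsError. A minimal sketch of a portable guard (hypothetical illustration, not the actual in-tree fix):

```python
import platform
import subprocess


def get_machine_arch():
    """Return the machine architecture without requiring `uname` on PATH.

    Sketch only: try the `uname -m` call autospider.py makes, and fall
    back to the stdlib platform module (which works on every OS) when
    the binary is missing, as it is on Windows workers.
    """
    try:
        return subprocess.check_output(['uname', '-m']).strip().decode()
    except OSError:  # WindowsError is a subclass of OSError
        return platform.machine()
```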
I'm guessing this is a bum AMI?
Flags: needinfo?(pmoore)
I suspect this may be an OpenCloudConfig issue. Rob is on PTO until 4 December - is this something that can wait until he is back?
Flags: needinfo?(pmoore)
I don't think a Tier 1 job failing on mozilla-release is something we can wait another week on, no.
Oh interesting. This is *not* the bug that I was seeing a week or so ago. That was failing with:

IOError: [Errno 13] y:\hg-shared\8ba995b74e18334ab3707f27e9eb8f4e37ba3d29\.hg/store\data/modules/libpref/init/all.js.i: Access is denied

(It wasn't always the same file, but it was always retrieving something from hg-shared.)
Pete, can we try re-running OCC?
Is anybody working on this? We're trying to ship a dot release off m-r today and have had failing Tier 1 jobs for 2+ weeks now.
Flags: needinfo?(rthijssen)
Flags: needinfo?(pmoore)
Flags: needinfo?(dustin)
Adding gps to the CC in case he has insight into hg-shared.
found a possibly related issue:

- we didn't have anything setting ACL permissions that would allow task users to write to c:\builds\tooltool_cache.
- we do seed the tooltool cache during AMI creation using:
  - the latest in-tree manifest
  - tt artifacts from api.pub.build.mozilla.org
- the majority of firefox builds work because they use the same tt artifact url and manifest, so they successfully find tt artifacts in the local pre-seeded cache.
- spidermonkey builds appear to use a different tt artifact url (tooltool.mozilla-releng.net) and maybe a different manifest? i don't know much about this and might be wrong, but that's what it looks like. they throw an error suggesting they can't download from tooltool.mozilla-releng.net, so maybe sm builds don't always find pre-seeded local artifacts and then fail to download?

i've patched the ACL issue. it won't fix whatever is causing the failure to download, though. maybe someone familiar with tooltool.mozilla-releng.net can help us debug whether or not the artifact exists and if a different token is needed to get at it.
Flags: needinfo?(rthijssen)
Looks like grenade is working on this.
Flags: needinfo?(dustin)
Flags: needinfo?(pmoore)
(In reply to Rob Thijssen (:grenade UTC+2) from comment #10)
> spidermonkey builds appear to use a different tt artifact url (tooltool.mozilla-releng.net) and maybe a different manifest? i don't know much about this and might be wrong but that's what it looks like. they throw an error suggesting they can't download from tooltool.mozilla-releng.net so maybe sm builds don't always find pre-seeded local artefacts and then fail to download???

Rok, is this related to https://bugzilla.mozilla.org/show_bug.cgi?id=1394358#c15 ?
Flags: needinfo?(rgarbas)
:pmoore :grenade

(In reply to Pete Moore [:pmoore][:pete] from comment #12)
> (In reply to Rob Thijssen (:grenade UTC+2) from comment #10)
> > spidermonkey builds appear to use a different tt artifact url (tooltool.mozilla-releng.net) and maybe a different manifest? i don't know much about this and might be wrong but that's what it looks like. they throw an error suggesting they can't download from tooltool.mozilla-releng.net so maybe sm builds don't always find pre-seeded local artefacts and then fail to download???
>
> Rok, is this related to https://bugzilla.mozilla.org/show_bug.cgi?id=1394358#c15 ?

I looked quickly at it and I would say that it is an authentication issue, since the artifact is clearly private. Both old and new tooltool use the same database to verify whether you are allowed to download an artifact. Did something change regarding authentication? I'm happy to dig deeper on my end, but the timing for this is poor since I'm flying to Austin soon. If it's critical, maybe we can debug this in Austin?
Flags: needinfo?(rgarbas)
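For reference, a 403 FORBIDDEN (as in comment 0) rather than a 404 is consistent with the artifact existing but the caller lacking a valid token. A hedged sketch of how a worker's fetch request might be built; the header name and token handling are assumptions for illustration, not the real tooltool client:

```python
import urllib.request

TOOLTOOL_BASE = 'https://tooltool.mozilla-releng.net/sha512/'


def build_fetch_request(sha512_digest, token=None):
    """Build a request for a (possibly private) tooltool artifact.

    Sketch only: without a token the server is assumed to answer 403
    for private artifacts, which matches the failure in comment 0.
    """
    req = urllib.request.Request(TOOLTOOL_BASE + sha512_digest)
    if token:
        # assumed bearer-token scheme; the real client may differ
        req.add_header('Authorization', 'Bearer %s' % token)
    return req
```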
ok, probably not a token issue then, if both urls use the same auth db and this is only intermittent (if the token was no good we'd have failures everywhere).
Assignee: nobody → rthijssen
since we changed nothing in infrastructure but no longer see failures, i'm closing the bug and assuming that something in-tree must have corrected this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Product: TaskCluster → Firefox Build System