Closed Bug 1113144 Opened 6 years ago Closed 6 years ago

linux64-br-haz_try_dep does not work with mozharness pinning

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

Attachments

(1 file)

The purge action cannot succeed:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8e764a594c80
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/armenzg@mozilla.com-8e764a594c80/try-linux64-br-haz/linux64-br-haz_try_dep-bm87-try1-build2482.txt.gz
caught OS error 2: No such file or directory while running ['/builds/slave/l64-br-haz_try_dep-00000000000/build/scripts/external_tools/clobberer.py', '-s', 'scripts', '-s', 'logs', '-s', 'buildprops.json', '-s', 'token', '-s', 'oauth.txt', '-t', '168', 'https://api.pub.build.mozilla.org/clobberer/lastclobber', u'try', u'linux64-br-haz_try_dep', 'l64-br-haz_try_dep-00000000000', u'b-linux64-hp-0002', u'http://buildbot-master87.srv.releng.scl3.mozilla.com:8101/']

Normal workflow where we don't touch mozharness.json (run with the default repositories) has no issues.

This blocks bug 1110286.
Phew. I've been digging through this.

In a non-pinned linux64-br-haz run, mozharness gets checked out into /tools/checkouts/mozharness, then during the purge step it blows away /builds/slave/l64-br-haz_try_dep-00000000000/build and continues on. No problem.

In a pinned linux64-br-haz run, mozharness gets checked out via repository_manifest.py into /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts, then during the purge step it blows away /builds/slave/l64-br-haz_try_dep-00000000000/build. Then when it tries to rmtree the always_clobber_dirs value that was passed in, it fails because that uses /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts/external_tools/clobberer.py, which no longer exists.
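
The failure mode above can be reproduced in isolation. The following is only a minimal sketch, assuming a throwaway temp directory in place of the real slave directory and a stub clobberer.py whose content is made up for illustration. It is written for Python 3 on a POSIX system, where the error surfaces as an OSError with errno 2, the same "OS error 2" seen in the log.

```python
import errno
import os
import shutil
import subprocess
import tempfile


def reproduce_purge_failure():
    """Return the errno raised when a script is executed after the
    directory that contains it has been clobbered (None if nothing fails)."""
    base = tempfile.mkdtemp()  # stand-in for the slave's base dir (assumption)
    try:
        work_dir = os.path.join(base, "build")
        tools_dir = os.path.join(work_dir, "scripts", "external_tools")
        os.makedirs(tools_dir)
        clobberer = os.path.join(tools_dir, "clobberer.py")
        with open(clobberer, "w") as f:
            f.write("print('stub clobberer')\n")  # hypothetical stub content

        # The purge step blows away the work dir, and with it the pinned
        # mozharness checkout that lives underneath it.
        shutil.rmtree(work_dir)

        try:
            # The script path is argv[0], as in the logged command line.
            subprocess.check_call([clobberer, "-s", "scripts"])
        except OSError as e:
            return e.errno
        return None
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Calling reproduce_purge_failure() should report errno.ENOENT, since the script no longer exists by the time it is executed.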

Which is straightforward enough, but why do the desktop builds work?

When looking at a random desktop build, it runs into no problems at all because it doesn't clobber. It turns out that the linux64-br-haz build forces a clobber via its config; fx_desktop_build.py only clobbers for a nightly.

So why doesn't a nightly fail? Because it checks out mozharness into <base_work>/scripts instead of <base_work>/build/scripts. Which certainly seems like a more sensible place to put it.
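
The difference between the two checkout locations comes down to whether the checkout sits underneath the directory that gets clobbered. A small sketch of that check follows; the helper name checkout_survives_clobber is hypothetical and not part of mozharness or buildbotcustom, and the paths passed in are expected to be absolute.

```python
import os


def checkout_survives_clobber(checkout_dir, work_dir):
    """Return True if checkout_dir is neither work_dir itself nor
    anything beneath it, i.e. removing work_dir leaves it intact."""
    checkout = os.path.normpath(checkout_dir)
    work = os.path.normpath(work_dir)
    # If the longest common path of the two is work_dir, the checkout
    # lives inside (or is) the directory being removed.
    return os.path.commonpath([checkout, work]) != work


base_work = "/builds/slave/l64-br-haz_try_dep-00000000000"
work_dir = os.path.join(base_work, "build")

# Pinned case from this bug: checkout under the work dir.
pinned = checkout_survives_clobber(os.path.join(work_dir, "scripts"), work_dir)
# Nightly-style layout: checkout next to the work dir.
sibling = checkout_survives_clobber(os.path.join(base_work, "scripts"), work_dir)
```

With these inputs, pinned is False and sibling is True, which matches the observed behavior: only the checkout under build/ disappears during the purge.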

I'm still tracking through where that comes from. It appears to be a buildbot property script_repo_checkout set from script_repo_cache which is only set if a branch claims to support it via use_mozharness_repo_cache, and then it gets the value from a 'mozharness_repo_cache' setting that could come from 2 different places. use_mozharness_repo_cache just seems to be globally set to True. mozharness_repo_cache appears to be set to /tools/checkouts/mozharness for the linux64-br-haz build. So I lost my way somewhere, and I'm going to a corner to cry.

I'll look into this more tomorrow, but I'm going to ni? jlund, who added the mozharness_repo_cache setting to these builds.
Flags: needinfo?(jlund)
(In reply to Steve Fink [:sfink, :s:] from comment #1)
> Phew. I've been digging through this.

feel your pain. there are many branches and edge cases.

> In a pinned linux64-br-haz run, mozharness gets checked out via
> repository_manifest.py into
> /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts, then during the
> purge step it blows away /builds/slave/l64-br-haz_try_dep-00000000000/build.
> Then when it tries to rmtree the always_clobber_dirs value that was passed
> in, it fails because that uses
> /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts/external_tools/clobberer.py,
> which no longer exists.

hmm, something seems wrong there. so IIUC, you're saying the abs_work_dir, '/builds/slave/l64-br-haz_try_dep-00000000000/build', also holds the mozharness checkout itself?

this doesn't happen in the test world because test slaves don't have a checkout of mozharness, so it uses this: http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py?rev=8e7940ff9558#5444

however, linux build slaves do have a checkout so it uses this repository_manifest call: 
http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py?rev=8e7940ff9558#5389

armen, are you cloning mozharness into mozharness' work dir on purpose (line 5389 above)? Should we be doing '%(basedir)s/scripts/scripts' or maybe '%(basedir)s/scripts'?


> 
> Which is straightforward enough, but why do the desktop builds work?
> 
> When looking at a random desktop build, it runs into no problems at all
> because it doesn't clobber. It turns out that the linux64-br-haz build
> forces a clobber via its config; fx_desktop_build.py only clobbers for a
> nightly.

non nightlies should be clobbering + purging too. grep for 'external_tools/clobberer.py' and 'purge_builds.py' here: https://tbpl.mozilla.org/php/getParsedLog.php?id=55602993&full=1&branch=mozilla-central

maybe it's not doing so correctly?

> 
> So why doesn't a nightly fail? Because it checks out mozharness into
> <base_work>/scripts instead of <base_work>/build/scripts. Which certainly
> seems like a more sensible place to put it.
> 
> I'm still tracking through where that comes from. It appears to be a
> buildbot property script_repo_checkout set from script_repo_cache which is
> only set if a branch claims to support it via use_mozharness_repo_cache, and
> then it gets the value from a 'mozharness_repo_cache' setting that could
> come from 2 different places. use_mozharness_repo_cache just seems to be
> globally set to True. mozharness_repo_cache appears to be set to
> /tools/checkouts/mozharness for the linux64-br-haz build. So I lost my way
> somewhere, and I'm going to a corner to cry.

use_mozharness_repo_cache, when I created it, defaulted to True but was overridden for 'ash', since ash used a custom mozharness repo and I didn't want it to use the cache on the slaves. mozharness_repo_cache is always '/tools/checkouts/mozharness' because it uses runner's cached copy, and runner's cache is only added on linux slaves. this value will be different once we enable windows.
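
The lookup described above can be sketched roughly as follows. This is not the buildbotcustom code: the function name, and the split of the two settings between a branch dict and a platform dict, are assumptions made for illustration. Only the setting names, the default of True, the 'ash' override, and the linux cache path come from this discussion.

```python
def resolve_script_repo_cache(branch_config, platform_config):
    """Return the cached mozharness checkout path to use, or None.

    Assumption: use_mozharness_repo_cache is a per-branch setting and
    mozharness_repo_cache is a per-platform setting.
    """
    # Defaults to True; a branch such as 'ash' can opt out.
    if not branch_config.get("use_mozharness_repo_cache", True):
        return None
    # Only platforms with a runner cache define this value.
    return platform_config.get("mozharness_repo_cache")


linux = {"mozharness_repo_cache": "/tools/checkouts/mozharness"}
no_cache_platform = {}  # e.g. a platform without the runner cache

default_branch = resolve_script_repo_cache({}, linux)
ash_branch = resolve_script_repo_cache({"use_mozharness_repo_cache": False}, linux)
uncached = resolve_script_repo_cache({}, no_cache_platform)
```

Under these assumptions, default_branch is '/tools/checkouts/mozharness', while ash_branch and uncached are both None.
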
Flags: needinfo?(jlund) → needinfo?(armenzg)
(In reply to Jordan Lund (:jlund) from comment #2)
> (In reply to Steve Fink [:sfink, :s:] from comment #1)
> however, linux build slaves do have a checkout so it uses this
> repository_manifest call: 
> http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py?rev=8e7940ff9558#5389
> 
> armen, are you cloning mozharness into mozharness' work dir on purpose (line
> 5389 above)? Should we be doing '%(basedir)s/scripts/scripts' or maybe
> '%(basedir)s/scripts'

Yeah, this is where I finally ended up.

> > Which is straightforward enough, but why do the desktop builds work?
> > 
> > When looking at a random desktop build, it runs into no problems at all
> > because it doesn't clobber. It turns out that the linux64-br-haz build
> > forces a clobber via its config; fx_desktop_build.py only clobbers for a
> > nightly.
> 
> non nightlies should be clobbering + purging too. grep for
> 'external_tools/clobberer.py' and 'purge_builds.py' here:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=55602993&full=1&branch=mozilla-central
> 
> maybe it's not doing so correctly?

Sorry, I'm being imprecise. First, fx_desktop_build.py doesn't change the clobber behavior at all; it uses the default purge.py behavior. And I'm really talking about whether purge.py calls super(PurgeMixin, self).clobber() or not, which isn't the only sort of clobbering that these things do. (The super.clobber generally refers to BaseScript.clobber, which nukes abs_work_dir.)

super.clobber runs if we aren't in automation, or we are and we're either a nightly or have a force_clobber in the config.
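
Written out as a predicate, that condition looks like the sketch below. The function name, the in_automation parameter, and the 'nightly_build' config key are hypothetical stand-ins; only force_clobber is named in this discussion, and this is not the actual purge.py code.

```python
def should_run_full_clobber(config, in_automation):
    """Return True when the full work-dir clobber (super.clobber) would run.

    It runs outside automation unconditionally; inside automation it runs
    only for a nightly or when the config sets force_clobber.
    """
    if not in_automation:
        return True
    # 'nightly_build' is an assumed key name for "this is a nightly".
    return bool(config.get("nightly_build") or config.get("force_clobber"))


regular_dep_build = should_run_full_clobber({}, in_automation=True)
haz_build = should_run_full_clobber({"force_clobber": True}, in_automation=True)
nightly = should_run_full_clobber({"nightly_build": True}, in_automation=True)
local_run = should_run_full_clobber({}, in_automation=False)
```

Here regular_dep_build is False and the other three are True, which is why a regular desktop dep build never hits the missing clobberer.py while the haz build does.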

Regular builds *do* call self.purge_builds(), which checks the clobberer, etc. So I think everything is fine. (Analysis builds have to force everything to be rebuilt so the analysis can see the full picture, which is why they force_clobber.)

Perhaps there are two meaningful concepts that could be labeled "purge" vs "clobber", but maybe we already make that distinction? (clobber == do not reuse any generated files, purge == free up space?)

Anyway, just to be clear, I don't see any problem here. I was just describing differing behavior between 2 specific jobs.
I will look at this NI tomorrow.
This is to fix the haz jobs which end up clobbering the workdir.

See comment from sfink:
> In a pinned linux64-br-haz run, mozharness gets checked out via
> repository_manifest.py into /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts,
> then during the purge step it blows away /builds/slave/l64-br-haz_try_dep-00000000000/build.
> Then when it tries to rmtree the always_clobber_dirs value that was passed in,
> it fails because that uses /builds/slave/l64-br-haz_try_dep-00000000000/build/scripts/external_tools/clobberer.py,
> which no longer exists.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Flags: needinfo?(armenzg)
Attachment #8546194 - Flags: review?(rail)
This change could *only* affect try build jobs which are the only ones currently using mozharness cached checkouts.
Attachment #8546194 - Flags: review?(rail) → review+
This is now fixed!
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 791924
No longer blocks: 1110286
thanks armen!
Component: General Automation → General