Closed Bug 1070074 Opened 10 years ago Closed 7 years ago

Local mozharness B2G builds fail if B2G.git doesn't already exist

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mshal, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1930] )

Attachments

(1 file)

When trying to run b2g_build.py locally, if I point the --work-dir to a directory that doesn't yet have a clone of B2G.git, mozharness/tools do the following operations:

b2g_build.py --work-dir=/home/worker/volume_cache/B2G ...

18:37:06     INFO - mkdir: /home/worker/volume_cache/B2G
18:37:06     INFO - Changing directory to /home/worker/volume_cache/B2G.
18:37:06     INFO - Running command: ['gittool.py', 'https://git.mozilla.org/b2g/B2G.git', '/home/worker/volume_cache/B2G']
18:37:06     INFO -  2014-09-19 18:37:06,219 /home/worker/volume_cache/B2G doesn't appear to be a valid git directory; clobbering

Then log_cmd() calls getcwd(), which fails with:

18:37:06     INFO -  OSError: [Errno 2] No such file or directory
18:37:06    ERROR - Return code: 1
sh: 0: getcwd() failed: No such file or directory

In other words, mozharness creates/chdirs to the B2G directory, then the git.py in the tools repo checks B2G, realizes that it's not a valid git repository and deletes the B2G tree. getcwd() now fails since the mozharness process is still chdir'd into the defunct directory.
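
The failure mode can be reproduced outside mozharness. The following is a minimal sketch, assuming a Linux-like system where a process may delete its own current directory; the function name and the temporary paths are illustrative and are not taken from mozharness or the tools repo.

```python
import errno
import os
import shutil
import tempfile


def reproduce_getcwd_failure():
    """Return the errno raised by os.getcwd() once the cwd has been deleted."""
    original_cwd = os.getcwd()
    scratch = tempfile.mkdtemp()
    work_dir = os.path.join(scratch, "B2G")  # stand-in for --work-dir
    os.mkdir(work_dir)
    try:
        os.chdir(work_dir)       # step 1: the caller chdirs into the directory
        shutil.rmtree(work_dir)  # step 2: the "clobber" removes that directory
        try:
            os.getcwd()          # step 3: the cwd no longer exists on disk
        except OSError as exc:
            return exc.errno
        return None
    finally:
        # Always restore a valid cwd and clean up the scratch area.
        os.chdir(original_cwd)
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    code = reproduce_getcwd_failure()
    print(errno.errorcode.get(code, code))
```

On Linux this should report ENOENT, which matches the "[Errno 2] No such file or directory" in the log above.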

jlund and I took a look yesterday, but we haven't yet figured out the best place to fix it, nor why the same steps happen in buildbot builds but without the failure.
Attached patch: bug1070074 (Splinter Review)
This seems like a pretty simple fix for the issue. Ultimately I think the problem stems from the fact that vcs_checkout_repos() assumes a parent_dir of 'work_dir' if parent_dir is not set, and it thinks that parent_dir is one level up from where the repo will be checked out, which means it should be fine to chdir into even if the repo gets clobbered.

However, buildb2gbase.py is passing in a full path for the repo destination, which also happens to be set to 'work_dir'. Since parent_dir is unset, it takes the default value, and then parent_dir==(repo destination), which breaks the assumptions. This workaround explicitly sets parent_dir in buildb2gbase.py to be the parent of 'work_dir'.
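
To make the broken assumption concrete, here is a rough model of the defaulting described above. It is only a sketch: `resolve_checkout` is a hypothetical helper written for illustration, not the real vcs_checkout_repos() code, and it assumes the destination is joined onto parent_dir with ordinary POSIX path rules.

```python
import posixpath


def resolve_checkout(work_dir, dest, parent_dir=None):
    """Hypothetical model: return (cwd, dest_path) for a checkout.

    cwd is the directory the caller chdirs into; dest_path is the
    directory the VCS tool is allowed to clobber.
    """
    if parent_dir is None:
        parent_dir = work_dir  # the default described in this comment
    # An absolute dest replaces parent_dir entirely in a POSIX join.
    dest_path = posixpath.join(parent_dir, dest)
    return parent_dir, dest_path


work_dir = "/home/worker/volume_cache/B2G"

# Failing case: dest is the full work_dir path and parent_dir is unset,
# so the process sits inside the directory that gets clobbered.
bad_cwd, bad_dest = resolve_checkout(work_dir, dest=work_dir)

# Workaround: pass the parent of work_dir explicitly.
ok_cwd, ok_dest = resolve_checkout(
    work_dir, dest=work_dir, parent_dir=posixpath.dirname(work_dir)
)

print(bad_cwd == bad_dest)  # cwd and clobber target coincide
print(ok_cwd, ok_dest)      # cwd is now one level above the target
```

With the explicit parent_dir, the clobber target is a child of the cwd, so deleting it leaves the process in a directory that still exists.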
Attachment #8492400 - Flags: review?(jlund)
(In reply to Michael Shal [:mshal] from comment #1)
> Created attachment 8492400 [details] [diff] [review]
> bug1070074
> 
> This seems like a pretty simple fix for the issue. Ultimately I think the
> problem stems from the fact that vcs_checkout_repos() assumes a parent_dir
> of 'work_dir' if parent_dir is not set, and it thinks that parent_dir is one
> level up from where the repo will be checked out, which means it should be
> fine to chdir into even if the repo gets clobbered.
> 
> However, buildb2gbase.py is passing in a full path for the repo destination,
> which also happens to be set to 'work_dir'. Since parent_dir is unset, it
> takes the default value, and then parent_dir==(repo destination), which
> breaks the assumptions. This workaround explicitly sets parent_dir in
> buildb2gbase.py to be the parent of 'work_dir'.

catlee, why do we use the work_dir itself as the B2G repo dest: http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/building/buildb2gbase.py#309

seems like it's dangerous creating/clobbering/creating the script's work_dir (comment 0)

/me tries to grep why this dance is happening in this snippet:
12:58:00     INFO - mkdir: /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:00     INFO - Changing directory to /builds/slave/b2g_ced_flm_dep-00000000000000/build.
12:58:00     INFO - retry: Calling <bound method B2GBuild._get_revision of <__main__.B2GBuild object at 0x165da10>> with args: (<mozharness.base.vcs.gittool.GittoolVCS object at 0x1615fd0>, '/builds/slave/b2g_ced_flm_dep-00000000000000/build'), kwargs: {}, attempt #1
12:58:00     INFO - Running command: ['gittool.py', 'https://git.mozilla.org/b2g/B2G.git', '/builds/slave/b2g_ced_flm_dep-00000000000000/build']
12:58:00     INFO - Copy/paste: gittool.py https://git.mozilla.org/b2g/B2G.git /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:00     INFO - Using env: {'GIT_SHARE_BASE_DIR': '/builds/git-shared/git',
12:58:00     INFO -  'PATH': '/usr/local/bin:/usr/lib64/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin'}
12:58:00     INFO -  2014-09-12 12:58:00,357 creating bare repo /builds/git-shared/git/git.mozilla.org/b2g%2FB2G.git
12:58:00     INFO -  2014-09-12 12:58:00,358 removing /builds/git-shared/git/git.mozilla.org/b2g%2FB2G.git
12:58:00     INFO -  2014-09-12 12:58:00,358 git init --bare -q /builds/git-shared/git/git.mozilla.org/b2g%2FB2G.git
12:58:00     INFO -  2014-09-12 12:58:00,372 Checking dest /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:00     INFO -  fatal: Not a git repository (or any parent up to mount parent /builds)
12:58:00     INFO -  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
12:58:00     INFO -  2014-09-12 12:58:00,374 /builds/slave/b2g_ced_flm_dep-00000000000000/build doesn't appear to be a valid git directory; clobbering
12:58:00     INFO -  2014-09-12 12:58:00,376 removing /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:00     INFO -  2014-09-12 12:58:00,376 git init -q /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:00     INFO -  2014-09-12 12:58:00,395 command: START
12:58:00     INFO -  2014-09-12 12:58:00,395 command: git fetch -q https://git.mozilla.org/b2g/B2G.git +refs/heads/*:refs/remotes/origin/*
12:58:00     INFO -  2014-09-12 12:58:00,395 command: cwd: /builds/git-shared/git/git.mozilla.org/b2g%2FB2G.git
12:58:00     INFO -  2014-09-12 12:58:00,395 command: output:
12:58:02     INFO -  2014-09-12 12:58:02,836 command: END (2.44s elapsed)
12:58:02     INFO -  2014-09-12 12:58:02,837 command: START
12:58:02     INFO -  2014-09-12 12:58:02,837 command: git fetch -q /builds/git-shared/git/git.mozilla.org/b2g%2FB2G.git +refs/remotes/origin/*:refs/remotes/origin/*
12:58:02     INFO -  2014-09-12 12:58:02,837 command: cwd: /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:02     INFO -  2014-09-12 12:58:02,837 command: output:
12:58:02     INFO -  2014-09-12 12:58:02,928 command: END (0.09s elapsed)
12:58:02     INFO -  2014-09-12 12:58:02,931 /builds/slave/b2g_ced_flm_dep-00000000000000/build: adding remote origin https://git.mozilla.org/b2g/B2G.git
12:58:02     INFO -  2014-09-12 12:58:02,932 command: START
12:58:02     INFO -  2014-09-12 12:58:02,932 command: git remote add origin https://git.mozilla.org/b2g/B2G.git
12:58:02     INFO -  2014-09-12 12:58:02,932 command: cwd: /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:02     INFO -  2014-09-12 12:58:02,932 command: output:
12:58:02     INFO -  2014-09-12 12:58:02,934 command: END (0.00s elapsed)
12:58:02     INFO -  2014-09-12 12:58:02,934 Updating local copy refname: None; revision: None
12:58:02     INFO -  2014-09-12 12:58:02,935 command: START
12:58:02     INFO -  2014-09-12 12:58:02,935 command: git checkout -q -f origin/master^0
12:58:02     INFO -  2014-09-12 12:58:02,935 command: cwd: /builds/slave/b2g_ced_flm_dep-00000000000000/build
12:58:02     INFO -  2014-09-12 12:58:02,935 command: output:
12:58:02     INFO -  2014-09-12 12:58:02,961 command: END (0.03s elapsed)
12:58:02     INFO -  Got revision 4be35b239e7b090f8b5b4b39485812975f67000f
12:58:02     INFO - Return code: 0

I feel like the fix is to put the B2G checkout within work_dir, not in work_dir itself. Not sure I'm reading that right, or what the implications would be.
Flags: needinfo?(catlee)
FYI I also stumbled across this comment in buildb2gbase.py:

                # That may have blown away our build-tools checkout. It would
                # be better if B2G were checked out into a subdirectory, but
                # for now, just redo it.
                self.checkout_tools()

So it sounds like we should consider moving B2G to a subdir. Thoughts?
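
As a rough illustration of the subdir idea, the sketch below clobbers only a checkout subdirectory and leaves its siblings alone. The layout and the names `clobber_checkout` and `demo` are hypothetical; the real scripts would need their paths updated to match.

```python
import os
import shutil
import tempfile


def clobber_checkout(abs_work_dir, checkout_subdir="B2G"):
    """Delete only the checkout subdirectory; return what remains in work_dir."""
    shutil.rmtree(os.path.join(abs_work_dir, checkout_subdir), ignore_errors=True)
    return sorted(os.listdir(abs_work_dir))


def demo():
    """Build a throwaway work_dir with a checkout and a tools dir, then clobber."""
    abs_work_dir = tempfile.mkdtemp()
    try:
        for name in ("B2G", "tools"):
            os.mkdir(os.path.join(abs_work_dir, name))
        return clobber_checkout(abs_work_dir)
    finally:
        shutil.rmtree(abs_work_dir, ignore_errors=True)


if __name__ == "__main__":
    print(demo())
```

With this layout the tools checkout survives the clobber, so the extra checkout_tools() call would no longer be needed.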
I'm not really sure what's going on TBH. On the build slaves, the directory layout is like this:

/builds/slave/b2g_b2g-in_emu-d_dep-000000000

'base_work_dir': '/builds/slave/b2g_b2g-in_emu-d_dep-000000000'
'work_dir': 'build'

so that means that 'abs_work_dir' is /builds/slave/b2g_b2g-in_emu-d_dep-000000000/build
logs go into /builds/slave/b2g_b2g-in_emu-d_dep-000000000/logs

The B2G repo is checked out into abs_work_dir, clobbering it if necessary.

I think perhaps the problem is in the difference between base_work_dir and work_dir. The code probably assumes that work_dir is a child of base_work_dir, and when you override work_dir on the command line, that assumption is broken?
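
A small sketch of that hypothesis, assuming the two settings are combined with a plain POSIX path join (an assumption; the real mozharness logic was not checked here). The helper name `abs_work_dir` is illustrative.

```python
import posixpath


def abs_work_dir(base_work_dir, work_dir):
    """Combine the two settings the way a plain path join would."""
    return posixpath.join(base_work_dir, work_dir)


base = "/builds/slave/b2g_b2g-in_emu-d_dep-000000000"

# Build-slave configuration: a relative work_dir stays under base_work_dir.
default = abs_work_dir(base, "build")

# Command-line override with an absolute path: the join discards base_work_dir,
# so the result is no longer a child of it.
override = abs_work_dir(base, "/home/worker/volume_cache/B2G")

print(default)
print(override)
print(override.startswith(base))
```

If the join works this way, any code that treats base_work_dir as a safe parent of the checkout would be wrong for the overridden case.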
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #4)
> I'm not really sure what's going on TBH. On the build slaves, the directory
> layout is like this:
> 
> /builds/slave/b2g_b2g-in_emu-d_dep-000000000
> 
> 'base_work_dir': '/builds/slave/b2g_b2g-in_emu-d_dep-000000000'
> 'work_dir': 'build'
> 
> so that means that 'abs_work_dir' is
> /builds/slave/b2g_b2g-in_emu-d_dep-000000000/build

> logs go into /builds/slave/b2g_b2g-in_emu-d_dep-000000000/logs

I don't think it's mozharness that is complaining about logs: log_cmd (from tools/util/commands.py) is complaining about os.getcwd().

> 
> The B2G repo is checked out into abs_work_dir, clobbering it if necessary.

Granted, I have not dived into this too much, but we have things in abs_work_dir that are not just the B2G checkout, e.g. the tools repo (causing a VCS checkout within another checkout) and the 'upload' dir (which also holds a copy of the script log and props). By making the B2G checkout abs_work_dir itself (/builds/slave/b2g_b2g-in_emu-d_dep-000000000/build), we clobber all of that whenever we want to clobber the B2G checkout. In our automation, we must be able to do this in a way that things don't complain, even though we comment that it isn't optimal (comment 3).
 
> I think perhaps the problem is in the difference between base_work_dir and
> work_dir. The code probably assumes that work_dir is a child of
> base_work_dir, and when you override workdir on the command-line that
> assumption is broken?

hmm, maybe. mshal are you overriding the work_dir? I thought you tried and failed without playing with the work_dir at all.
Comment on attachment 8492400 [details] [diff] [review]
bug1070074

Review of attachment 8492400 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to reset this review request. It's been the black sheep in my queue. Feel free to send the request again once the recent comments reach a conclusion. IMO the fix is still to put src in a subdir instead of work_dir itself, for the reasons stated in my last comment.
Attachment #8492400 - Flags: review?(jlund)
I no longer have any context here, so I'm not planning to fix it.
Assignee: mshal → nobody
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX