Closed Bug 1537574 Opened 5 years ago Closed 5 years ago

./mach build when files are missing from UNIFIED_SOURCES prevents future calls of ./mach build from working

Categories

(Firefox Build System :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox68 fixed)

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: barret, Assigned: mshal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

After a bad rebase, a moz.build file's UNIFIED_SOURCES array listed a file that didn't exist. Running ./mach build in that state caused every subsequent ./mach build to fail, with mach exiting with an unhelpful error message.

STR:

  1. Add a file that doesn't exist to a moz.build file's UNIFIED_SOURCES
  2. ./mach clobber
  3. ./mach build
  4. Remove the missing file from UNIFIED_SOURCES, as instructed by the error from (3)
  5. ./mach build

Expected results:

Firefox builds

Actual results:

The build fails with the following error message:

$ ./mach build
 0:01.85 Clobber not needed.
 0:01.87 Adding make options from c:\Users\Barret\Workspace\src\hg.mozilla.org\mozilla-central\mozconfig
    MOZ_MAKE_FLAGS=-j56
    MOZ_OBJDIR=c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-mingw32
    OBJDIR=c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-mingw32
    FOUND_MOZCONFIG=c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/mozconfig
    export FOUND_MOZCONFIG
 0:02.06 c:\mozilla-build\bin\mozmake.EXE -f client.mk MOZ_PARALLEL_BUILD=56 -s
 0:02.19 mozmake.EXE[1]: *** No targets specified and no makefile found.  Stop.
 0:02.19 mozmake.EXE: *** [client.mk:125: build] Error 2
 0:02.22 227 compiler warnings present.

For posterity, here is the error from STR step 3, which says to remove the file:

 2:15.05 Reticulating splines...
 2:15.05 Traceback (most recent call last):
 2:15.05   File "c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/configure.py", line 132, in <module>
 2:15.05     sys.exit(main(sys.argv))
 2:15.05   File "c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/configure.py", line 43, in main
 2:15.05     return config_status(config)
 2:15.05   File "c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/configure.py", line 127, in config_status
 2:15.05     return config_status(args=[], **encode(sanitized_config, encoding))
 2:15.06   File "c:\Users\Barret\Workspace\src\hg.mozilla.org\mozilla-central\python\mozbuild\mozbuild\config_status.py", line 143, in config_status
 2:15.06     definitions = list(definitions)
 2:15.06   File "c:\Users\Barret\Workspace\src\hg.mozilla.org\mozilla-central\python\mozbuild\mozbuild\frontend\emitter.py", line 185, in emit
 2:15.06     objs = list(emitfn(out))
 2:15.06   File "c:\Users\Barret\Workspace\src\hg.mozilla.org\mozilla-central\python\mozbuild\mozbuild\frontend\emitter.py", line 1207, in emit_from_context
 2:15.06     for obj in self._handle_linkables(context, passthru, generated_files):
 2:15.06   File "c:\Users\Barret\Workspace\src\hg.mozilla.org\mozilla-central\python\mozbuild\mozbuild\frontend\emitter.py", line 880, in _handle_linkables
 2:15.06     'exist: \'%s\'' % (symbol, full_path), context)
 2:15.06 mozbuild.frontend.reader.SandboxValidationError:
 2:15.06 ==============================
 2:15.07 FATAL ERROR PROCESSING MOZBUILD FILE
 2:15.07 ==============================
 2:15.07 The error occurred while processing the following file or one of the files it includes:
 2:15.07     c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/dom/ipc/moz.build
 2:15.07 The error occurred when validating the result of the execution. The reported error is:
 2:15.07     File listed in UNIFIED_SOURCES does not exist: 'c:/Users/Barret/Workspace/src/hg.mozilla.org/mozilla-central/dom/ipc/RemoteFrameParent.cpp'
 2:15.14 *** Fix above errors and then restart with\
 2:15.14                "./mach build"
 2:15.17 mozmake.EXE: *** [client.mk:115: configure] Error 1
Keywords: in-triage
Assignee: nobody → mshal
Keywords: in-triage

I think I have a fix for this. The problem is that if parsing fails, no Makefiles get written out, so make has no entry point to execute the backend-out-of-date logic. Mozbuild doesn't re-run configure because that part completed successfully, so at that point there's nothing that will try to regenerate the backend.

We already have some code to do this check in mozbuild instead, so we should use that for RecursiveMake too.
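The failure mode can be modeled with a small, self-contained sketch. This is not mozbuild code: the function names (old_build_flow, new_build_flow, make_broken_objdir) and the step strings are hypothetical. The only details taken from this bug are that config.status exists, no Makefile was written, and the old flow relied on make to run the backend check. The Makefile test in new_build_flow is a stand-in for the real backend-out-of-date check.

```python
import os
import tempfile


def old_build_flow(objdir):
    """Toy model of the pre-fix flow (hypothetical, not mozbuild code)."""
    steps = []
    # configure is skipped because config.status already exists.
    if not os.path.exists(os.path.join(objdir, "config.status")):
        steps.append("configure")
    # The backend check lives inside make, so it needs a Makefile to run at all.
    if not os.path.exists(os.path.join(objdir, "Makefile")):
        steps.append("make: no makefile found")
        return steps
    steps.append("make: regenerate backend if stale, then build")
    return steps


def new_build_flow(objdir):
    """Toy model of the post-fix flow: the check runs before make is invoked."""
    steps = []
    if not os.path.exists(os.path.join(objdir, "config.status")):
        steps.append("configure")
    # Stand-in for the backend-out-of-date check, done outside of make.
    if not os.path.exists(os.path.join(objdir, "Makefile")):
        steps.append("regenerate backend")
    steps.append("make: build")
    return steps


def make_broken_objdir():
    """Create an objdir where configure succeeded but no Makefile exists."""
    objdir = tempfile.mkdtemp()
    open(os.path.join(objdir, "config.status"), "w").close()
    return objdir


if __name__ == "__main__":
    objdir = make_broken_objdir()
    print(old_build_flow(objdir))
    print(new_build_flow(objdir))
```

In the old flow, nothing ever gets a chance to regenerate the backend, which matches the "No targets specified and no makefile found" error in the report.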

If mozbuild parsing fails due to a missing file (e.g., a file listed in
UNIFIED_SOURCES that does not exist), then no Makefiles are written out, but
config.status exists. This would cause mozbuild to think that configure
doesn't need to run, and rely on make to perform the backend-out-of-date
check in rebuild-backend.mk. Unfortunately since no Makefiles were
written, the make command fails immediately and no attempt is made to
re-create the backend. Note that this is only a problem if the first
mozbuild parsing from a clobber build fails, otherwise there is
typically a top-level Makefile from a previous build to call into (at
which point make can determine it is out-of-date, and re-invoke itself).

The fix is to have the RecursiveMake backend re-use the same logic that
was introduced into mozbuild for alternate backends, and remove
rebuild-backend.mk. This way, mozbuild can always determine if the
backend needs to be regenerated, even if the initial parsing failed.

Test code was also relying on rebuild-backend.mk to generate the
TestBackend, but moving backend_out_of_date() into MozbuildObject allows
this code to be shared.
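As a rough illustration of the kind of check the commit message describes, here is a minimal sketch of a timestamp-based out-of-date test. It is an assumption about the general technique, not the actual backend_out_of_date() in MozbuildObject: the name backend_out_of_date_sketch, its signature, and the marker and input file names are all hypothetical.

```python
import os
import tempfile


def backend_out_of_date_sketch(backend_file, dep_files):
    """Return True if the backend marker is missing or older than any input.

    Hypothetical simplification; not the real MozbuildObject method.
    """
    if not os.path.isfile(backend_file):
        # Nothing was ever generated (e.g. the first parse after a clobber failed).
        return True
    backend_mtime = os.path.getmtime(backend_file)
    for dep in dep_files:
        # A missing or newer input means the backend must be regenerated.
        if not os.path.isfile(dep) or os.path.getmtime(dep) > backend_mtime:
            return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "backend.marker")  # hypothetical marker file
        moz_build = os.path.join(tmp, "moz.build")
        open(moz_build, "w").close()
        # The marker does not exist yet, so the backend counts as out of date.
        print(backend_out_of_date_sketch(marker, [moz_build]))
```

Because the check only needs the filesystem, it can run before make is invoked, so it still works when no Makefile exists.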

Pushed by mshal@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95b3298fd2d4
Use mozbuild's backend-out-of-date logic for RecursiveMake; r=firefox-build-system-reviewers,chmanchester

Looks like this conflicted with bug 1195299 in some way, though it's not immediately apparent to me why. One thing I noticed in the logs is that bug 1195299 introduced some messages in the build log like this:

13:43:51 INFO - c:\mozilla-build\python\python2.7.exe: can't open file 'main_file_name = 'mach'^M
13:43:51 INFO - main_module_name = 'mach'^M
13:43:51 INFO - import imp^M
13:43:51 INFO - import os^M
13:43:51 INFO - import sys^M
13:43:51 INFO - orig_find_module = imp.find_module^M
13:43:51 INFO - def my_find_module(name, dirs):^M
13:43:51 INFO - if name == main_module_name:^M
13:43:51 INFO - path = os.path.join(dirs[0], main_file_name)^M
13:43:51 INFO - f = open(path)^M
13:43:51 INFO - return (f, path, ('', 'r', imp.PY_SOURCE))^M
13:43:51 INFO - return orig_find_module(name, dirs)^M
13:43:51 INFO - # Don't allow writing bytecode file for the main module.^M
13:43:51 INFO - orig_load_module = imp.load_module^M
13:43:51 INFO - def my_load_module(name, file, path, description):^M
13:43:51 INFO - # multiprocess.forking invokes imp.load_module manually and^M
13:43:51 INFO - # hard-codes the name parents_main as the module name.^M
13:43:51 INFO - if name == 'parents_main':^M
13:43:51 INFO - old_bytecode = sys.dont_write_bytecode^M
13:43:51 INFO - sys.dont_write_bytecode = True^M
13:43:51 INFO - try:^M
13:43:51 INFO - return orig_load_module(name, file, path, description)^M
13:43:51 INFO - finally:^M
13:43:51 INFO - sys.dont_write_bytecode = old_bytecode^M
13:43:51 INFO - return orig_load_module(name, file, path, description)^M
13:43:51 INFO - imp.find_module = my_find_module^M
13:43:51 INFO - imp.load_module = my_load_module^M
13:43:51 INFO - from multiprocessing.forking import main; main()^M
13:43:51 INFO - ': [Errno 2] No such file or directory^M

This shows up twice, once after the 'mach build' and once after 'mach warnings-list'.

With the patch in this bug added on top, the '[Errno 2] No such file or directory' message shows up another 16+ times before the task gets aborted. Maybe it gets stuck in some sort of loop?

ahal: Do you see anything obvious in D26262 that would conflict with the patches in bug 1195299?

try with both this bug and 1195299: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1dc9607e75eca391ad5ed0a350165d5585bbff6
try with bug 1195299 backed out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=495dad34672fc9308119d8fb00e7a113b0235426

Flags: needinfo?(mshal) → needinfo?(ahal)

Bug 1195299 does two things that might affect this:

  1. Updates the copies of mozbase that mozharness uses to latest
  2. Uses |mach python| to run the mozharness scripts instead of the system python (so in-tree packages are available).

Assuming the thing that interacted badly here is the second one, you're in luck. The mozharness change also caused a regression to Windows ccov build tasks in bug 1542242. And I just landed a fix which hacks PYTHONPATH for Windows builds instead of running mach python.
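For readers unfamiliar with the second approach, here is a minimal sketch of making in-tree packages importable by adjusting PYTHONPATH for a child process rather than launching it through a wrapper. The landed changeset is not shown in this bug, so this only illustrates the general technique; the helper name env_with_pythonpath and the directory names are hypothetical.

```python
import os


def env_with_pythonpath(extra_dirs, base_env=None):
    """Return a copy of the environment with extra_dirs prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Keep any existing entries, but let the in-tree directories win.
    parts = list(extra_dirs) + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Hypothetical in-tree package directories.
    env = env_with_pythonpath(["testing/mozbase", "third_party/python"])
    print(env["PYTHONPATH"])
```

The returned dictionary can be passed as the env argument of subprocess.call or subprocess.Popen when starting the script with the system Python.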

So try running your patches on top of this:
https://hg.mozilla.org/integration/autoland/rev/20c86313e332

If that fixes it, please add a comment to bug 1543149 so we know to expect this failure when investigating.

Flags: needinfo?(ahal)

Also bug 1543149 is likely going to need some build team help eventually (it's not a high priority or anything though).

Ahh, thanks for the tip! With your autoland patch it appears to work fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eba9b9f5fae838f7056e6d28c39e3130f0e58644

I'll try to re-land and leave a note in the other bug.

Attachment #9056030 - Attachment description: Bug 1537574 - Use mozbuild's backend-out-of-date logic for RecursiveMake; r?#firefox-build-system-reviewers → Bug 1537574 - Use mozbuild's backend-out-of-date logic for RecursiveMake; r=chmanchester
Pushed by mshal@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/148b08e319b3
Use mozbuild's backend-out-of-date logic for RecursiveMake; r=firefox-build-system-reviewers,chmanchester
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

This regressed the cram tests (which didn't run on your pushes because their "files-changed" attribute didn't include the build system):
https://treeherder.mozilla.org/logviewer.html#?job_id=239614623&repo=autoland

At the very least, it looks like errno is actually undefined there. Though I'm not sure why those tests are hitting that code path in the first place.
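The failing code is not quoted in this bug, so the following is only a guess at the general shape of such a bug: a module that references errno in an exception handler without importing it. The helper name remove_quietly_buggy is hypothetical. The sketch shows why this kind of mistake can go unnoticed until a rarely taken code path runs.

```python
import os
import tempfile


def remove_quietly_buggy(path):
    """Hypothetical helper that forgets to `import errno`."""
    try:
        os.remove(path)
    except OSError as e:
        # The name `errno` is only looked up here, so the NameError appears
        # only when os.remove actually fails.
        if e.errno != errno.ENOENT:
            raise


if __name__ == "__main__":
    missing = os.path.join(tempfile.mkdtemp(), "does-not-exist")
    try:
        remove_quietly_buggy(missing)
    except NameError as exc:
        print("latent bug surfaced:", exc)
```

Adding `import errno` at the top of the module is the usual fix for this pattern.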

Flags: needinfo?(mshal)

Filed bug 1543663 for the cram try failure

Depends on: 1543663
No longer depends on: 1543663
Regressions: 1543663

The cram issue should be fixed now in bug 1543663.

Flags: needinfo?(mshal)
Blocks: clobber