Closed Bug 809429 Opened 12 years ago Closed 8 years ago

B2G Panda build errors don't halt the build quickly enough (continues for 15 mins more)

Categories

(Firefox OS Graveyard :: GonkIntegration, defect)

Other
Gonk (Firefox OS)
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: [mozharness])

Attachments

(1 file)

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16820205&tree=Mozilla-Inbound

Whilst we can tweak the TBPL regex to match against the failure, this still doesn't help local builds.

Is there any reason why the build continues for 15 minutes after the initial failure?
Depends on: 809436
Hm.
I tend to think this is a b2g build system/script issue.
We can detect output from this script and get mozharness to fatal() early, but that's second-guessing what the build script should be doing itself.
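
For illustration only, the "detect output and bail early" idea could look roughly like the sketch below. This is not mozharness code: the function name run_and_abort_on_error and the MAKE_ERROR pattern are assumptions made up for this example, and the pattern is only a guess at what GNU make failure lines look like (e.g. "make: *** [failing] Error 1"). It is POSIX-only because it signals a process group.

```python
import os
import re
import signal
import subprocess
import sys

# Assumed pattern for a GNU make failure line such as
# "make: *** [failing] Error 1" or "make[2]: *** [foo] Error 2".
MAKE_ERROR = re.compile(r"^make(\[\d+\])?: \*\*\* .*Error \d+")


def run_and_abort_on_error(cmd):
    """Stream cmd's output and stop it at the first make error line.

    Returns the matching line, or None if the command ended without one.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # own process group, so children die too
    )
    first_error = None
    for line in proc.stdout:
        sys.stdout.write(line)  # keep the log visible while scanning it
        if MAKE_ERROR.match(line):
            first_error = line.rstrip("\n")
            try:
                os.killpg(proc.pid, signal.SIGTERM)
            except OSError:
                pass  # the group has already gone away
            break
    proc.stdout.close()
    proc.wait()
    return first_error
```

As noted above, this only papers over the problem from the outside; a local build that doesn't go through such a wrapper would still run on after the failure.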
I agree :-)
Summary: Build errors for mozharness / panda builds are buried hundreds of lines before the end of buildstep → B2G Panda build errors don't halt the build quickly enough (continues for 15 mins more)
Component: Release Engineering → Builds
Product: mozilla.org → Boot2Gecko
Version: other → unspecified
Parallel make ?
(In reply to Nick Thomas [:nthomas] from comment #3)
> Parallel make ?

Yeah, likely a factor, but parallel desktop builds don't take this long to stop, so I'm sure there must be something else going on...
The issue here is that the glue that builds Gecko is run in parallel by make. When something other than Gecko fails while Gecko is still building, make waits for the Gecko job to finish before the top-level make exits. This is expected behaviour. The majority of the Android modules are very small, so this isn't a problem for Android builds, but Gecko is so large that it magnifies the problem.
:JHFord

Is this going to require one of the build peers to have a look at how we can fail quicker based on comment 5?
Flags: needinfo?(jhford)
Just to clarify...

From a sheriffing standpoint, the builds failing quicker makes it easier to tell if backouts have fixed the problem and/or enables us to see the original breakage sooner, so we can back something out before another 10 commits have landed on top.

In addition, comment 0 mentions tweaking the regexp (which has partially happened) to make it easier to see the root cause in the pages of additional stdout. However, due to bug 910196 we still don't really have anything to match against, so the less confusing the stdout the better.
Ted or Kyle, do you have any thoughts on this?  Basically, we want make to kill all sub-makes on the first failure in any submake.

I'm not sure how to do this.
Flags: needinfo?(ted)
Flags: needinfo?(khuey)
Flags: needinfo?(jhford)
AFAIK there's no built-in way to do this in make. It will execute all jobs in parallel, and if one errors it will wait for all outstanding jobs to finish before erroring.
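
Outside of make itself, the requested behaviour (kill everything on the first failure) can be sketched with a small wrapper. The following is only an illustration under stated assumptions, not something the B2G build does: run_fail_fast is a hypothetical helper, it assumes the parallel pieces can be launched as separate shell commands, and it is POSIX-only because it relies on process groups.

```python
import os
import signal
import subprocess
import time


def run_fail_fast(commands, poll_interval=0.05):
    """Run shell commands in parallel; on the first failure, kill the rest.

    Returns the first non-zero exit code, or 0 if every command succeeded.
    """
    # start_new_session=True gives each command its own process group,
    # so its children (e.g. sub-makes) can be signalled along with it.
    running = [
        subprocess.Popen(cmd, shell=True, start_new_session=True)
        for cmd in commands
    ]
    while running:
        for proc in list(running):
            code = proc.poll()
            if code is None:
                continue  # still running
            running.remove(proc)
            if code != 0:
                for other in running:
                    try:
                        os.killpg(other.pid, signal.SIGTERM)
                    except OSError:
                        pass  # the group has already gone away
                for other in running:
                    other.wait()  # reap the processes we just signalled
                return code
        time.sleep(poll_interval)
    return 0
```

For example, run_fail_fast(["sleep 5", "false", "sleep 5"]) should return as soon as "false" exits rather than waiting out the sleeps. SIGTERM is used so that jobs get a chance to clean up; whether that is safe for a half-finished Gecko build is an open question.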
Flags: needinfo?(ted)
So I decided to dig into make a little bit and I came up with this super sketchy patch. Basically, it kills the parent make when the first child dies. This patch may result in zombies eating your brain and is ill-advised for production without further testing. At a minimum, it should probably be modified to actually go through the list of subprocesses and kill them all recursively.

Johns-MacBook-Pro:~/software/make $ cat test.mk
test: sleep5a failing sleep5b
	@echo Done!

sleep5a:
	@echo `date` going to Sleep 5 A
	sleep 5
	@echo Slept 5 A

sleep5b:
	@echo `date` going to Sleep 5 B
	sleep 5
	@echo `date` Slept 5 B

failing:
	@echo `date` "Going to fail"
	$(fail Purposely failing)
	false

.PHONY: test sleep5a failing sleep5b

Johns-MacBook-Pro:~/software/make $ make -j4 -f test.mk test
Mon Sep 9 11:54:04 CEST 2013 Going to fail
Mon Sep 9 11:54:04 CEST 2013 going to Sleep 5 A
Mon Sep 9 11:54:04 CEST 2013 going to Sleep 5 B
false
sleep 5
sleep 5
make: *** [failing] Error 1
make: *** Waiting for unfinished jobs....
Slept 5 A
Mon Sep 9 11:54:09 CEST 2013 Slept 5 B
Johns-MacBook-Pro:~/software/make $ ./make -j4 -f test.mk test
Mon Sep 9 11:54:18 CEST 2013 going to Sleep 5 B
make: *** Killing top-level make because of child failure
Mon Sep 9 11:54:18 CEST 2013 Going to fail
Mon Sep 9 11:54:18 CEST 2013 going to Sleep 5 A
Johns-MacBook-Pro:~/software/make $
Attachment #801481 - Flags: feedback?(ted)
Flags: needinfo?(khuey)
Comment on attachment 801481 [details] [diff] [review]
Kill all subprocesses on first child failure

Review of attachment 801481 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know anything about the gmake source, so I have no confidence in telling you whether this is likely to work or explode.
Attachment #801481 - Flags: feedback?(ted)
We are no longer using Pandas at Mozilla.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX