Closed
Bug 809429
Opened 12 years ago
Closed 9 years ago
B2G Panda build errors don't halt the build quickly enough (continues for 15 mins more)
Categories
(Firefox OS Graveyard :: GonkIntegration, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: emorley, Unassigned)
References
Details
(Whiteboard: [mozharness])
Attachments
(1 file)
1.11 KB, patch
For example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16820205&tree=Mozilla-Inbound
Whilst we can tweak the TBPL regex to match against the failure, this still doesn't help local builds.
Is there any reason why the build continues for 15 minutes after the initial failure?
Comment 1•12 years ago
Hm.
I tend to think this is a b2g build system/script issue.
We can detect output from this script and get mozharness to fatal() early, but that's second guessing what the build script should be doing itself.
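To make the "detect output and stop early" option concrete, here is a minimal Python sketch of a wrapper that watches a build's output and stops it at the first make error line. It is not mozharness code and does not use mozharness's fatal(); the names run_until_error and MAKE_ERROR are hypothetical, and the regex is only an assumption modeled on GNU make's "*** [target] Error N" message format.

```python
import re
import subprocess
import sys

# Assumed pattern: GNU make's "*** [target] Error N" line (also matches
# sub-make prefixes such as "make[2]:"). Not taken from TBPL or mozharness.
MAKE_ERROR = re.compile(r"^make(\[\d+\])?: \*\*\* .*Error \d+")


def run_until_error(cmd, pattern=MAKE_ERROR):
    """Run cmd, echo its output, and stop it at the first matching line.

    Returns (returncode, matching_line); matching_line is None when the
    command finished without printing a line that matches pattern.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    first_error = None
    for line in proc.stdout:
        sys.stdout.write(line)
        if pattern.search(line):
            first_error = line.rstrip("\n")
            proc.terminate()  # signals only the direct child process
            break
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, first_error
```

Note that terminate() only signals the process that was started directly, so jobs spawned by sub-makes could keep running; as the comment says, this also second-guesses what the build script should be doing itself.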
Reporter
Comment 2•12 years ago
I agree :-)
Summary: Build errors for mozharness / panda builds are buried hundreds of lines before the end of buildstep → B2G Panda build errors don't halt the build quickly enough (continues for 15 mins more)
Updated•12 years ago
Component: Release Engineering → Builds
Product: mozilla.org → Boot2Gecko
Version: other → unspecified
Comment 3•12 years ago
Parallel make?
Reporter
Comment 4•12 years ago
(In reply to Nick Thomas [:nthomas] from comment #3)
> Parallel make?
Yeah, that's likely a factor, but parallel desktop builds don't take this long to stop, so I'm sure there must be something else going on...
Comment 5•12 years ago
The issue here is that the glue that builds Gecko is run in parallel in make. When something other than Gecko hits a build error while Gecko is still building, make waits for the Gecko build to finish before the top-level make exits. This is expected behaviour. The majority of the Android modules are very small, so this isn't a problem for Android builds, but Gecko is so large that it magnifies the problem.
Reporter
Updated•11 years ago
Blocks: b2g-sheriffing
Comment 6•11 years ago
:JHFord
Is this going to require one of the build peers to have a look at how we can fail quicker based on comment 5?
Flags: needinfo?(jhford)
Reporter
Comment 7•11 years ago
Just to clarify...
From a sheriffing standpoint, the builds failing quicker makes it easier to tell if backouts have fixed the problem and/or enables us to see the original breakage sooner, so we can back something out before another 10 commits have landed on top.
In addition, whilst comment 0 mentions tweaking the regexp (which has partially happened) to make it easier to see the root cause in the pages of additional stdout, due to bug 910196 we still don't really have anything to match against, so the less confusing the stdout the better.
Comment 8•11 years ago
Ted or Kyle, do you have any thoughts on this? Basically, we want make to kill all sub-makes on the first failure in any submake.
I'm not sure how to do this.
Flags: needinfo?(ted)
Flags: needinfo?(khuey)
Flags: needinfo?(jhford)
Comment 9•11 years ago
AFAIK there's no built-in way to do this in make. It will execute all jobs in parallel, and if one errors it will stop starting new jobs but still wait for all outstanding jobs to finish before erroring.
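For contrast with make's behaviour, here is a minimal Python sketch of the fail-fast policy being asked for: start several jobs at once and, as soon as one exits non-zero, stop the others instead of waiting for them. This is an illustration outside of make, not a proposed change to the B2G build; run_fail_fast and poll_interval are hypothetical names.

```python
import subprocess
import sys
import time


def run_fail_fast(commands, poll_interval=0.05):
    """Start all commands at once; on the first non-zero exit, stop the rest.

    Returns the failing command's exit code, or 0 if every command succeeded.
    """
    running = [subprocess.Popen(cmd) for cmd in commands]
    while running:
        for proc in list(running):
            code = proc.poll()
            if code is None:
                continue  # still running
            running.remove(proc)
            if code != 0:
                # First failure: signal the remaining jobs, then reap them.
                for other in running:
                    other.terminate()
                for other in running:
                    other.wait()
                return code
        time.sleep(poll_interval)
    return 0
```

For example, run_fail_fast([["sleep", "5"], ["false"], ["sleep", "5"]]) should return almost immediately rather than after five seconds. Like the wrapper approach, it only signals the jobs it started directly, not anything those jobs spawned.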
Flags: needinfo?(ted)
Comment 10•11 years ago
So I decided to dig into make a little bit and I came up with this super sketchy patch. Basically, it kills the parent make when the first child dies. This patch may result in zombies eating your brain and is ill-advised for production without further testing. At a minimum, it should probably be modified to actually go through the list of subprocesses and kill them all recursively.
Johns-MacBook-Pro:~/software/make $ cat test.mk
test: sleep5a failing sleep5b
	@echo Done!
sleep5a:
	@echo `date` going to Sleep 5 A
	sleep 5
	@echo Slept 5 A
sleep5b:
	@echo `date` going to Sleep 5 B
	sleep 5
	@echo `date` Slept 5 B
failing:
	@echo `date` "Going to fail"
	$(fail Purposely failing)
	false
.PHONY: test sleep5a failing sleep5b
Johns-MacBook-Pro:~/software/make $ make -j4 -f test.mk test
Mon Sep 9 11:54:04 CEST 2013 Going to fail
Mon Sep 9 11:54:04 CEST 2013 going to Sleep 5 A
Mon Sep 9 11:54:04 CEST 2013 going to Sleep 5 B
false
sleep 5
sleep 5
make: *** [failing] Error 1
make: *** Waiting for unfinished jobs....
Slept 5 A
Mon Sep 9 11:54:09 CEST 2013 Slept 5 B
Johns-MacBook-Pro:~/software/make $ ./make -j4 -f test.mk test
Mon Sep 9 11:54:18 CEST 2013 going to Sleep 5 B
make: *** Killing top-level make because of child failure
Mon Sep 9 11:54:18 CEST 2013 Going to fail
Mon Sep 9 11:54:18 CEST 2013 going to Sleep 5 A
Johns-MacBook-Pro:~/software/make $
Attachment #801481 - Flags: feedback?(ted)
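The comment suggests walking the list of subprocesses and killing them recursively. One alternative that avoids patching make is to run the whole build in its own process group and signal the group. The Python sketch below illustrates that idea only; it is not what attachment 801481 does, it is POSIX-only, and start_in_own_group and kill_group are hypothetical names.

```python
import os
import signal
import subprocess


def start_in_own_group(cmd):
    """Start cmd as the leader of a new session and process group (POSIX)."""
    # start_new_session=True makes the child call setsid(), so its process
    # group ID equals its PID and its descendants inherit that group.
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Send sig to every process in proc's group, then reap proc itself."""
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the group has already exited
    return proc.wait()
```

A caller would start the top-level make with start_in_own_group() and call kill_group() once a failure is detected. Descendants that move themselves into a different process group would not be reached this way.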
Updated•11 years ago
Flags: needinfo?(khuey)
Comment 11•11 years ago
Comment on attachment 801481 [details] [diff] [review]
Kill all subprocesses on first child failure
Review of attachment 801481 [details] [diff] [review]:
-----------------------------------------------------------------
I don't know anything about the gmake source, so I have no confidence in telling you whether this is likely to work or explode.
Attachment #801481 - Flags: feedback?(ted)
Comment 12•9 years ago
We are no longer using Pandas at Mozilla.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX