Closed
Bug 1059992
Opened 10 years ago
Closed 10 years ago
ALL b2g builds exceeding buildbot output limits
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(b2g-v1.4 fixed, b2g-v2.0 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: mshal, Assigned: jhford)
References
Details
Attachments
(3 files, 1 obsolete file)
679 bytes,
text/plain
|
catlee
:
review-
|
Details |
456 bytes,
patch
|
dhylands
:
review+
jhford
:
checked-in+
|
Details | Diff | Splinter Review |
4.70 KB,
patch
|
catlee
:
review+
jhford
:
checked-in+
|
Details | Diff | Splinter Review |
Flame builds sometimes hit the following:
Output exceeded 52428800 bytes, remaining output has been truncated (output was 52446216 bytes)
(eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=46947281&tree=Mozilla-Central )
This doesn't actually cause the build to fail - it just stops logging output after that. But if there is a failure after the output limit is reached, the actual failure message is lost.
I think it is probably simplest to just up the logMaxSize in buildbot-configs/mozilla/master_common.py. I don't know how feasible it is to trim the actual build output.
Reporter | ||
Comment 1•10 years ago
|
||
The fact that the logMaxSize hasn't changed since it was first created makes me somewhat uncertain that this is the right thing to do. I'm open to suggestions.
Attachment #8480816 -
Flags: review?(catlee)
Comment 2•10 years ago
|
||
We've talked about this amongst catlee's group and there are good reasons to not bump the log limit, since it adds more load to the buildbot masters to handle the information streaming back, and because 50MB really should be enough for anyone. Someone suggested that there is a lot of repeated error output in the log which could be fixed or shown once or suppressed to get back under the limit.
Blocks: 907890
Reporter | ||
Comment 3•10 years ago
|
||
Ahh, ok. Feel free to r- then :).
The problem I ran into is my flame-kk builds were failing in staging, but no error messages were in the log since the output was truncated. I upped the log size on my dev-master to see the error, but then when I wanted to compare to a real good build on tbpl, that log was truncated too.
Comment 4•10 years ago
|
||
Comment on attachment 8480816 [details]
logmaxsize
Yeah, I'd really really like to not have to increase this. We've held the line at 50MB for quite a while.
Are there any obvious candidates for reducing build output?
Attachment #8480816 -
Flags: review?(catlee) → review-
Comment 5•10 years ago
|
||
Bug 907890, which caused every single b2g build on every tree >= 32 to have a truncated log? Seems like a fairly obvious candidate to me.
Of course there are other things, there's half a meg of spammy traces from 13 Uglify2 errors because they want to use fat arrows and it doesn't know how to parse them and thinks it's useful to say so with a huge block of text, but who are you actually going to persuade to fix them?
The only reason I didn't close b2g32 and every trunk tree this morning when I discovered this was true of not just some Flame builds, but every single b2g device and emulator build since the end of July, was because people would lose their minds on mergeday eve. I think the right thing to do is close Tuesday morning, trunk aurora and 32, and not reopen until we can actually see b2g build logs.
Severity: normal → blocker
Summary: Flame builds exceeding buildbot output limits → ALL b2g builds exceeding buildbot output limits
Comment 6•10 years ago
|
||
John, as you can see it's a big issue to sheriffs to have the logs truncated.
Flags: needinfo?(jhford)
Assignee | ||
Comment 7•10 years ago
|
||
I don't think we should be turning off showcommands as it is still very useful information for debugging non-gecko changes. If there is no option to increase the size of the logs we can stop having Android build system output, I guess we need to turn off the extended information for non-gecko parts.
To get us into a useful state where we don't loose gecko logging, I think we should do the following steps in order:
1) change gonk-misc/Android.mk to read an environment variable to force not setting -s without breaking the showcommands target support
2) set environment variable in buildbot/mozharness
3) revert https://bug907890.bugzilla.mozilla.org/attachment.cgi?id=8464166
I can take 1 and 3, but I need someone from Releng for 2. Nick, do you know who on Releng can take number 2 on?
Flags: needinfo?(jhford) → needinfo?(nthomas)
Assignee | ||
Comment 8•10 years ago
|
||
Using a fake rule of
testingjohn:
echo 'value for -s flag:' $(SHOW_COMMAND_GECKO)
I've tested that this works as follows:
$ ./build.sh testingjohn
...
value for -s flag: -s
$ ./build.sh testingjohn showcommands
...
value for -s flag:
$ ./build.sh testingjohn FORCE_GECKO_BUILD_OUTPUT=1
...
value for -s flag:
$ FORCE_GECKO_BUILD_OUTPUT=1 ./build.sh testingjohn
...
value for -s flag:
Attachment #8482699 -
Flags: review?(dhylands)
Assignee | ||
Comment 9•10 years ago
|
||
This patch removes the showcommands target from Mozharness. It can't be landed until after attachment 8482699 [details] [diff] [review] lands. If you'd prefer to specify the environment variable here instead of in buildbot, I guess that works too.
Attachment #8482701 -
Flags: review?(catlee)
Updated•10 years ago
|
Attachment #8482701 -
Flags: review?(catlee) → review+
Comment 10•10 years ago
|
||
Comment on attachment 8482699 [details] [diff] [review]
Environment override to force only gecko to show full logs
Review of attachment 8482699 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with issue addressed (either in this bug or a followup filed)
::: Android.mk
@@ +268,5 @@
> +ifneq ($(strip $(FORCE_GECKO_BUILD_OUTPUT)),)
> +SKIP_DASH_S = 1
> +endif
> +
> +ifdef SKIP_DASH_S
My only concern here is that if you decide to set SKIP_DASH_S = 0 it would still take the if.
I'd rather see:
ifneq ($(strip($(SKIP_DASH_S)),1)
but this can be fixed in a followup bug as well.
Attachment #8482699 -
Flags: review?(dhylands) → review+
Assignee | ||
Comment 11•10 years ago
|
||
Comment on attachment 8482699 [details] [diff] [review]
Environment override to force only gecko to show full logs
gonk-misc commit 6969df171e5295f855f12d12db0382048e6892e7
Attachment #8482699 -
Flags: checked-in+
Assignee | ||
Comment 12•10 years ago
|
||
set the environment variable. I'm not sure how spidermonkey gets these values as it does not have an equivalent query env function I think.
Attachment #8482774 -
Flags: review?(catlee)
Comment 13•10 years ago
|
||
Comment on attachment 8482774 [details] [diff] [review]
Set the environment as well
Review of attachment 8482774 [details] [diff] [review]:
-----------------------------------------------------------------
::: scripts/b2g_build.py
@@ +292,5 @@
> )
> # Force B2G_UPDATER so that eng builds (like the emulator) will get
> # the updater included. Otherwise the xpcshell updater tests won't run.
> env['B2G_UPDATER'] = '1'
> + env['FORCE_GECKO_BUILD_OUTPUT'] = '1'
please add a comment here about referencing this bug, and pointing to where this environment variable is referenced by the build system.
Attachment #8482774 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 14•10 years ago
|
||
Comment on attachment 8482774 [details] [diff] [review]
Set the environment as well
https://hg.mozilla.org/build/mozharness/rev/d6c4ce5b963e
Attachment #8482774 -
Flags: checked-in+
Assignee | ||
Updated•10 years ago
|
Attachment #8482701 -
Attachment is obsolete: true
Updated•10 years ago
|
Flags: needinfo?(nthomas)
Comment 15•10 years ago
|
||
Comment on attachment 8482774 [details] [diff] [review]
Set the environment as well
I transplanted this to production to get into use asap:
https://hg.mozilla.org/build/mozharness/rev/ea52d732ef2f
Comment 16•10 years ago
|
||
Does the gonk-misc patch need uplifting to the v2.1, v2.0, etc branches in that repo ?
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #16)
> Does the gonk-misc patch need uplifting to the v2.1, v2.0, etc branches in
> that repo ?
Only if we're truncating there as well
Assignee | ||
Comment 18•10 years ago
|
||
(In reply to John Ford [:jhford] -- please use 'needinfo?' instead of a CC from comment #17)
> (In reply to Nick Thomas [:nthomas] from comment #16)
> > Does the gonk-misc patch need uplifting to the v2.1, v2.0, etc branches in
> > that repo ?
>
> Only if we're truncating there as well
Scratch that, yes, we do need to uplift it because we're no longer using showcommands and thus we lose out on the Gecko build system output.
Assignee | ||
Comment 19•10 years ago
|
||
Comment on attachment 8482699 [details] [diff] [review]
Environment override to force only gecko to show full logs
NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): this bug
User impact if declined: developers loose Gecko build logging information
Testing completed: it works on master branch
Risk to taking this patch (and alternatives if risky): if we have a build failure or otherwise need to refer to a full gecko build, we won't have the required information for branches without this patch.
String or UUID changes made by this patch: none
Attachment #8482699 -
Flags: approval-mozilla-b2g32?
Attachment #8482699 -
Flags: approval-mozilla-b2g30?
Attachment #8482699 -
Flags: approval-mozilla-b2g28?
Comment 20•10 years ago
|
||
This is really NPOTB, isn't it? i.e. doesn't affect shipping code at all?
Assignee: nobody → jhford
Comment 21•10 years ago
|
||
Comment on attachment 8482699 [details] [diff] [review]
Environment override to force only gecko to show full logs
Confirmed with John on IRC that this doesn't affect any compiled code, just changes some of the output in the logs. Uplifting a=NPOTB.
v2.1: https://github.com/mozilla-b2g/gonk-misc/commit/fd2ace405952a06d16eb8a1334129007c3f179fa
v2.0: https://github.com/mozilla-b2g/gonk-misc/commit/c4765e55870c241a5d25499d65323f3e87d26143
v1.4: https://github.com/mozilla-b2g/gonk-misc/commit/c5fc772a061a6fa46750e457954837c0d0df64dc
Attachment #8482699 -
Flags: approval-mozilla-b2g32?
Attachment #8482699 -
Flags: approval-mozilla-b2g30?
Attachment #8482699 -
Flags: approval-mozilla-b2g28?
Updated•10 years ago
|
status-b2g-v1.4:
--- → fixed
status-b2g-v2.0:
--- → fixed
status-b2g-v2.1:
--- → fixed
status-b2g-v2.2:
--- → fixed
Comment 22•10 years ago
|
||
Merged to production, and deployed.
Comment 23•10 years ago
|
||
I checked some logs to confirm we've resolved all branches & builds:
# stage
cd /pub/mozilla.org/b2g/tinderbox/builds
zgrep -c 'output has been truncated' */1410*/*-bm*-build1-build*.txt.gz > file
There are 3578 jobs in that covering 2.75 days, of which 239 (6%) end up truncated, mostly not by much.
There is consistent truncation for mozilla-b2g32_v2_0 emulator opt/nightly/debug.
And intermittent truncation for
* m-c & peers: emulator & emulator-kk, non-unified, opt & debug, most builds (unified is ok)
* same for m-a, except it's dep only (we don't unify there, so no non-unified)
* also on m-a: flame nightlies for opt, flame nightly & dep for eng (but flame-kk is fine); hamachi opt & eng nightlies
I'm wondering if there is some other low hanging fruit, like some the l10n output or buildsymbols warnings.
Assignee | ||
Comment 24•10 years ago
|
||
$ curl -L "https://tbpl.mozilla.org/php/getParsedLog.php?id=47640747&tree=Mozilla-Central&full=1" | gzip -d > log
$ egrep 'INFO|ERROR|WARNING' log | wc -l
86674
That's 90,000 "17:11:59 WARNING - " 21 char strings. That's 1,845KB of output.
I would argue that we could compress that format down to something much smaller and still get useful information:
171159E for error, 171159W for warning and 171159I for info. That's only 615KB. If we got rid of the seconds, that's only 439K of output.
Also, the build directory is "/builds/slave/b2g_m-cen_emu_dep-000000000000". That gets repeated a bunch of times because of how we invoke the compiler.
jhford:~ $ sed s/b2g_m-cen_emu_dep-000000000000//g < log > smalllog
jhford:~ $ du log smalllog
58528 log
48288 smalllog
There's another 10MB. Just getting rid of the 0s, we reduce by 4MB
jhford:~ $ sed s/b2g_m-cen_emu_dep-000000000000/b2g_m-cen_emu_dep/g < log > smalllog
jhford:~ $ du log smalllog
58528 log
54088 smalllog
If we hash the name of that directory and take only the first 7 chars, we get a huge improvement!
jhford:~ $ sed s/b2g_m-cen_emu_dep-000000000000/c1c1a17/g < log > smalllog
jhford:~ $ du log smalllog
58528 log
50672 smalllog
At the time when the hashes are generated, we could have a safety check that we aren't assigning the same shortened hash to two different build directories in the same master process.
jhford:~ $ cat safety.py
import sys
import hashlib
usedHashes = []
def getDir(name):
h = hashlib.new('sha1')
h.update(name)
h = h.hexdigest()
if h in usedHashes:
sys.stderr.write('ERROR: There is a hash collision!')
raise Exception('Short hash collision!')
else:
usedHashes.append(h)
return h
for i in sys.argv[1:]:
print getDir(i);
jhford:~ $ python safety.py john ford
a51dda7c7ff50b61eaea0444371f4a6a9301e501
d6edd79fd69252cd1adb811d1e99a7398dd6f53a
jhford:~ $ python safety.py john ford john
a51dda7c7ff50b61eaea0444371f4a6a9301e501
d6edd79fd69252cd1adb811d1e99a7398dd6f53a
ERROR: There is a hash collision!Traceback (most recent call last):
File "safety.py", line 18, in <module>
print getDir(i);
File "safety.py", line 12, in getDir
raise Exception('Short hash collision!')
Exception: Short hash collision!
Here's a bunch of low hanging fruit that I think could give pretty good wins.
Comment 25•10 years ago
|
||
Thanks for the suggestions John. I found something else which accounts for a lot of bits which is a less intrusive thing to change:
eg 15:55:15 INFO - /builds/slave/b2g_m-aurora_flm-kk_ntly-00000/build/prebuilts/gcc/linux-x86/arm/arm-eabi-4.7/bin/../lib/gcc/arm-eabi/4.7/../../../../arm-eabi/bin/as: /lib64/libz.so.1: no version information available (required by /builds/slave/b2g_m-aurora_flm-kk_ntly-00000/build/prebuilts/gcc/linux-x86/arm/arm-eabi-4.7/bin/../lib/gcc/arm-eabi/4.7/../../../../arm-eabi/bin/as)
Filed bug 1072026 to suppress this.
Comment 26•10 years ago
|
||
I reran the check in comment #23, and we're only truncating on emulator builds for mozilla-b2g34_v2_1, mozilla-b2g34_v2_1s, and mozilla-b2g32_v2_0. Uplifting bug 1072026 to those branches should fix that up.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•