Bug 1226276
Opened 9 years ago
Closed 9 years ago
l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos
Categories: Release Engineering :: Release Automation, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: (Reporter: jlund, Unassigned)
Attachments: 1 file (patch, 4.00 KB; rail: review+, jlund: checked-in+)
three retriggers on different slaves all hit the same failure: http://buildbot-master73.bb.releng.usw2.mozilla.com:8001/builders/release-comm-esr38-thunderbird_tag_l10n/builds/1/steps/run_script/logs/stdio
running the last few commands by hand on one of the failing machines succeeds cleanly:
[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg branches -c'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
THUNDERBIRD3840_2015110918_RELBRANCH 2816:7997911655a3
SEA_COMM420_20151103_RELBRANCH 2815:1526960facbe
GECKO420_2015102918_RELBRANCH 2813:4af100eb71bf
... etc
[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg update -C -r cd8316ad9001'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
149 files updated, 0 files merged, 20 files removed, 0 files unresolved
State Changed: unlock buildroot
at this point I'm not sure what's wrong.
Comment 1•9 years ago
I can't see anything obvious either. It may be worth trying to use strace while it's hung.
Comment 2•9 years ago (Reporter)
re-triggered again and followed with strace:
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# ps auxwf | grep 10382
root 11873 0.0 0.0 103240 864 pts/4 S+ 07:57 0:00 \_ grep 10382
cltbld 10382 0.1 0.0 187512 10436 ? S 07:29 0:01 \_ /usr/bin/python /builds/slave/rel-c-esr38-tb_tag_l10n-000000/scripts/scripts/release/tag-release.py -c mozilla/release-thunderbird-comm-esr38.py -b https://hg.mozilla.org/build/buildbot-configs -t THUNDERBIRD_38_4_0_RELEASE --tag-l10n
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -fF -p 10382 -o tagging_sub.log
the job eventually hung, as expected, on nb-NO while running 'hg branches -c', so I attached to that child process:
Process 11838 attached
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -p 11838
Process 11838 attached - interrupt to quit
write(1, "nactive)\n", 9) = 9
then, just by reading from fd 1 (stdout) of that process, oddly (very oddly!) it kicked back into action:
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# cd /proc/11838/fd
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com fd]# cat 1
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
.... branch output
as soon as I cat'd that fd, proc 11838 came back to life!
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
exit_group(0) = ?
Process 11838 detached
and the job log started moving again. rail thinks there could be some weird flushing/buffering bug in hg 3.1.2.
at any rate, the job seems to be progressing now :\
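The strace above shows the child stuck in write(1, ...), and reading from its fd 1 let it finish. One plausible reading, consistent with the diagnosis in comment 6, is that stdout was a pipe that nobody was draining. The following is a minimal Python 3 sketch of that mechanism on a Unix-like system. It does not involve hg at all, and `fill_then_drain` is a hypothetical name used only for illustration.

```python
import fcntl
import os


def fill_then_drain(chunk=b"x" * 1024):
    """Fill a pipe nobody reads, then show that draining it lets writes resume."""
    read_fd, write_fd = os.pipe()

    # Non-blocking mode turns "write() hangs forever" into an error we can catch.
    flags = fcntl.fcntl(write_fd, fcntl.F_GETFL)
    fcntl.fcntl(write_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    written = 0
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        # The kernel buffer is full. A blocking writer would be stuck right here,
        # which is what a hung write(1, ...) looks like in strace.
        pass

    # Reading from the other end (the role "cat" played on /proc/<pid>/fd/1)
    # frees space in the buffer.
    drained = len(os.read(read_fd, 8192))
    resumed = os.write(write_fd, chunk)

    os.close(read_fd)
    os.close(write_fd)
    return written, drained, resumed


written, drained, resumed = fill_then_drain()
print("bytes accepted before the pipe filled:", written)
print("bytes drained:", drained, "bytes written afterwards:", resumed)
```

The exact buffer size is platform dependent, so the sketch prints it rather than assuming a value.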
Comment 3•9 years ago (Reporter)
this is happening again. same repo. different product (fennec) release. investigating..
Summary: tb 38.4.0 is failing to tag nb-NO locale repo with release tag → l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo
Comment 4•9 years ago (Reporter)
this is happening on 'ca' locale now: http://buildbot-master72.bb.releng.usw2.mozilla.com:8001/builders/release-mozilla-release-firefox_tag_l10n/builds/6
Summary: l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo → l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos
Comment 5•9 years ago (Reporter)
we hit this on about 10 more locale repos. I suppose nb-NO was a bit more bloated with branches, tags, etc. For now, someone has to watch the job manually until it completes.
Comment 6•9 years ago (Reporter)
this is actually not caused by hg or the OS. it is our own implementation of running the command through subprocess: https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#186
will upload a patch that replaces the poll(); stdout.read() logic with something that behaves like communicate() but still lets us keep a timeout.
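As I read the comment above, the hazard with poll()-then-read is that the parent waits for the child to exit before reading stdout, while a child that prints more than the pipe buffer holds cannot exit until someone reads. The following is a rough Python 3 sketch of "communicate() with a timeout". It is not the code in util/commands.py, which targeted Python 2.7, where communicate() had no timeout argument. The `get_output` body here is an illustrative stand-in that only borrows the name.

```python
import subprocess
import sys


def get_output(cmd, timeout=60):
    """Return combined stdout/stderr of cmd, or raise if it exceeds timeout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        # communicate() keeps reading while it waits, so the pipe never fills up.
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the child and close the pipe
        raise
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output=out)
    return out


# A child that prints about 1 MiB, far more than a typical pipe buffer holds.
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
output = get_output(noisy, timeout=30)
print(len(output))
```

The `timeout` parameter of communicate() exists in Python 3.3 and later, so on a modern interpreter no hand-rolled polling loop is needed for this case.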
Comment 7•9 years ago (Reporter)
played with this today.
this patch passes tests defined in: https://dxr.mozilla.org/build-central/source/tools/lib/python/mozilla_buildtools/test/test_util_commands.py#74
the patch looks weird and big, but only because I added a try/catch, which bumped the indentation of the whole block. here is the same patch without the try/catch: http://people.mozilla.org/~jlund/151215_bug_1226276_tools_get_output-no-try-catch.diff
this patch
1) uses a tempfile for storing output.
2) no longer calls communicate() when the timeout is reached. I don't think it's needed, and besides, I believe there was a second bug in get_output: stderrdata [1] was always going to be empty, because when we do include stderr output we just mix it into stdout [2]
[1] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#213
[2] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#177
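The two points above can be sketched roughly as follows. This is not the attached patch, only a minimal Python 3 illustration of the idea, assuming the child's output goes straight to a temporary file so that polling with a timeout cannot deadlock on a full pipe. `get_output_via_tempfile` and its parameters are hypothetical names.

```python
import subprocess
import sys
import tempfile
import time


def get_output_via_tempfile(cmd, timeout=60, poll_interval=0.1):
    """Run cmd with output captured in a temp file; return (returncode, output)."""
    with tempfile.TemporaryFile() as out:
        # stderr is mixed into stdout, so there is no separate stderr data to collect.
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + timeout
        # Polling is safe here: the child writes to a file, not to a pipe we must drain.
        while proc.poll() is None:
            if time.monotonic() > deadline:
                proc.kill()
                proc.wait()
                raise RuntimeError("command timed out after %s seconds" % timeout)
            time.sleep(poll_interval)
        # The child shares the file offset, so rewind before reading.
        out.seek(0)
        return proc.returncode, out.read()


noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
returncode, data = get_output_via_tempfile(noisy, timeout=30)
print(returncode, len(data))
```

The trade-off in this sketch is that output is only available once the command finishes, which seems fine for a helper that returns the whole output anyway.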
Attachment #8698715 - Flags: review?(catlee)
Comment 8•9 years ago
Comment on attachment 8698715: 151215_bug_1226276_tools_get_output_buffer_fix.diff
Let's land this, lgtm. TemporaryFile has the nice feature of removing the file as soon as it's closed.
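A quick way to see that cleanup behaviour, as a small Python 3 sketch: tempfile.TemporaryFile usually has no visible directory entry on Unix, so this uses tempfile.NamedTemporaryFile, which has the same delete-on-close default, purely to have a path to check.

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile()
path = tmp.name
tmp.write(b"hg branches output would go here")
tmp.flush()

existed_while_open = os.path.exists(path)
tmp.close()  # closing the file is what removes it
exists_after_close = os.path.exists(path)

print(existed_while_open, exists_after_close)
```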
Attachment #8698715 - Flags: review?(catlee) → review+
Comment 9•9 years ago (Reporter)
Comment on attachment 8698715: 151215_bug_1226276_tools_get_output_buffer_fix.diff
thanks
https://hg.mozilla.org/build/tools/rev/fc9b6d055d7e
Attachment #8698715 - Flags: checked-in+
Updated•9 years ago (Reporter)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED