Closed
Bug 1226276
Opened 9 years ago
Closed 8 years ago
l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos
Categories
(Release Engineering :: Release Automation: Other, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlund, Unassigned)
Attachments
(1 file)
4.00 KB, patch | rail: review+ | jlund: checked-in+
three retriggers, different slaves, same failure: http://buildbot-master73.bb.releng.usw2.mozilla.com:8001/builders/release-comm-esr38-thunderbird_tag_l10n/builds/1/steps/run_script/logs/stdio

running the last few cmds on one of the failing machines succeeds cleanly:

[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg branches -c'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
THUNDERBIRD3840_2015110918_RELBRANCH 2816:7997911655a3
SEA_COMM420_20151103_RELBRANCH 2815:1526960facbe
GECKO420_2015102918_RELBRANCH 2813:4af100eb71bf
... etc

[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg update -C -r cd8316ad9001'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
149 files updated, 0 files merged, 20 files removed, 0 files unresolved
State Changed: unlock buildroot

at this point I'm not sure what's wrong..
Comment 1•9 years ago
I can't see anything obvious either. It may be worth trying to use strace while it's hung.
Comment 2•9 years ago (Reporter)
re-triggered again and followed with strace:

[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# ps auxwf | grep 10382
root 11873 0.0 0.0 103240 864 pts/4 S+ 07:57 0:00 \_ grep 10382
cltbld 10382 0.1 0.0 187512 10436 ? S 07:29 0:01 \_ /usr/bin/python /builds/slave/rel-c-esr38-tb_tag_l10n-000000/scripts/scripts/release/tag-release.py -c mozilla/release-thunderbird-comm-esr38.py -b https://hg.mozilla.org/build/buildbot-configs -t THUNDERBIRD_38_4_0_RELEASE --tag-l10n
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -fF -p 10382 -o tagging_sub.log

which eventually hung as expected on nb-NO running 'hg branches -c', so I inspected that:

Process 11838 attached
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -p 11838
Process 11838 attached - interrupt to quit
write(1, "nactive)\n", 9) = 9

then, just by poking the 1 fd in that proc, oddly (very oddly!) that helped kick it back into action:

[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# cd /proc/11838/fd
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com fd]# cat 1
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
.... branch output

as soon as I cat'd that fd, proc 11838 came back to life!

rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
exit_group(0) = ?
Process 11838 detached

and the log continued to go. Talking to rail, he thinks there could be some weird flushing/buffering bug in hg at 3.1.2. at any rate, the job seems to be progressing now :\
Comment 3•9 years ago (Reporter)
this is happening again. same repo. different product (fennec) release. investigating..
Summary: tb 38.4.0 is failing to tag nb-NO locale repo with release tag → l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo
Comment 4•9 years ago (Reporter)
this is happening on 'ca' locale now: http://buildbot-master72.bb.releng.usw2.mozilla.com:8001/builders/release-mozilla-release-firefox_tag_l10n/builds/6
Summary: l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo → l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos
Comment 5•9 years ago (Reporter)
we hit this on about 10 more locale repos. I suppose nb-NO was a bit more bloated with branches, tags, etc. This requires manual surveillance of the job until it completes.
Comment 6•9 years ago (Reporter)
this is actually not because of hg or the OS. it is our implementation of running the cmd through subprocess: https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#186

I will upload a patch that changes the poll(); stdout.read() logic to something that looks like communicate() but still allows us to keep a timeout.
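Editorial sketch (not the code from commands.py): the hang is consistent with the usual pipe-buffer deadlock, assuming the parent only reads the child's stdout after poll() reports that the child has exited. A child that writes more than the pipe can hold blocks in write() and never exits, which would also explain why reading fd 1 in comment 2 unblocked it. The function name and the byte counts below are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
import time


def exits_without_reader(nbytes, deadline=2.0):
    """Return True if a child that writes nbytes to a pipe exits
    while the parent polls but does not read the pipe."""
    child_code = "import sys; sys.stdout.write('x' * %d)" % nbytes
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE)
    end = time.time() + deadline
    # Poll only; nothing drains the pipe during this loop.
    while proc.poll() is None and time.time() < end:
        time.sleep(0.05)
    exited = proc.poll() is not None
    # Drain the pipe so that a blocked child can finish and be reaped.
    proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    return exited
```

With a small output the child finishes on its own; with an output larger than the pipe buffer it stays blocked until something reads the pipe, much like the branch-heavy locale repos here.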
Comment 7•9 years ago (Reporter)
played with this today. this patch passes the tests defined in: https://dxr.mozilla.org/build-central/source/tools/lib/python/mozilla_buildtools/test/test_util_commands.py#74

the patch will look weird and big, but only because I added a try/catch which bumped the whole indent. here is the same patch without the try/catch: http://people.mozilla.org/~jlund/151215_bug_1226276_tools_get_output-no-try-catch.diff

this patch:
1) uses a tempfile for storing output.
2) stops using communicate() when the timeout is reached. I don't think it's needed, and besides, I believe there was a second bug in get_output where stderrdata[1] was always going to be empty, because when we do include stderr output we just mix it in with stdout[2]

[1] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#213
[2] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#177
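Editorial sketch of the approach described in this comment, not the attached patch itself: the child writes to a temporary file instead of a pipe, so it can never block on a full pipe buffer, and the parent enforces the timeout by polling. The name get_output_sketch, its parameters, and the use of RuntimeError are assumptions made for illustration; the real get_output in commands.py may differ.

```python
import subprocess
import sys
import tempfile
import time


def get_output_sketch(cmd, timeout=None, include_stderr=False,
                      poll_interval=0.05):
    """Run cmd and return its stdout as bytes.

    Hypothetical helper: raises RuntimeError if cmd runs longer than timeout.
    """
    # When stderr is wanted it is merged into stdout, so there is no
    # separate stderr data to collect afterwards.
    stderr = subprocess.STDOUT if include_stderr else None
    # TemporaryFile is removed automatically when it is closed.
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=stderr)
        start = time.time()
        while proc.poll() is None:
            if timeout is not None and time.time() - start > timeout:
                proc.kill()
                proc.wait()
                raise RuntimeError("timed out after %s seconds" % timeout)
            time.sleep(poll_interval)
        # The child shared our file offset, so rewind before reading.
        out.seek(0)
        return out.read()


if __name__ == "__main__":
    # Usage: capture output far larger than a typical pipe buffer.
    data = get_output_sketch(
        [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"],
        timeout=30)
    print(len(data))
```

Writing to a file trades a little disk I/O for not needing a reader thread or select() loop, which keeps the timeout logic as simple as the original poll() loop.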
Attachment #8698715 -
Flags: review?(catlee)
Comment 8•8 years ago
Comment on attachment 8698715: 151215_bug_1226276_tools_get_output_buffer_fix.diff

Let's land this, lgtm. TemporaryFile has a nice feature: it removes the file whenever it's closed.
Attachment #8698715 -
Flags: review?(catlee) → review+
Comment 9•8 years ago (Reporter)
Comment on attachment 8698715: 151215_bug_1226276_tools_get_output_buffer_fix.diff

thanks https://hg.mozilla.org/build/tools/rev/fc9b6d055d7e
Attachment #8698715 -
Flags: checked-in+
Updated•8 years ago (Reporter)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED