Closed Bug 1226276 Opened 9 years ago Closed 9 years ago

l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos

Categories

(Release Engineering :: Release Automation, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Unassigned)

Attachments

(1 file)

three retriggers, different slaves, same failure:
http://buildbot-master73.bb.releng.usw2.mozilla.com:8001/builders/release-comm-esr38-thunderbird_tag_l10n/builds/1/steps/run_script/logs/stdio

running the last few cmds on one of the failing machines succeeds cleanly..

[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg branches -c'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
THUNDERBIRD3840_2015110918_RELBRANCH 2816:7997911655a3
SEA_COMM420_20151103_RELBRANCH 2815:1526960facbe
GECKO420_2015102918_RELBRANCH 2813:4af100eb71bf
... etc

[cltbld@bld-linux64-spot-1096.build.releng.usw2.mozilla.com nb-NO]$ mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/rel-c-esr38-tb_tag_l10n-000000/nb-NO/. --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" BUILDBOT_CONFIGS="https://hg.mozilla.org/build/buildbot-configs" CLOBBERER_URL="https://api.pub.build.mozilla.org/clobberer/forceclobber" BUILDBOTCUSTOM="https://hg.mozilla.org/build/buildbotcustom" PROPERTIES_FILE="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/buildprops.json" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" EXTRA_DATA="/builds/slave/rel-c-esr38-tb_tag_l10n-000000/data.json" hg update -C -r cd8316ad9001'
INFO: mock_mozilla.py version 1.0.3 starting...
State Changed: init plugins
INFO: selinux disabled
State Changed: start
State Changed: lock buildroot
State Changed: shell
149 files updated, 0 files merged, 20 files removed, 0 files unresolved
State Changed: unlock buildroot

at this point I'm not sure what's wrong..
Blocks: 1223879
I can't see anything obvious either. It may be worth trying to use strace while it's hung.
re-triggered again and followed with strace:

[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# ps auxwf | grep 10382
root 11873 0.0 0.0 103240 864 pts/4 S+ 07:57 0:00 \_ grep 10382
cltbld 10382 0.1 0.0 187512 10436 ? S 07:29 0:01 \_ /usr/bin/python /builds/slave/rel-c-esr38-tb_tag_l10n-000000/scripts/scripts/release/tag-release.py -c mozilla/release-thunderbird-comm-esr38.py -b https://hg.mozilla.org/build/buildbot-configs -t THUNDERBIRD_38_4_0_RELEASE --tag-l10n
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -fF -p 10382 -o tagging_sub.log

which eventually hung as expected on nb-NO running 'hg branches -c', so I inspected that:

Process 11838 attached
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# strace -p 11838
Process 11838 attached - interrupt to quit
write(1, "nactive)\n", 9) = 9

then, just by poking the 1 fd in that proc, oddly (very oddly!) that helped kick it back into action:

[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com ~]# cd /proc/11838/fd
[root@bld-linux64-spot-381.build.releng.usw2.mozilla.com fd]# cat 1
MOBILE4201_2015111905_RELBRANCH 2818:2c65acd17612
THUNDERBIRD3840_2015111302_RELBRANCH 2817:f9901dbca796
.... branch output

as soon as I cat'd that fd, proc 11838 came back to life!

rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f6e04e504a0}, {0x7f6e0518fe50, [], SA_RESTORER, 0x7f6e04e504a0}, 8) = 0
exit_group(0) = ?
Process 11838 detached

and the log continued to go. Talking to rail, he thinks there could be some weird flushing/buffering bug in hg at 3.1.2.

at any rate, the job seems to be progressing now :\
this is happening again. same repo. different product (fennec) release. investigating..
Summary: tb 38.4.0 is failing to tag nb-NO locale repo with release tag → l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo
Depends on: 1229494
Summary: l10n tagging fails when running 'hg branches' against clean checkout of nb-NO locale repo → l10n tagging fails when running 'hg branches' against clean checkout of certain locale repos
we hit this on about 10 more locale repos. I suppose nb-NO was a bit more bloated with branches, tags, etc. This requires manual surveillance of the job until it completes.
this is actually not because of hg or the OS. it is our implementation of running the cmd through subprocess: https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#186 I will upload a patch that changes from the poll(); stdout.read() logic to something that looks like communicate() but still allows us to keep a timeout.
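For readers unfamiliar with this failure mode, here is a minimal sketch of it. It is not the code in commands.py: the names `poll_then_read` and `read_while_running` and the 1 MiB child are made up for illustration, and it assumes Python 3 on a POSIX system. A child that writes more than the pipe can hold blocks in write() until somebody reads the pipe, which would match the strace output above, so waiting for exit with poll() before reading can wait forever.

```python
import subprocess
import sys
import time

# Hypothetical child: writes about 1 MiB to stdout, more than a typical pipe buffer.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]


def poll_then_read(cmd, timeout=3.0, interval=0.05):
    """Wait for exit with poll() and only read the pipe afterwards.

    Returns the output, or None if the child had not exited by the deadline.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    deadline = time.time() + timeout
    while proc.poll() is None:
        if time.time() > deadline:
            # Nobody read the pipe, so the child is most likely stuck in write().
            proc.kill()
            proc.communicate()  # drain the pipe and reap the child
            return None
        time.sleep(interval)
    out = proc.stdout.read()
    proc.stdout.close()
    return out


def read_while_running(cmd):
    """Read the pipe while the child runs, so it never blocks on a full pipe."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out


if __name__ == "__main__":
    print("poll then read:", poll_then_read(CHILD, timeout=1.0))
    print("communicate():", len(read_while_running(CHILD)), "bytes")
```

With small outputs both functions behave the same, which may be why only repos with long 'hg branches' output were affected.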
played with this today. this patch passes the tests defined in:
https://dxr.mozilla.org/build-central/source/tools/lib/python/mozilla_buildtools/test/test_util_commands.py#74

the patch will look weird and big, but only because I added a try/catch which bumped the whole indent. here is the same patch without the try/catch:
http://people.mozilla.org/~jlund/151215_bug_1226276_tools_get_output-no-try-catch.diff

this patch:
1) uses a tempfile for storing output.
2) stops using communicate() when the timeout is reached. I don't think it's needed and besides, I believe there was a second bug in get_output where stderrdata[1] was always going to be empty, because when we do include stderr output, we just mix it in with stdout[2].

[1] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#213
[2] https://dxr.mozilla.org/build-central/source/tools/lib/python/util/commands.py#177
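The tempfile approach can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: the function name `get_output_with_timeout`, its parameters, and the RuntimeError on timeout are all hypothetical, and it targets Python 3. Because the child writes to a regular file rather than a pipe, it cannot block on a full buffer, so a plain poll() loop with a timeout is enough.

```python
import subprocess
import sys
import tempfile
import time


def get_output_with_timeout(cmd, timeout=None, include_stderr=False, interval=0.05):
    """Run cmd and return its output as bytes.

    Raises RuntimeError if cmd is still running after `timeout` seconds.
    """
    with tempfile.TemporaryFile() as out:
        # Mix stderr into the same file when requested; otherwise inherit it.
        stderr = subprocess.STDOUT if include_stderr else None
        proc = subprocess.Popen(cmd, stdout=out, stderr=stderr)
        start = time.time()
        while proc.poll() is None:
            if timeout is not None and time.time() - start > timeout:
                proc.kill()
                proc.wait()
                raise RuntimeError("command timed out after %s seconds" % timeout)
            time.sleep(interval)
        # The child has exited; rewind and read everything it wrote.
        out.seek(0)
        return out.read()


if __name__ == "__main__":
    big = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
    print(len(get_output_with_timeout(big, timeout=30)), "bytes")
```

Since stderr is either merged into the same file or left alone, there is no separate stderr data to return in this sketch.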
Attachment #8698715 - Flags: review?(catlee)
Comment on attachment 8698715 151215_bug_1226276_tools_get_output_buffer_fix.diff Let's land this, lgtm. TemporaryFile has the nice feature of removing the file whenever it's closed.
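A small illustration of that cleanup behaviour, assuming Python's standard tempfile module. It uses NamedTemporaryFile rather than TemporaryFile only so that there is a path to check; the variable names are made up for the example.

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile()
path = tmp.name
tmp.write(b"branch output")
tmp.flush()
existed_while_open = os.path.exists(path)  # the file is on disk while open
tmp.close()
exists_after_close = os.path.exists(path)  # closing the file deletes it
print(existed_while_open, exists_after_close)
```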
Attachment #8698715 - Flags: review?(catlee) → review+
Comment on attachment 8698715 [details] [diff] [review] 151215_bug_1226276_tools_get_output_buffer_fix.diff thanks https://hg.mozilla.org/build/tools/rev/fc9b6d055d7e
Attachment #8698715 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED