Closed Bug 1297786 Opened 4 years ago Closed 3 years ago

Thunderbird 49.0b1 mac repacks failing

Categories

(Thunderbird :: Build Config, defect, blocker)


Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 49.0

People

(Reporter: wsmwk, Unassigned)

Attachments

(3 files)

From Bug #1286440 comment 89

Mac repack log below from 49.0b1 build 4.

https://archive.mozilla.org/pub/thunderbird/candidates/49.0b1-candidates/build4/logs/release-comm-beta-macosx64_repack_1-bm84-build1-build20.txt.gz
command: START
command: make export
command: cwd: comm-beta/obj-l10n/config
command: env: {'MOZ_MAKE_COMPLETE_MAR': '1', 'MOZ_SIGN_CMD': 'python /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/release/signing/signtool.py --cachedir /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/signing_cache -t /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/token -n /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/nonce -c /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/release/signing/host.cert -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing4.srv.releng.scl3.mozilla.com:9120 -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing5.srv.releng.scl3.mozilla.com:9120 -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing6.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing1.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing2.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing3.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing4.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing6.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing7.srv.releng.scl3.mozilla.com:9120', 'UPLOAD_SSH_KEY': '~/.ssh/tbirdbld_dsa', 'COMM_REV': 'THUNDERBIRD_49_0b1_RELEASE', 'MOZ_OBJDIR': 'obj-l10n', 'UPLOAD_TO_TEMP': '1', 'LD_LIBRARY_PATH': '', 'DOWNLOAD_HOST': 'archive.mozilla.org', 'UPLOAD_USER': 'tbirdbld', 'UPLOAD_HOST': 'upload.tbirdbld.productdelivery.prod.mozaws.net', 'POST_UPLOAD_CMD': 'post_upload.py -p thunderbird -n 4 -v 49.0b1 --release-to-candidates-dir --signed --bucket-prefix net-mozaws-prod-delivery', 'MOZILLA_REV': 'THUNDERBIRD_49_0b1_RELEASE', 'MOZ_PKG_VERSION': '49.0b1', 'MBSDIFF_HOOK': '/builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/scripts/l10n/mbsdiff_hook.sh -c /builds/slave/tb-rel-c-beta-m64_rpk_1-000000/fs-cache', 'MOZ_PKG_PRETTYNAMES': '1'}
command: output:
command: END (0.08s elapsed)

Traceback (most recent call last):
  File "/builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/scripts/l10n/create-release-repacks.py", line 394, in <module>
    bucket_prefix=branchConfig['bucket_prefix'],
  File "/builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/scripts/l10n/create-release-repacks.py", line 108, in createRepacks
    makeDirs, env, tooltoolManifest, tooltool_script, tooltool_urls)
  File "/builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/lib/python/build/l10n.py", line 100, in l10nRepackPrep
    env=env)
  File "/builds/slave/tb-rel-c-beta-m64_rpk_1-000000/scripts/lib/python/util/commands.py", line 52, in run_cmd
    return subprocess.check_call(cmd, **kwargs)
  File "/tools/python27/lib/python2.7/subprocess.py", line 506, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/tools/python27/lib/python2.7/subprocess.py", line 493, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/tools/python27/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/tools/python27/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory: 'comm-beta/obj-l10n/config'
program finished with exit code 1
elapsedTime=628.490386
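
For readers less familiar with this failure mode: the OSError above names the working directory, not the `make` binary, which suggests the child process could not chdir into `comm-beta/obj-l10n/config` before `make export` ever ran. A minimal sketch of the same behaviour on a POSIX system follows; `run_in_dir` is a hypothetical helper written for illustration, not the `run_cmd` from the build tools, and it runs under Python 3 rather than the Python 2.7 in the log.

```python
import errno
import os
import subprocess
import sys
import tempfile


def run_in_dir(cmd, cwd):
    """Run cmd with the given working directory.

    Returns 0 on success, or the errno if the process could not be started
    (for example because cwd does not exist).
    """
    try:
        subprocess.check_call(cmd, cwd=cwd)
    except OSError as exc:
        return exc.errno
    return 0


# An empty scratch directory stands in for the build slave's work area.
base = tempfile.mkdtemp()
missing_cwd = os.path.join(base, "comm-beta", "obj-l10n", "config")

# The command itself is valid; only the working directory is missing.
result = run_in_dir([sys.executable, "-c", "pass"], missing_cwd)
print(result == errno.ENOENT)
```

So the question for this bug is why the objdir's `config` directory was never created, rather than why `make` failed.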
Blocks: 1297787
(In reply to Edmund Wong (:ewong) from comment #3)
> Rationale for the two patches:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1286440#c73

I guess the part I don't like about this is the fact that we don't really know why there is bustage. Or at least we don't know why it wasn't happening before..
Comment on attachment 8785113 [details] [diff] [review]
[buildbotcustom] proposed patch (v1)

Review of attachment 8785113 [details] [diff] [review]:
-----------------------------------------------------------------

sorry for taking >1 day for this review.

I think I'm up to speed. Thanks for the details of how you arrived at this patch. It helped.

One thing I can confirm is that this code should only be used by TB, since all desktop releases use release promotion, which leverages mozharness (desktop_l10n.py)[1] for repacks and, as you mentioned, we key off android above this patch.


But I have two main concerns here:

1) iiuc, this will break standalone repack builders unless we patch that too: http://hg.mozilla.org/build/buildbotcustom/file/tip/process/release.py#l735

2) re: comment 4, we don't know exactly why this is failing now and wasn't before. I think it would be good to have someone from build sys work through this with you. I could sign off on it but I don't know the internals..

[1] http://hg.mozilla.org/build/buildbotcustom/file/tip/process/release.py#l1823
Attachment #8785113 - Flags: review?(jlund) → review-
Comment on attachment 8785114 [details] [diff] [review]
[tools] add objdir parameter to release_repacks.sh

Review of attachment 8785114 [details] [diff] [review]:
-----------------------------------------------------------------

see comment 5 for details.

I don't have any additional concerns related to this patch
Attachment #8785114 - Flags: review?(jlund)
Relevant part of the corresponding 45.3 log:
https://archive.mozilla.org/pub/thunderbird/candidates/45.3.0-candidates/build1/logs/release-comm-esr45-macosx64_repack_1-bm86-build1-build1.txt.gz

command: START
command: make export
command: cwd: comm-esr45/obj-l10n/config
command: env: {'MOZ_MAKE_COMPLETE_MAR': '1', 'MOZ_SIGN_CMD': 'python /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/scripts/release/signing/signtool.py --cachedir /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/signing_cache -t /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/token -n /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/nonce -c /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/scripts/release/signing/host.cert -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing4.srv.releng.scl3.mozilla.com:9120 -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing5.srv.releng.scl3.mozilla.com:9120 -H gpg:sha2signcode:osslsigncode:signcode:mar:jar:emevoucher:signing6.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing1.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing2.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing3.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing4.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing6.srv.releng.scl3.mozilla.com:9120 -H dmgv2:mac-v2-signing7.srv.releng.scl3.mozilla.com:9120', 'UPLOAD_SSH_KEY': '~/.ssh/tbirdbld_dsa', 'COMM_REV': 'THUNDERBIRD_45_3_0_RELEASE', 'MOZ_OBJDIR': 'obj-l10n', 'UPLOAD_TO_TEMP': '1', 'LD_LIBRARY_PATH': '', 'DOWNLOAD_HOST': 'archive.mozilla.org', 'UPLOAD_USER': 'tbirdbld', 'UPLOAD_HOST': 'upload.tbirdbld.productdelivery.prod.mozaws.net', 'POST_UPLOAD_CMD': 'post_upload.py -p thunderbird -n 1 -v 45.3.0 --release-to-candidates-dir --signed --bucket-prefix net-mozaws-prod-delivery', 'MOZILLA_REV': 'THUNDERBIRD_45_3_0_RELEASE', 'MBSDIFF_HOOK': '/builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/scripts/scripts/l10n/mbsdiff_hook.sh -c /builds/slave/tb-rel-c-esr45-m64_rpk_1-00000/fs-cache', 'MOZ_PKG_PRETTYNAMES': '1'}
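
To make the two `env` dumps easier to compare, here is a small sketch that diffs them. Only a handful of keys are copied from the 45.3 and 49.0b1 logs; the dicts are an excerpt, and `diff_env` is a hypothetical helper, not something from the build tools.

```python
def diff_env(old, new):
    """Return {key: (old_value, new_value)} for keys that differ.

    A key missing on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


# Excerpt of the 45.3.0 build1 environment (values taken from the log).
env_45 = {
    "COMM_REV": "THUNDERBIRD_45_3_0_RELEASE",
    "MOZILLA_REV": "THUNDERBIRD_45_3_0_RELEASE",
    "MOZ_OBJDIR": "obj-l10n",
    "MOZ_PKG_PRETTYNAMES": "1",
}

# Excerpt of the 49.0b1 build4 environment (values taken from the log).
env_49 = {
    "COMM_REV": "THUNDERBIRD_49_0b1_RELEASE",
    "MOZILLA_REV": "THUNDERBIRD_49_0b1_RELEASE",
    "MOZ_OBJDIR": "obj-l10n",
    "MOZ_PKG_PRETTYNAMES": "1",
    "MOZ_PKG_VERSION": "49.0b1",
}

changes = diff_env(env_45, env_49)
for key, (before, after) in changes.items():
    print("%s: %r -> %r" % (key, before, after))
```

In this excerpt `MOZ_OBJDIR` is identical in both runs and, apart from the revision tags, only `MOZ_PKG_VERSION` is new, so the environment alone does not obviously explain the missing objdir.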
(In reply to Jordan Lund (:jlund) from comment #4)
> I guess the part I don't like about this is the fact that we don't really
> know why there is bustage. Or at least we don't know why it wasn't happening
> before..

I wonder if it's something about this hack that broke...
https://hg.mozilla.org/build/tools/file/tip/scripts/l10n/create-release-repacks.py#l116
This implements the approach I suggested over in bug 1268440 - include the mozconfig which sets compiler environment variables, so that configure finds clang. Since it's not doing a universal build we can put --with-l10n-base back, and make config doesn't need any changes. When I ran this on a mac worker it repacked a locale without any issue (up to the point it would have uploaded).

FTR setting --disable-compile-environment also works (also removing the universal mozconfig, and fixing --with-l10n-base). 'make export' in config is a no-op, and config/nsinstall.py is used instead of the binary.
Comment on attachment 8786258 [details] [diff] [review]
[comm-beta] Include mozconfig which exports CC/CXX etc

(nthomas' comment in tb-drivers sounds like a review is wanted)
Attachment #8786258 - Flags: review?(jlund)
Wouldn't this normally be reviewed by a mail person?
Comment on attachment 8786258 [details] [diff] [review]
[comm-beta] Include mozconfig which exports CC/CXX etc

indeed
Attachment #8786258 - Flags: review?(jlund) → review?(aleth)
Comment on attachment 8786258 [details] [diff] [review]
[comm-beta] Include mozconfig which exports CC/CXX etc

Review of attachment 8786258 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure how it fixes the same issue as ewong's patches, but it looks fine. Let's try it!
Attachment #8786258 - Flags: review?(aleth) → review+
Comment on attachment 8786258 [details] [diff] [review]
[comm-beta] Include mozconfig which exports CC/CXX etc

[Approval Request Comment]
Risk to taking this patch (and alternatives if risky): bustage fix
Attachment #8786258 - Flags: approval-comm-beta?
Attachment #8786258 - Flags: approval-comm-aurora?
(In reply to Nick Thomas [:nthomas] from comment #9)
> This implements the approach I suggested over in bug 1268440 - include the
> mozconfig which sets compiler environment variables, so that configure finds
> clang. Since it's not doing a universal build we can put --with-l10n-base
> back, and make config doesn't need any changes. When I ran this on a mac
> worker it repacked a locale without any issue (up to the point it would have
> uploaded).
> 
> FTR setting --disable-compile-environment also works (also removing the
> universal mozconfig, and fixing --with-l10n-base). 'make export' in config
> is a no-op, and config/nsinstall.py is used instead of the binary.

So to summarize, we now have for the universal build (macosx-universal/l10n-mozconfig)

ac_add_options --with-l10n-base=../../../l10n
ac_add_options --disable-compile-environment

and for macosx64/l10n-mozconfig 

. $topsrcdir/build/macosx/local-mozconfig.common
ac_add_options --with-l10n-base=../../l10n

I am a bit confused why the latter mozconfig is used by beta rc OSX builds, i.e. why it affects the bug in the description. Aren't all release OSX builds universal opt builds?
Flags: needinfo?(nthomas)
Comment on attachment 8786258 [details] [diff] [review]
[comm-beta] Include mozconfig which exports CC/CXX etc

C-A (TB 50): https://hg.mozilla.org/releases/comm-aurora/rev/633bca365d8b
C-B (TB 49): https://hg.mozilla.org/releases/comm-beta/rev/bf49084c3c9e
Attachment #8786258 - Flags: approval-comm-beta?
Attachment #8786258 - Flags: approval-comm-beta+
Attachment #8786258 - Flags: approval-comm-aurora?
Attachment #8786258 - Flags: approval-comm-aurora+
Much confusion here has come from the idea that macosx-universal/l10n-mozconfig is being used for releases, but it isn't. I see aurora nightlies are using the universal config, so this is understandable.

The long version..... In the release case, the mozconfig is determined by http://hg.mozilla.org/build/tools/file/default/scripts/l10n/create-release-repacks.py#l294
  mozconfig = path.join(get_repo_dirname(sourceRepoInfo["path"]),
                        releaseConfig["appName"], "config", "mozconfigs",
                        platform, "l10n-mozconfig")

platform is an arg when that script is called (via release_repacks.sh), eg in https://archive.mozilla.org/pub/thunderbird/candidates/49.0b1-candidates/build4/logs/release-comm-beta-macosx64_repack_1-bm84-build1-build20.txt.gz
 bash scripts/scripts/l10n/release_repacks.sh macosx64 mozilla/production_config.py --chunks 10 --this-chunk 1 ....
...
 python /path/to/create-release-repacks.py ... -r mozilla/release-thunderbird-comm-beta.py ... -p macosx64 .....

This comes back (eventually) to http://hg.mozilla.org/build/buildbot-configs/file/production/mozilla/release-thunderbird-comm-beta.py#l95
  releaseConfig['l10nPlatforms']       = releaseConfig['enUSPlatforms']
so actually line 86
  releaseConfig['enUSPlatforms']       = ('linux', 'linux64', 'win32', 'macosx64')

ie macosx64.

In the case of central/aurora nightlies the repack is instead done by a 'buildbot factory', and the mozconfig comes from https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#1723
   mozconfig = os.path.join(os.path.dirname(
                        pf['src_mozconfig']), 'l10n-mozconfig')
and https://dxr.mozilla.org/build-central/source/buildbot-configs/mozilla/thunderbird_config.py#280
   'src_mozconfig': 'mail/config/mozconfigs/macosx-universal/nightly', 

ie mail/config/mozconfigs/macosx-universal/l10n-mozconfig
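
The two lookups described above can be put side by side in a short sketch. Several things here are assumptions for illustration: the repo path `releases/comm-beta`, the app name `mail`, and the body of `get_repo_dirname` (taken to return the last path component). Only the join logic and the `src_mozconfig` value come from the snippets quoted above.

```python
import posixpath


def get_repo_dirname(repo_path):
    # Assumption: the real helper returns the last path component.
    return repo_path.rstrip("/").split("/")[-1]


def release_l10n_mozconfig(repo_path, app_name, platform):
    """Release automation: the path is keyed off the platform argument."""
    return posixpath.join(get_repo_dirname(repo_path), app_name, "config",
                          "mozconfigs", platform, "l10n-mozconfig")


def nightly_l10n_mozconfig(src_mozconfig):
    """Nightly factory: l10n-mozconfig sits next to the en-US mozconfig."""
    return posixpath.join(posixpath.dirname(src_mozconfig), "l10n-mozconfig")


# Hypothetical repo path and app name; the platform comes from -p macosx64.
release_path = release_l10n_mozconfig("releases/comm-beta", "mail", "macosx64")

# src_mozconfig value as quoted from thunderbird_config.py.
nightly_path = nightly_l10n_mozconfig(
    "mail/config/mozconfigs/macosx-universal/nightly")

print(release_path)  # comm-beta/mail/config/mozconfigs/macosx64/l10n-mozconfig
print(nightly_path)  # mail/config/mozconfigs/macosx-universal/l10n-mozconfig
```

Under those assumptions, releases pick up the macosx64 directory while nightlies pick up macosx-universal, so the two jobs read different l10n-mozconfig files.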
Can I ask something completely silly?
You're distinguishing between "central/aurora nightlies" and "release" and it looks like "beta" falls into the release category. We recently shipped a 45.3.0 which was built without any problems. So why is beta causing so much trouble? How is "beta release" different to "ESR release"?
Indeed, 'release' is short for release automation, so comm-beta & comm-esr45. 45.3.0 most likely went fine because the gecko code you're using is mozilla-esr45, and the build system changes down in m-c aren't merged there. In that sense beta is your little canary for new problems. FWIW, we used to have trouble with beta 1 for Firefox quite often, but we've managed to remove a lot of the code divergence between dep builds and releases (aka release promotion). Unfortunately for Thunderbird, you're hitting all the merge issues in the older automation, when previously it was just the Thunderbird-specific ones.
Flags: needinfo?(nthomas)
(In reply to Jorg K (GMT+2, PTO during summer) from comment #20)
> Can I ask something completely silly?
> You're distinguishing between "central/aurora nightlies" and "release" and
> it looks like "beta" falls into the release category. We recently shipped a
> 45.3.0 which was built without any problems. So why is beta causing so much
> trouble? How is "beta release" different to "ESR release"?

Nightlies and releases use different buildbot configs. The difference between the beta and the last ESR is mainly that ongoing build system changes can break things like l10n repacks in non-obvious ways that are not continuously made visible on treeherder.
(In reply to Nick Thomas [:nthomas] from comment #19)
> Much confusion here has come from the idea that
> macosx-universal/l10n-mozconfig is being used for releases, but it isn't. I
> see aurora nightlies are using the universal config, so this is
> understandable.

Thanks! That's very helpful. Is there a reason for this difference? (Especially as after beta is fine again, the next step will be to see how much of the beta l10n config fixes should be backported to aurora.)
I don't recall, and it would probably take a fair bit of archaeology to find out. From a quick look at an aurora log I'm not convinced the central/aurora jobs are really universal, whatever that might mean in the face of --disable-compile-environment.
Mac repacks are working in 49.0b1 build5  \o/
Awesome, thanks!
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 49.0