Closed
Bug 1008052
Opened 11 years ago
Closed 10 years ago
Mac repacks sometimes fail, due to signing server issue
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
Attachments
(4 files)
42.44 KB,
text/plain
|
Details | |
921 bytes,
patch
|
bhearsum
:
review+
|
Details | Diff | Splinter Review |
1.11 KB,
patch
|
nthomas
:
checked-in+
|
Details | Diff | Splinter Review |
438 bytes,
patch
|
catlee
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
eg this for 30.0b3 repacks for mac, ko locale:
cd ./firefox && python /builds/slave/rel-m-beta-osx64_rpk_6-0000000/scripts/release/signing/signtool.py --cachedir /builds/slave/rel-m-beta-osx64_rpk_6-0000000/signing_cache -t /builds/slave/rel-m-beta-osx64_rpk_6-0000000/token -n /builds/slave/rel-m-beta-osx64_rpk_6-0000000/nonce -c /builds/slave/rel-m-beta-osx64_rpk_6-0000000/scripts/release/signing/host.cert -H mac-signing3.build.scl1.mozilla.com:9120 -H mac-signing4.build.scl1.mozilla.com:9120 -H mac-signing1.srv.releng.scl3.mozilla.com:9120 -H mac-signing2.srv.releng.scl3.mozilla.com:9120 -f dmg Firefox.app && rm Firefox.app/Contents/CodeResources && cp Firefox.app/Contents/_CodeSignature/CodeResources Firefox.app/Contents && cd /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage && /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/build/package/mac_osx/pkg-dmg --source 'firefox' --target 'mac/ko/Firefox 30.0b3.dmg' --volname 'Firefox' --copy '/builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/branding/dsstore:/.DS_Store' --mkdir /.background --copy '/builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/branding/background.png:/.background' --icon '/builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/branding/disk.icns' --symlink '/Applications:/ '
2014-05-08 14:47:46,950 - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95: processing Firefox.app.tar.gz on https://mac-signing4.build.scl1.mozilla.com:9120
2014-05-08 14:47:49,597 - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95: uploading for signing
2014-05-08 14:49:59,486 - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95: processing Firefox.app.tar.gz on https://mac-signing4.build.scl1.mozilla.com:9120
2014-05-08 14:51:05,944 - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95: processing Firefox.app.tar.gz on https://mac-signing4.build.scl1.mozilla.com:9120
2014-05-08 14:52:16,401 - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95: OK
2014-05-08 14:52:16,401 - Copying Firefox.app.tar.gz to cache /builds/slave/rel-m-beta-osx64_rpk_6-0000000/signing_cache/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95
Firefox.app/Contents/MacOS/updater.app/Contents/MacOS/updater: (Empty error message)
tar: Error exit delayed from previous errors.
2014-05-08 14:52:17,631 - Error unpacking tar file /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox/Firefox.app.tar.gz to /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox
We also hit this the end of last week on Aurora when it was time to re-enable updates after the merge.
Looking in the signing cache for the log above (perms + ownership removed for brevity):
$ ls -lS
total 605984
63412144 8 May 14:37 4a7b103cd717df7c189fda40815507f1ade56d4d
63379898 8 May 14:26 557a8c19653236e6ed3f0ff40fe2d27e47a56627
63178808 8 May 14:43 cfb82d1f71ef47c3992b25d52302abdcb0b7d344
63166019 8 May 14:32 5e8da215d3d8c28770633352bd6f7c83edaeb7f1
57118720 8 May 14:52 518a7ff8d0ff9fb3629ecb680d763e6904d22c95
ie. the file is truncated. Also:
$ tar tfz 518a7ff8d0ff9fb3629ecb680d763e6904d22c95 | tail
Firefox.app/Contents/MacOS/updater.app/Contents/Resources/English.lproj/MainMenu.nib/classes.nib
Firefox.app/Contents/MacOS/updater.app/Contents/Resources/English.lproj/MainMenu.nib/info.nib
Firefox.app/Contents/MacOS/updater.app/Contents/Resources/English.lproj/MainMenu.nib/keyedobjects.nib
Firefox.app/Contents/MacOS/updater.app/Contents/MacOS/updater
tar: Truncated input file (needed 160768 bytes, only 0 available)
tar: Error exit delayed from previous errors.
Reporter | ||
Comment 1•11 years ago
|
||
Somehow this also causes later jobs to burn, even though we'd be signing a different tarball due to localized content. eg for the 30.0b3 example:
mv /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox/Firefox.app/Contents/Resources/en.lproj /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox/Firefox.app/Contents/Resources/ku.lproj
mv: rename /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox/Firefox.app/Contents/Resources/en.lproj to /builds/slave/rel-m-beta-osx64_rpk_6-0000000/mozilla-beta/obj-l10n/dist/l10n-stage/firefox/Firefox.app/Contents/Resources/ku.lproj: No such file or directory
make[1]: *** [repackage-zip] Error 1
make: *** [repackage-zip-ku] Error 2
Full log:
http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/30.0b3-candidates/build1/logs/release-mozilla-beta-macosx64_repack_6-bm84-build1-build9.txt.gz
Reporter | ||
Comment 2•11 years ago
|
||
Grepping over the signing server log, we have a timeout on the signing call:
(egrep '^2014-05-08.*(12029|12334|10.26.52.74|518a7ff8d0ff9fb3629ecb680d763e6904d22c95)' signing.log.1)
# has it been signed before ?
2014-05-08 14:47:49,594 - INFO - 10.26.52.74 GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95
2014-05-08 14:47:49,595 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95
2014-05-08 14:47:49,595 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 is missing
2014-05-08 14:47:49,595 - INFO - 10.26.52.74 - - [2014-05-08 14:47:49] "GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 HTTP/1.1" 404 96 0.001664
# nope, I'll post it for signing then
2014-05-08 14:48:19,875 - INFO - 10.26.52.74 POST /sign/dmg
2014-05-08 14:49:42,059 - INFO - Request to dmg sign Firefox.app.tar.gz (518a7ff8d0ff9fb3629ecb680d763e6904d22c95) from 10.26.52.74
2014-05-08 14:49:58,355 - DEBUG - Putting (u'518a7ff8d0ff9fb3629ecb680d763e6904d22c95', u'Firefox.app.tar.gz', 'dmg', <gevent.event.Event object at 0x101344810>) on the queue
2014-05-08 14:49:58,447 - INFO - 10.26.52.74 - - [2014-05-08 14:49:58] "POST /sign/dmg HTTP/1.1" 202 98 98.572016
# signing starts here, I think
2014-05-08 14:49:58,577 - INFO - Signing Firefox.app.tar.gz (dmg - 518a7ff8d0ff9fb3629ecb680d763e6904d22c95)
2014-05-08 14:49:58,776 - DEBUG - 12029: ['/builds/signing/rel-key-signing-server/bin/python2.7', [redacted]
# is it signed yet ? no, but we're working on it
2014-05-08 14:50:04,579 - INFO - 10.26.52.74 GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95
2014-05-08 14:51:04,753 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 (Firefox.app.tar.gz)
2014-05-08 14:51:04,881 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 is missing
2014-05-08 14:51:04,882 - DEBUG - Pending: {(u'518a7ff8d0ff9fb3629ecb680d763e6904d22c95', 'dmg'): <gevent.event.Event object at 0x101344810>, (u'c697447bd62f33c7e39ae18f6e7435beee14aee9', 'dmg'): <gevent.event.Event object at 0x101344cd0>}
# how about nao ? patience, grasshopper
2014-05-08 14:51:04,883 - INFO - 10.26.52.74 - - [2014-05-08 14:51:04] "GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 HTTP/1.1" 404 113 60.303492
2014-05-08 14:51:06,467 - INFO - 10.26.52.74 GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95
# Y U take longer than two mins to sign !?!?
2014-05-08 14:51:58,910 - DEBUG - 12029: Exceeded timeout
2014-05-08 14:51:59,911 - INFO - 12029: run_signscript: Failed with rc -1; retrying in a bit
# try to sign again
2014-05-08 14:52:09,175 - DEBUG - 12334: ['/builds/signing/rel-key-signing-server/bin/python2.7', [redacted]
# partial content written ? definitely the wrong number of bytes returned
# we sent 57118859, slave got 57118720
2014-05-08 14:52:09,280 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 (Firefox.app.tar.gz)
2014-05-08 14:52:14,099 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 is OK
2014-05-08 14:52:16,144 - INFO - 10.26.52.74 - - [2014-05-08 14:52:16] "GET /sign/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95 HTTP/1.1" 200 57118859 69.679274
2014-05-08 14:54:10,192 - DEBUG - Got message: ('done', (u'518a7ff8d0ff9fb3629ecb680d763e6904d22c95', u'Firefox.app.tar.gz', 'dmg', <gevent.event.Event object at 0x101344810>), 'eb9e324616f6b0d92a47f0157734fa5eb3cf54e2')
2014-05-08 14:55:22,647 - INFO - Deleting /builds/signing/rel-key-signing-server/signed-files/dmg/518a7ff8d0ff9fb3629ecb680d763e6904d22c95.out with no unsigned file
Perhaps there is a problem handling the retry case. Might need to ensure that the destination file doesn't exist after killing the processes.
Chris, any idea what's going on ?
Flags: needinfo?(catlee)
Summary: Mac repacks sometimes fail → Mac repacks sometimes fail, due to signing server issue
Reporter | ||
Comment 3•11 years ago
|
||
FWIW, there are 161 matches to 'Exceeded timeout' for the nightly signing server on mac-signing4, over a period of 7 days. Seems to start on 2014-01-14 03:01:07; curiously bug 945751 landed the day before.
Reporter | ||
Comment 4•11 years ago
|
||
Other servers have different dates for first 'Exceeded timeout' messages, looking at just the nightly case:
mac-signing1: 2014-02-12 02:52:44
mac-signing2: 2014-01-14 03:00:26
mac-signing3: 2014-01-21 02:12:55
mac-signing4: 2014-01-14 03:01:07, 1383 times since
signing4: 2014-02-24, only 6 in total
signing5: 2014-01-14, only 14 in total
signing6: 2014-01-27, only 5 in total
We have less timeouts on mac-signing4's dep signer, but still 54 times in the last 3 days, so this seems be load related.
Comment 7•11 years ago
|
||
We may have hit a new symptom of this in 30.0b4, where a repack failed to unpack the signed complete mar:
command: START
command: perl ../../../tools/update-packaging/unwrap_full_update.pl /builds/slave/rel-m-beta-osx64_rpk_3-0000000/mozilla-beta/obj-l10n/dist/update/mac/en-GB/firefox-30.0b4.complete.mar
command: cwd: /builds/slave/rel-m-beta-osx64_rpk_3-0000000/mozilla-beta/obj-l10n/dist/current
command: env: {'MOZ_MAKE_COMPLETE_MAR': '1', 'AB_CD': 'en-GB', 'MOZ_SIGN_CMD': 'python /builds/slave/rel-m-beta-osx64_rpk_3-0000000/scripts/release/signing/signtool.py --cachedir /builds/slave/rel-m-beta-osx64_rpk_3-0000000/signing_cache -t /builds/slave/rel-m-beta-osx64_rpk_3-0000000/token -n /builds/slave/rel-m-beta-osx64_rpk_3-0000000/nonce -c /builds/slave/rel-m-beta-osx64_rpk_3-0000000/scripts/release/signing/host.cert -H mac-signing3.build.scl1.mozilla.com:9120 -H mac-signing4.build.scl1.mozilla.com:9120 -H mac-signing1.srv.releng.scl3.mozilla.com:9120 -H mac-signing2.srv.releng.scl3.mozilla.com:9120', 'UPLOAD_SSH_KEY': '~/.ssh/ffxbld_dsa', 'LOCALE_MERGEDIR': '/builds/slave/rel-m-beta-osx64_rpk_3-0000000/mozilla-beta/obj-l10n/browser/locales/merged', 'MOZ_PKG_PLATFORM': 'mac', 'COMM_REV': 'FIREFOX_30_0b4_RELEASE', 'MOZ_OBJDIR': 'obj-l10n', 'MAR': '/builds/slave/rel-m-beta-osx64_rpk_3-0000000/mozilla-beta/obj-l10n/dist/host/bin/mar', 'UPLOAD_TO_TEMP': '1', 'LD_LIBRARY_PATH': '', 'ZIP_IN': '/builds/slave/rel-m-beta-osx64_rpk_3-0000000/firefox.dmg', 'UPLOAD_USER': 'ffxbld', 'MBSDIFF': '/builds/slave/rel-m-beta-osx64_rpk_3-0000000/mozilla-beta/obj-l10n/dist/host/bin/mbsdiff', 'UPLOAD_EXTRA_FILES': 'update/mac/el/firefox-30.0b3-30.0b4.partial.mar update/mac/el/firefox-30.0b3-30.0b4.partial.mar.asc update/mac/el/firefox-30.0b2-30.0b4.partial.mar update/mac/el/firefox-30.0b2-30.0b4.partial.mar.asc', 'UPLOAD_HOST': 'stage.mozilla.org', 'POST_UPLOAD_CMD': 'post_upload.py -p firefox -n 1 -v 30.0b4 --release-to-candidates-dir --signed', 'MOZILLA_REV': 'FIREFOX_30_0b4_RELEASE', 'MOZ_PKG_PRETTYNAMES': '1', 'MOZ_PKG_VERSION': '30.0b4'}
command: output:
bzip2: Data integrity error when decompressing.
Input file = Contents/MacOS/XUL.bz2, output file = Contents/MacOS/XUL
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
bzip2: Deleting output file Contents/MacOS/XUL, if it exists.
Couldn't decompress "Contents/MacOS/XUL" at ../../../tools/update-packaging/unwrap_full_update.pl line 64.
Comment 8•11 years ago
|
||
Some signing server logs from the last failure:
signing.log.1:2014-05-13 08:09:31,825 - INFO - 10.26.52.108 GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e
signing.log.1:2014-05-13 08:09:31,825 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e
signing.log.1:2014-05-13 08:09:31,825 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e is missing
signing.log.1:2014-05-13 08:09:31,826 - INFO - 10.26.52.108 - - [2014-05-13 08:09:31] "GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e HTTP/1.1" 404 96 0.001439
signing.log.1:2014-05-13 08:09:41,888 - INFO - Request to mar sign firefox-30.0b4.complete.mar (9f24babe97415d9d5fe2cc88f871852fdeb4b22e) from 10.26.52.108
signing.log.1:2014-05-13 08:09:43,570 - DEBUG - Putting (u'9f24babe97415d9d5fe2cc88f871852fdeb4b22e', u'firefox-30.0b4.complete.mar', 'mar', <gevent.event.Event object at 0x10134e990>) on the queue
signing.log.1:2014-05-13 08:09:43,576 - INFO - Signing firefox-30.0b4.complete.mar (mar - 9f24babe97415d9d5fe2cc88f871852fdeb4b22e)
signing.log.1:2014-05-13 08:09:43,586 - DEBUG - 9619: ['/builds/signing/rel-key-signing-server/bin/python2.7', '/builds/signing/rel-key-signing-server/tools/release/signing/signscript.py', '-c', '/builds/signing/rel-key-signing-server/signscript.ini', 'mar', u'/builds/signing/rel-key-signing-server/unsigned-files/9f24babe97415d9d5fe2cc88f871852fdeb4b22e', u'/builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e', u'firefox-30.0b4.complete.mar']
signing.log.1:2014-05-13 08:09:45,275 - INFO - 10.26.52.108 GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e
signing.log.1:2014-05-13 08:09:51,331 - DEBUG - Got message: ('done', (u'9f24babe97415d9d5fe2cc88f871852fdeb4b22e', u'firefox-30.0b4.complete.mar', 'mar', <gevent.event.Event object at 0x10134e990>), '17718e0fbdd188115a4373e390459f112b70b8a4')
signing.log.1:2014-05-13 08:09:51,333 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e (firefox-30.0b4.complete.mar)
signing.log.1:2014-05-13 08:09:51,615 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e is OK
signing.log.1:2014-05-13 08:11:05,441 - INFO - 10.26.52.108 - - [2014-05-13 08:11:05] "GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e HTTP/1.1" 200 58067179 80.166659
signing.log.1:2014-05-13 08:11:05,984 - INFO - 10.26.52.108 GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e
signing.log.1:2014-05-13 08:11:05,984 - DEBUG - Looking for /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e (firefox-30.0b4.complete.mar)
signing.log.1:2014-05-13 08:11:06,253 - DEBUG - /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e is OK
signing.log.1:2014-05-13 08:11:38,105 - INFO - 10.26.52.108 - - [2014-05-13 08:11:38] "GET /sign/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e HTTP/1.1" 200 58067179 32.122056
signing.log.1:2014-05-13 08:19:22,733 - INFO - Deleting /builds/signing/rel-key-signing-server/signed-files/mar/9f24babe97415d9d5fe2cc88f871852fdeb4b22e.out with no unsigned file
I don't see anything unusual there...
Updated•11 years ago
|
Flags: needinfo?(catlee)
Reporter | ||
Comment 9•11 years ago
|
||
I'm beginning to think this is a mac-signing4 issue. The early comments here are for that machine, and for 32.0b6 we hit several failures on it. There's also a weird "can't launch python" issue on bug 1037260.
Right now the release signer is stopped on it.
Reporter | ||
Updated•11 years ago
|
See Also: → mac-v2-signing4
Reporter | ||
Comment 10•11 years ago
|
||
We keep logs for 7 days, and hit 50 failures in 884 jobs that tried to sign (1635 total, caching must be working). Signing server breakdown:
3 on mac-signing2
8 on mac-singing3
39 on mac-signing4
So mac-signing4 isn't the only bad guy, just the baddest. The distribution of requests is fairly even over the three servers (295/317/272 respectively).
Probably dumb idea, but what about bumping the signing timeout and retest in a few days ?
Comment 11•11 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #10)
> Created attachment 8471992 [details]
> Recent l10n nightly failures
>
> We keep logs for 7 days, and hit 50 failures in 884 jobs that tried to sign
> (1635 total, caching must be working). Signing server breakdown:
> 3 on mac-signing2
> 8 on mac-singing3
> 39 on mac-signing4
>
> So mac-signing4 isn't the only bad guy, just the baddest. The distribution
> of requests is fairly even over the three servers (295/317/272
> respectively).
>
> Probably dumb idea, but what about bumping the signing timeout and retest in
> a few days ?
Well, it looks like the timeout is a mere 120 seconds right now (http://mxr.mozilla.org/build-central/source/tools/lib/python/signing/server.py#77). Seems worthwhile to try, to me. Even a 5 minute timeout doesn't seem crazy.
Reporter | ||
Comment 13•11 years ago
|
||
I tried manually running signscript for dmg signing, it takes 26 seconds or so on an idle mac-signing4. We allow up to 4 workers per set of certs, so in theory there could be 12 workers running simultaneously over 3 signing servers per host. In practice that would closer to 4-6, as I expect the number of dep jobs to be 0 to 2 at any one time, and usually no release jobs; and some of those would be much cheaper signing like mar or gpg. We get queues in the nightly signer when l10n is coming through.
As for how long it actually takes, I looked at the signing log and we'd have to parse for pairs of lines like
2014-08-10 04:01:19,288 - DEBUG - 78329: ['/builds/signing/nightly-key-signing-server/bin/python2.7', '/builds/signing/nightly-key-signing-server/tools/release/signing/signscript.py', '-c', '/builds/signing/nightly-key-signing-server/signscript.ini', 'gpg', u'/builds/signing/nightly-key-signing-server/unsigned-files/09d3271c5fef0bba6099865e4619939caabd25c7', u'/builds/signing/nightly-key-signing-server/signed-files/gpg/09d3271c5fef0bba6099865e4619939caabd25c7', u'thunderbird-33.0a2.gd.mac.checksums']
2014-08-10 04:01:25,067 - DEBUG - 78329: Success!
taking care to match the pids and check the signing type. Then track that over the day, including the # of concurrent processes. Urgh.
Reporter | ||
Comment 14•11 years ago
|
||
Interrupting the signscript process with SIGTERM leaves behind a <hash>.out file, so it seems that we won't return truncated content if we hit the 120s timeout as we're writing the signed tar.gz. The underlying cause is still unknown.
Reporter | ||
Comment 15•11 years ago
|
||
An arbitrary factor of 2, in lieu of actually understanding the problem.
Attachment #8473484 -
Flags: review?(bhearsum)
Comment 16•11 years ago
|
||
Comment on attachment 8473484 [details] [diff] [review]
[tools] bump timeout to 4 mins
Review of attachment 8473484 [details] [diff] [review]:
-----------------------------------------------------------------
Might I suggest some duct tape or glue to help this stick to the wall?
Attachment #8473484 -
Flags: review?(bhearsum) → review+
Comment 17•11 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #13)
> I tried manually running signscript for dmg signing, it takes 26 seconds or
> so on an idle mac-signing4. We allow up to 4 workers per set of certs, so in
> theory there could be 12 workers running simultaneously over 3 signing
> servers per host. In practice that would closer to 4-6, as I expect the
> number of dep jobs to be 0 to 2 at any one time, and usually no release
> jobs; and some of those would be much cheaper signing like mar or gpg. We
> get queues in the nightly signer when l10n is coming through.
I wonder how linearly load scales here. The 10.6 signing machines only have 2 cores, so we're probably thrashing the CPU if we have more than 2 workers running (especially during taring/untaring). I was wondering about I/O performance as well, but I feel like taring and untaring would be more CPU bound than I/O bound...
Reporter | ||
Comment 19•11 years ago
|
||
Bah! Can't deploy this to a new branch in the tools repo because it has the single hook on it.
Reporter | ||
Comment 20•11 years ago
|
||
Reporter | ||
Comment 21•11 years ago
|
||
Running into quite a bit of trouble with this today, so emergency halving of concurrency applied.
https://hg.mozilla.org/build/puppet/rev/f7870cf2cbe2
https://hg.mozilla.org/build/puppet/rev/98eee0418b39
I think the issue might just be mac-v2-signing3, but this lets me recover better.
Attachment #8505233 -
Flags: checked-in+
Comment 22•11 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #21)
> Created attachment 8505233 [details] [diff] [review]
> [puppet] Halve concurrency to 2 on 10.9 mac signing servers
>
> Running into quite a bit of trouble with this today, so emergency halving of
> concurrency applied.
>
> https://hg.mozilla.org/build/puppet/rev/f7870cf2cbe2
> https://hg.mozilla.org/build/puppet/rev/98eee0418b39
>
> I think the issue might just be mac-v2-signing3, but this lets me recover
> better.
Thanks Nick. The last alert I see is about an hour after this landed (which is many hours ago at this point), so it looks like this helped. We're migrating one of the 10.6 signing machines to 10.9 to help take on load here - that's been tracked in bug 1082685. I'm thinking we should pull mac-v2-signing3 out of the pool and run some diagnostics when that's complete - it certainly does seem like it's the one having issues. I filed that as bug 1083156, though it could arguably be done here...
Depends on: mac-v2-signing3, 1082685
Reporter | ||
Comment 23•11 years ago
|
||
Good call on the mac-v2-signing3 diagnostics.
Comment 24•10 years ago
|
||
Hit this a few times in chemspills today.
Comment 25•10 years ago
|
||
Comment 26•10 years ago
|
||
This is hitting a second day in a row.
We need to figure out why things are busted, and fix it.
Comment 27•10 years ago
|
||
Catlee realized that this block is eating all exceptions that happen during dmg_signpackage. It turns out that this was originally fixed in bug 757684 (https://github.com/mozilla/build-tools/commit/ca4907e889b45e1e79592d0f24fff070e9571fb6), but it got lost when the code was refactored in bug 730972 (https://github.com/mozilla/build-tools/commit/e863153c0760e4ea92b5c83df943fec26db68cc2#diff-95d18d8a67730330e3a45e4a5d75de72R258).
This should re-fix the bug. It won't stay the signing from failing, but the signing server will return an error and the client will retry.
Attachment #8640065 -
Flags: review?(catlee)
Updated•10 years ago
|
Attachment #8640065 -
Flags: review?(catlee) → review+
Updated•10 years ago
|
Attachment #8640065 -
Flags: checked-in+
Comment 28•10 years ago
|
||
I forgot to update the tools checkouts on the signing servers for this yesterday. I did this just now, so the fix should be live. I'm going to watch some l10n logs to see if I can catch them retrying for verification.
Comment 29•10 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #28)
> I forgot to update the tools checkouts on the signing servers for this
> yesterday. I did this just now, so the fix should be live. I'm going to
> watch some l10n logs to see if I can catch them retrying for verification.
This looks like a series of retries:
06:29:33 INFO - 2015-07-29 06:29:33,989 - 12d3a5ad099aa4831c8858113badd7414f2403dc: uploading for signing
06:29:50 INFO - 2015-07-29 06:29:50,808 - 12d3a5ad099aa4831c8858113badd7414f2403dc: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing4.srv.releng.scl3.mozilla.com:9100
06:30:54 INFO - 2015-07-29 06:30:54,671 - 12d3a5ad099aa4831c8858113badd7414f2403dc: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing4.srv.releng.scl3.mozilla.com:9100
06:31:55 INFO - 2015-07-29 06:31:55,736 - 12d3a5ad099aa4831c8858113badd7414f2403dc: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing4.srv.releng.scl3.mozilla.com:9100
06:33:01 INFO - 2015-07-29 06:33:01,885 - 12d3a5ad099aa4831c8858113badd7414f2403dc: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing4.srv.releng.scl3.mozilla.com:9100
06:33:31 INFO - 2015-07-29 06:33:31,668 - 12d3a5ad099aa4831c8858113badd7414f2403dc: OK
So that's a good sign.
Comment 30•10 years ago
|
||
All Aurora repacks are up-to-date today, and the mozilla-central ones are just about there (just waiting on two more chunks to complete). There's a few failures on central, but it looks totally unrelated to signing:
06:58:06 INFO - /builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/obj-l10n/_virtualenv/bin/python /builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/toolkit/mozapps/installer/l10n-repack.py /builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/obj-l10n/dist/l10n-stage/firefox/FirefoxNightly.app/Contents/Resources ../../dist/xpi-stage/locale-en-GB \
06:58:06 INFO - \
06:58:07 INFO - Error: Multiple locales aren't supported
06:58:08 ERROR - Traceback (most recent call last):
06:58:08 INFO - File "/builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/toolkit/mozapps/installer/l10n-repack.py", line 60, in <module>
06:58:08 INFO - main()
06:58:08 INFO - File "/builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/toolkit/mozapps/installer/l10n-repack.py", line 56, in main
06:58:08 INFO - non_resources=args.non_resource, non_chrome=NON_CHROME)
06:58:08 INFO - File "/builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/python/mozbuild/mozpack/packager/l10n.py", line 247, in repack
06:58:08 INFO - _repack(app_finder, l10n_finder, copier, formatter, non_chrome)
06:58:08 INFO - File "/tools/python/lib/python2.7/contextlib.py", line 24, in __exit__
06:58:08 INFO - self.gen.next()
06:58:08 INFO - File "/builds/slave/m-cen-m64-l10n-ntly-1-00000000/build/mozilla-central/python/mozbuild/mozpack/errors.py", line 131, in accumulate
06:58:08 INFO - raise AccumulatedErrors()
06:58:08 INFO - mozpack.errors.AccumulatedErrors
06:58:08 ERROR - make[1]: *** [repackage-zip] Error 1
06:58:08 INFO - make: *** [repackage-zip-en-GB] Error 2
06:58:08 ERROR - Return code: 2
06:58:08 ERROR - make installers-en-GB failed
Given all of this, I think we can call this fixed...
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•