Bug 1261498 (Closed): mac-v2-signing2 & 7 sometimes fail to sign
Opened 8 years ago, closed 8 years ago
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum, Assigned: bhearsum
Attachments (2 files)
* patch, 1.26 KB (catlee: review+, bhearsum: checked-in+)
* patch, 521 bytes (catlee: review+, bhearsum: checked-in+)
signtool seems to select mac-v2-signing2 and 7 more often than any others, which leads to them getting the vast majority of the load and sometimes falling over.
Comment 1•8 years ago (Assignee)
I'm starting to doubt that there's actually an issue with signing server selection. The code seems to do it properly, and when assessing across all the instances on each server, the numbers aren't too uneven. Here's the total number of matches for "Putting" on each server since April 1:

mac-v2-signing1: 1712
mac-v2-signing2: 1901
mac-v2-signing3: 2039
mac-v2-signing4: 1955
mac-v2-signing6: 1968
mac-v2-signing7: 1393

(I grepped for "Putting" because that only happens when a file is put onto the queue, whereas the file hash is also printed when the client checks to see if the file is done.)

When I break it down to just Nightly signing servers, things look a bit different:

mac-v2-signing1: 215
mac-v2-signing2: 369
mac-v2-signing3: 359
mac-v2-signing4: 313
mac-v2-signing6: 309
mac-v2-signing7: 483

I also grepped for timeouts, which only occurred on 2 & 7:

mac-v2-signing2: 67
mac-v2-signing7: 156

Every timeout causes another "Putting" message, so that may explain the elevated number of matches in the Nightly logs on those servers.
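For anyone wanting to repeat the per-server tally without chaining greps, here is a minimal Python sketch of the same counting. The function name, the sample log lines, and the "Timeout" needle are assumptions made for illustration; the real server log format is not shown in this bug.

```python
def count_matches(log_lines_by_server, needle="Putting"):
    """Return {server: number of log lines that contain `needle`}."""
    return {
        server: sum(1 for line in lines if needle in line)
        for server, lines in log_lines_by_server.items()
    }


# Hypothetical log excerpts; real lines would be read from each server's log file.
sample_logs = {
    "mac-v2-signing2": [
        "Putting abc on queue",
        "abc: pending",
        "Putting def on queue",
    ],
    "mac-v2-signing7": [
        "Putting abc on queue",
        "Timeout waiting for abc",
        "Putting abc on queue",
    ],
}

putting_counts = count_matches(sample_logs)
timeout_counts = count_matches(sample_logs, needle="Timeout")
```

Comparing the two dictionaries side by side makes it easier to see whether a high "Putting" count on a server lines up with a high timeout count there, which is the correlation suggested above.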
Summary: bad signing server selection by signtool → mac-v2-signing2 & 7 sometimes fail to sign
Comment 2•8 years ago (Assignee)
A few notes about 2 & 7:
* They don't share a hardware class (2 is r5, 7 is r4)
* They were both reimaged recently, along with the rest of the pool
* I can't find record of hardware diagnostics being run anytime in the past year

Sounds like hardware diagnostics might be the next step here, though I'm not sure how much we trust them on Macs.
Comment 3•8 years ago (Assignee)
As we discussed, here are changes that should make us actually give up on pending files after a while and try a new server. Increasing the error count after giving up on one server means that we'll eventually give up entirely. With this patch we should try a different signing server after ~5min, and give up entirely after trying 5 servers.
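The intended policy can be sketched roughly as follows. This is not the signing client's actual code: `sign_with_failover`, the `poll` callback, and both limits are hypothetical names chosen for illustration, and the wait between polls that a real client would need is omitted.

```python
def sign_with_failover(servers, poll, max_pending_tries=5, max_errors=5):
    """Try each server in turn; return the one that signed the file, or None.

    `poll(server)` is a hypothetical callback returning "done" or "pending".
    """
    errors = 0
    for server in servers:
        if errors >= max_errors:
            break  # enough servers have failed: give up entirely
        for _ in range(max_pending_tries):
            if poll(server) == "done":
                return server
        # Still pending after max_pending_tries polls: treat it as an error
        # and move on to the next server.
        errors += 1
    return None


# Usage with a fake poll: signing7 never finishes, signing3 is done on its 2nd poll.
calls = []


def fake_poll(server):
    calls.append(server)
    if server == "signing3" and calls.count("signing3") == 2:
        return "done"
    return "pending"


chosen = sign_with_failover(["signing7", "signing3"], fake_poll)
```

Counting a stuck server as an error is what bounds the total work: the client stops once `max_errors` servers have been abandoned, rather than cycling through the pool forever.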
Updated•8 years ago
Attachment #8741496 -
Flags: review?(catlee) → review+
Comment 4•8 years ago (Assignee)
Comment on attachment 8741496: improve retry around pending files
https://hg.mozilla.org/build/tools/rev/556153d9544c
Attachment #8741496 -
Flags: checked-in+
Comment 5•8 years ago (Assignee)
My patch to the signing client seems to be working fine. I haven't seen it fail over to another server yet, but I'll look for timeouts next week and see if they switch servers after ~5min to confirm.
Comment 6•8 years ago (Assignee)
(In reply to Ben Hearsum (:bhearsum) from comment #5)
> My patch to the signing client seems to be working fine. I haven't seen it
> fail over to another server yet, but I'll look for timeouts next week and
> see if they switch servers after ~5min to confirm.

So, my patch works insofar as it switches to the next server after 5 tries, but it doesn't reset the pending count, so we end up not giving the next server a fair chance to sign:

05:41:14 INFO - 2016-04-16 05:41:14,442 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:41:36 INFO - 2016-04-16 05:41:36,911 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: uploading for signing
05:42:05 INFO - 2016-04-16 05:42:05,402 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:43:20 INFO - 2016-04-16 05:43:20,902 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:44:36 INFO - 2016-04-16 05:44:36,455 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:45:51 INFO - 2016-04-16 05:45:51,663 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:47:06 INFO - 2016-04-16 05:47:06,864 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9100
05:48:21 INFO - 2016-04-16 05:48:21,887 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: giving up after 5 tries
05:48:21 INFO - 2016-04-16 05:48:21,887 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing3.srv.releng.scl3.mozilla.com:9100
05:48:21 INFO - 2016-04-16 05:48:21,905 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: uploading for signing
05:48:31 INFO - 2016-04-16 05:48:31,059 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: giving up after 5 tries
05:48:31 INFO - 2016-04-16 05:48:31,059 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: processing FirefoxNightly.app.tar.gz on https://mac-v2-signing2.srv.releng.scl3.mozilla.com:9100
05:48:31 INFO - 2016-04-16 05:48:31,079 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: uploading for signing
05:48:41 INFO - 2016-04-16 05:48:41,820 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: giving up after 5 tries
05:48:41 INFO - 2016-04-16 05:48:41,821 - aba7e1dc2ad11788cc11d80285d6ede90a3cd29f: giving up after 6 tries
05:48:41 INFO - 2016-04-16 05:48:41,821 - Failed to sign FirefoxNightly.app.tar.gz with dmg
05:48:41 ERROR - make[1]: *** [repackage-zip] Error 1
05:48:41 INFO - make: *** [repackage-zip-ast] Error 2
Attachment #8742499 -
Flags: review?(catlee)
Updated•8 years ago
Attachment #8742499 -
Flags: review?(catlee) → review+
Comment 7•8 years ago (Assignee)
Comment on attachment 8742499: reset pending count when trying a new server
https://hg.mozilla.org/build/tools/rev/710d0b6ec4d2
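To illustrate why the reset matters, here is a small model of the counter behaviour. It is a simplified assumption about how a shared pending counter could behave, not the patched client code, and `polls_per_server` and its parameters are hypothetical names.

```python
def polls_per_server(servers, max_pending=5, reset_on_switch=True):
    """Count the status checks each server gets when no server ever finishes."""
    polls = {}
    pending = 0
    for server in servers:
        if reset_on_switch:
            pending = 0  # the fix: each new server starts with a fresh count
        polls[server] = 0
        while True:
            polls[server] += 1  # one status check that comes back "pending"
            pending += 1
            if pending >= max_pending:
                break  # give up on this server
    return polls


pool = ["signing7", "signing3", "signing2"]
without_reset = polls_per_server(pool, reset_on_switch=False)
with_reset = polls_per_server(pool, reset_on_switch=True)
```

In this model, without the reset only the first server gets its full five checks and every later server is abandoned after a single check, which resembles the quick give-ups on signing3 and signing2 in the log from comment 6. With the reset, every server gets the full five.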
Attachment #8742499 -
Flags: checked-in+
Comment 8•8 years ago (Assignee)
No timeouts since I landed the latest patch, so I couldn't verify it yet.
Comment 9•8 years ago (Assignee)
Looks to be working now:

05:56:10 INFO - 2016-04-19 05:56:10,873 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
05:56:10 INFO - 2016-04-19 05:56:10,909 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: uploading for signing
05:56:31 INFO - 2016-04-19 05:56:31,683 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
05:57:46 INFO - 2016-04-19 05:57:46,905 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
05:59:06 INFO - 2016-04-19 05:59:06,276 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
06:00:21 INFO - 2016-04-19 06:00:21,819 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
06:01:43 INFO - 2016-04-19 06:01:43,116 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing7.srv.releng.scl3.mozilla.com:9110
06:02:58 INFO - 2016-04-19 06:02:58,489 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: giving up after 5 tries
06:02:58 INFO - 2016-04-19 06:02:58,490 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing1.srv.releng.scl3.mozilla.com:9110
06:02:58 INFO - 2016-04-19 06:02:58,508 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: uploading for signing
06:03:12 INFO - 2016-04-19 06:03:12,940 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: processing Nightly.app.tar.gz on https://mac-v2-signing1.srv.releng.scl3.mozilla.com:9110
06:04:20 INFO - 2016-04-19 06:04:20,718 - 071fcc11d39a4428a42ae48ef9705d9e8479776c: OK
Comment 10•8 years ago (Assignee)
Since the signing client changes were made, there have only been 2 timeouts across the entire pool of signing servers. Load looks much more balanced as well, based on a grep for "Putting" again. Calling this fixed.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: General Automation → General