
improve signing client retry logic

RESOLVED FIXED

Status

Product: Release Engineering
Component: General Automation
Priority: P3
Severity: enhancement
Status: RESOLVED FIXED
Reported: 5 years ago
Modified: 4 years ago

People

(Reporter: bhearsum, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [signing])

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
The signing client currently seems to retry the same server 20 times before moving on to another one. For example:
2012-06-07 17:09:41,734 - a53774f6112fb2458853012d2ea02a21226f4e03: processing mac/is/Thunderbird 14.0b1.dmg on https://mac-signing2.srv.releng.scl3.mozilla.com:9120
2012-06-07 17:10:56,834 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:12:13,440 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:13:30,014 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:14:46,606 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:16:03,195 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:17:19,787 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:18:36,360 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:19:52,937 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:21:09,529 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:22:26,123 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:23:42,700 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:24:59,289 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:26:15,864 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:27:32,462 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:28:49,042 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:30:05,615 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:31:22,202 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:32:38,794 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:33:55,363 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:35:11,947 - a53774f6112fb2458853012d2ea02a21226f4e03: connection error; trying again soon
2012-06-07 17:35:12,948 - a53774f6112fb2458853012d2ea02a21226f4e03: giving up after 20 tries
2012-06-07 17:35:13,098 - a53774f6112fb2458853012d2ea02a21226f4e03: processing mac/is/Thunderbird 14.0b1.dmg on https://mac-signing4.build.scl1.mozilla.com:9100
2012-06-07 17:35:13,445 - a53774f6112fb2458853012d2ea02a21226f4e03: uploading for signing
2012-06-07 17:35:22,903 - a53774f6112fb2458853012d2ea02a21226f4e03: OK


For batched repacks, this more or less guarantees that your token will expire before your job is done if a signing server is down. The client should be switching servers after fewer failures than this.
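
The cost is visible in the log timestamps. As a rough check, the Python sketch below (standard library only) takes three timestamps verbatim from the log above and computes how long the client spent on the unreachable server. The helper name `elapsed` is illustrative and not part of the signing client.

```python
from datetime import datetime, timedelta

# Timestamp format of the signing client log lines above, e.g. "2012-06-07 17:09:41,734".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two log timestamps (hypothetical helper)."""
    return datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)

# Timestamps copied from the log above.
first_attempt = "2012-06-07 17:09:41,734"  # processing ... on mac-signing2
gave_up = "2012-06-07 17:35:12,948"        # giving up after 20 tries
signed = "2012-06-07 17:35:22,903"         # OK on mac-signing4

wasted = elapsed(first_attempt, gave_up)
print(f"time spent on the unreachable server: {wasted}")
print(f"average per try: {wasted.total_seconds() / 20:.1f} s")
print(f"time to sign once the client switched: {elapsed(gave_up, signed)}")
```

With these timestamps the client spent about 25.5 minutes (roughly 76.6 s per try) on mac-signing2, versus under 10 seconds to sign once it moved to mac-signing4.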

Comment 1

5 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #0) 
> For batched repacks, this more or less guarantees that your token will
> expire before your job is done if a signing server is down. The client
> should be switching servers after fewer failures than this.

Do we have to wait before switching servers at all, i.e. can we iterate through all possible servers before 'trying again soon' on each cycle?

(Reporter)

Comment 2

5 years ago
(In reply to Chris Cooper [:coop] from comment #1)
> (In reply to Ben Hearsum [:bhearsum] from comment #0) 
> > For batched repacks, this more or less guarantees that your token will
> > expire before your job is done if a signing server is down. The client
> > should be switching servers after fewer failures than this.
> 
> Do we have to wait before switching servers at all, i.e. can we iterate
> through all possible servers before 'trying again soon' on each cycle?

We wait right now because we're retrying the same request against the same server, giving it a chance to come back up first. Something like this would probably be better:
* Try server A
* If that fails, try server B
* If that fails, try server C
* If that fails, wait N seconds and try them all again.

Probably should shuffle the servers, though.
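
A minimal Python sketch of the round-based scheme described above, assuming a hypothetical `attempt(url)` callable that performs one signing request and raises `ConnectionError` when the server is unreachable. `sign_with_failover`, `max_rounds`, `wait_seconds` (the N above) and `SigningError` are illustrative names, not the signing client's actual API.

```python
import random
import time
from typing import Callable, Sequence

class SigningError(Exception):
    """Raised when no server could process the request (hypothetical)."""

def sign_with_failover(
    servers: Sequence[str],
    attempt: Callable[[str], bytes],
    max_rounds: int = 3,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Try every server once per round; wait only after a whole round fails."""
    order = list(servers)
    random.shuffle(order)  # spread load so every client doesn't start on the same server
    for round_no in range(max_rounds):
        for url in order:
            try:
                return attempt(url)
            except ConnectionError:
                continue  # move straight on to the next server, no wait
        if round_no < max_rounds - 1:
            sleep(wait_seconds)  # every server failed: wait N seconds, then try them all again
    raise SigningError(f"all {len(order)} servers failed in {max_rounds} rounds")
```

The `sleep` parameter is injectable so the back-off can be tested without actually waiting. A round in which every server fails costs one `wait_seconds` pause rather than one pause per server.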

(Assignee)

Updated

5 years ago
Whiteboard: [signing]

(Reporter)

Updated

5 years ago
Severity: normal → enhancement
Priority: -- → P3

(Assignee)

Updated

5 years ago
Assignee: nobody → catlee

(Assignee)

Comment 3

5 years ago
Created attachment 651979 [details] [diff] [review]
move url retrying into remote_signfile

This moves the handling of multiple URLs into remote_signfile.

The URLs are first shuffled and then tried in order. If a request to one URL fails, that URL is moved to the end of the list. I think it's worthwhile to keep the small sleep that's already in there, in case something network-wide is failing.
Attachment #651979 - Flags: review?(bhearsum)
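
The approach described in the patch (shuffle, try in order, move a failing URL to the end, keep a short sleep) can be sketched in Python as follows. This is not the code from attachment 651979; `remote_sign_rotating`, `attempt`, `max_tries` and `retry_sleep` are hypothetical names, and `attempt(url)` is assumed to raise `ConnectionError` on failure. A `collections.deque` turns "move to the end of the list" into a single `rotate(-1)`.

```python
import random
import time
from collections import deque
from typing import Callable, Iterable

def remote_sign_rotating(
    urls: Iterable[str],
    attempt: Callable[[str], bytes],
    max_tries: int = 10,
    retry_sleep: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Shuffle the URLs, try them in order, and rotate a failing URL to the back."""
    order = list(urls)
    if not order:
        raise ValueError("no signing server URLs given")
    random.shuffle(order)  # shuffle once up front
    queue = deque(order)
    for _ in range(max_tries):
        url = queue[0]  # always try the URL at the front
        try:
            return attempt(url)
        except ConnectionError:
            queue.rotate(-1)  # move the failing URL to the end of the list
            sleep(retry_sleep)  # short pause in case the failure is network-wide
    raise RuntimeError(f"giving up after {max_tries} tries")
```

Because a failed URL goes to the back of the queue, a dead server is retried only after every other server has had a turn, so a single outage no longer consumes the whole retry budget.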

(Reporter)

Comment 4

5 years ago
Comment on attachment 651979 [details] [diff] [review]
move url retrying into remote_signfile

Review of attachment 651979 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me.
Attachment #651979 - Flags: review?(bhearsum) → review+

(Assignee)

Updated

5 years ago
Attachment #651979 - Flags: checked-in+

(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering