Need automated workaround to socket hang issue

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
5 years ago
7 months ago

People

(Reporter: hwine, Assigned: hwine)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Our current alerting script should be modified to auto-apply the workaround (HUP the hg process), and complain noisily if it cannot.

Ideally, it would auto update bug 829025 with the details of the hang, but that may remain manual for a while.
(Assignee)

Comment 1

5 years ago
Created attachment 705130 [details] [diff] [review]
attempt hup of hg in socket wait

Only easy way to test is to run live in production - want a sanity check first.

Script would be started from cron with the --fix option. That should attempt one HUP of the hung hg process (existing messages have shown solid detection of just the hg process in socket wait), then report again.
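
As a rough illustration of the "one HUP, then report again" step, here is a minimal sketch. It is not the attached patch: the function name hup_once and the wait interval are hypothetical, and detecting which hg pid is stuck in socket wait is assumed to have happened already.

```shell
#!/usr/bin/env bash
# Sketch only: hup_once and its arguments are illustrative names,
# not taken from check_process_delay.

# Send a single SIGHUP to a pid, wait briefly, then report the outcome.
# Returns 0 if the process is gone afterwards, 1 if it is still running
# or could not be signalled at all.
hup_once() {
    local pid="$1"
    local wait_secs="${2:-5}"

    # One attempt only: if the signal cannot be delivered, complain and stop.
    if ! kill -HUP "$pid" 2>/dev/null; then
        echo "could not signal pid $pid" >&2
        return 1
    fi

    # Give the process a moment to exit before checking again.
    sleep "$wait_secs"

    # kill -0 delivers no signal; it only tests whether the pid still exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid still running after HUP" >&2
        return 1
    fi

    echo "pid $pid cleared by HUP"
    return 0
}
```

A cron-driven caller would invoke hup_once only when --fix was given, and would mail the stderr output when the return status is non-zero so that a failed fix stays noisy.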

Ideally, I'll get to run it manually on a live hang prior to deploying via cron.
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Attachment #705130 - Flags: feedback?(nthomas)
Comment on attachment 705130 [details] [diff] [review]
attempt hup of hg in socket wait

Re-entrant bash doesn't make me nervous, no, not at all.

>diff --git a/check_process_delay b/check_process_delay
>+email_subject="[vcs2vcs] process delays"

You could use this at the end of the on_exit() function that follows, yes?

>+        log "socket hang on pid $p" 

Nit, trailing whitespace. 

>+# process command line args
>+attempt_fix=false
>+while test $# -gt 0; do
>+    case "$1" in
>+    --fix) attempt_fix=true ;;
>+    -h | --help) usage ;;
>+    -*) usage "unknown option '$1'" ;;
>+    *) break ;;
>+    esac
>+    shift
>+done

If memory serves, the case options can be indented for greater readability.
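
For reference, a sketch of the same loop with the case patterns indented one level under `case`. The wrapping in a parse_args function and the usage() stub are assumptions made so the fragment runs on its own; the real usage() in the script is not shown in this review and presumably exits rather than returns.

```shell
#!/usr/bin/env bash
# Placeholder for the script's real usage(); the message text is illustrative.
usage() {
    if [ $# -gt 0 ]; then
        echo "error: $*" >&2
    fi
    echo "usage: check_process_delay [--fix]" >&2
    return 64
}

# Same option handling as the quoted patch, with each pattern indented
# under 'case' and its action on a separate line.
parse_args() {
    attempt_fix=false
    while test $# -gt 0; do
        case "$1" in
            --fix)
                attempt_fix=true
                ;;
            -h | --help)
                usage
                return
                ;;
            -*)
                usage "unknown option '$1'"
                return
                ;;
            *)
                break
                ;;
        esac
        shift
    done
}
```

Both layouts are valid bash; the indentation changes readability only, not behaviour.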
Attachment #705130 - Flags: feedback?(nthomas) → feedback+
(Assignee)

Comment 3

5 years ago
Created attachment 709202 [details] [diff] [review]
automatically try to fix a hung socket

This has been running successfully on gd3 for a while, and incorporates :nthomas's previous feedback.
Attachment #705130 - Attachment is obsolete: true
Attachment #709202 - Flags: review?(nthomas)
Comment on attachment 709202 [details] [diff] [review]
automatically try to fix a hung socket

>             # likely i/o to NFS slowing things down. Notify, but may not
>-            # be error (unsubscripted array access is element 0)
>+            # be error

nit, be an error
Attachment #709202 - Flags: review?(nthomas) → review+
(Assignee)

Comment 5

5 years ago
Comment on attachment 709202 [details] [diff] [review]
automatically try to fix a hung socket

http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-tools/rev/0986499abadc

and deployed in production
Attachment #709202 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: Tools → General
Product: Release Engineering → Release Engineering