Closed Bug 833590 Opened 11 years ago Closed 11 years ago

Need automated workaround to socket hang issue

Product/Component: Release Engineering :: General
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: hwine
Assignee: hwine
Attachments: 1 file, 1 obsolete file

Our current alerting script should be modified to auto-apply the workaround (HUP the hg process), and complain noisily if it cannot.

Ideally, it would auto update bug 829025 with the details of the hang, but that may remain manual for a while.
Attached patch: attempt hup of hg in socket wait (obsolete)
The only easy way to test this is to run it live in production, so I want a sanity check first.

The script would be started from cron with the --fix option. It should attempt one HUP of the hg process stuck in socket wait (existing alert messages have reliably detected exactly that case), then report again.

Ideally, I'll get to run manually on a live hang prior to deploy via cron.
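For illustration only, the "one HUP, then report again" flow could look roughly like the sketch below. This is not the attached patch: `detect_hang`, `run_check`, and `RECHECK_DELAY` are hypothetical names, and the real socket-wait detection in check_process_delay is not shown in this bug, so it is left as a placeholder.

```shell
#!/bin/sh
# Sketch under stated assumptions; not the code from the attachment.

# detect_hang: HYPOTHETICAL placeholder. It should print the pids of hg
# processes stuck in socket wait, one per line. The real check is not
# shown in this bug, so this stub reports nothing.
detect_hang() {
    :
}

# run_check ATTEMPT_FIX
#   Report every hung pid. With ATTEMPT_FIX=true, send exactly one HUP to
#   each, wait RECHECK_DELAY seconds (assumed knob), and detect again.
#   Returns 0 when nothing is hung (or the HUP cleared it), 1 otherwise.
run_check() {
    attempt_fix=$1
    pids=$(detect_hang)
    [ -z "$pids" ] && return 0

    for p in $pids; do
        echo "socket hang on pid $p"
        if [ "$attempt_fix" = true ]; then
            # Complain noisily if the signal cannot be delivered.
            kill -HUP "$p" 2>/dev/null || echo "could not HUP pid $p"
        fi
    done

    # Report-only mode: the hang is still there, so signal failure.
    [ "$attempt_fix" = true ] || return 1

    sleep "${RECHECK_DELAY:-1}"
    pids=$(detect_hang)
    if [ -n "$pids" ]; then
        echo "still hung after HUP: $pids"
        return 1
    fi
    echo "hang cleared by HUP"
    return 0
}
```

Run from cron, the caller would presumably invoke `run_check true` only when --fix was given and `run_check false` otherwise, so a manual run never signals anything by accident.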
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Attachment #705130 - Flags: feedback?(nthomas)
Comment on attachment 705130 [details] [diff] [review]
attempt hup of hg in socket wait

Re-entrant bash doesn't make me nervous, no, not at all.

>diff --git a/check_process_delay b/check_process_delay
>+email_subject="[vcs2vcs] process delays"

You could use this at the end of the on_exit() function that follows, yes?
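A rough sketch of what that might look like, assuming on_exit() mails whatever was logged during the run. Only `email_subject`, `log`, and `on_exit` appear in the patch; `email_to`, `report`, and `send_mail` are hypothetical names made up for this example, and the real function body is not shown in this bug.

```shell
#!/bin/sh
# Sketch only; the real on_exit() in check_process_delay may differ.

email_subject="[vcs2vcs] process delays"
email_to="${EMAIL_TO:-root}"    # hypothetical recipient
report=$(mktemp)                # hypothetical scratch file for log lines

# log MESSAGE...: append a timestamped line to the report.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$report"
}

# send_mail SUBJECT RECIPIENT: hypothetical wrapper, message body on stdin.
send_mail() {
    mail -s "$1" "$2"
}

# on_exit: mail the report only if something was logged, then clean up.
on_exit() {
    if [ -s "$report" ]; then
        send_mail "$email_subject" "$email_to" < "$report"
    fi
    rm -f "$report"
}
trap on_exit EXIT
```

Keeping the subject in one variable means the cron mail and any manual run share the same subject line, which makes filtering easier.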

>+        log "socket hang on pid $p" 

Nit: trailing whitespace.

>+# process command line args
>+attempt_fix=false
>+while test $# -gt 0; do
>+    case "$1" in
>+    --fix) attempt_fix=true ;;
>+    -h | --help) usage ;;
>+    -*) usage "unknown option '$1'" ;;
>+    *) break ;;
>+    esac
>+    shift
>+done

If memory serves, the case options can be indented for greater readability.
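For reference, the indented form might look like the sketch below. The patterns and the `attempt_fix` variable are taken from the quoted patch; the `parse_args` wrapper and the body of `usage` (including its message text and exit status) are assumptions made so the example runs on its own.

```shell
#!/bin/sh
# Sketch only: same option handling as the quoted hunk, with the case
# patterns indented one level under "case".

# usage [MESSAGE]: hypothetical stand-in for the script's real usage().
usage() {
    [ $# -gt 0 ] && echo "error: $*" >&2
    echo "usage: check_process_delay [--fix] [-h|--help]" >&2
    exit 2
}

attempt_fix=false

# parse_args ARGS...: hypothetical wrapper around the loop from the patch.
parse_args() {
    while test $# -gt 0; do
        case "$1" in
            --fix)
                attempt_fix=true
                ;;
            -h | --help)
                usage
                ;;
            -*)
                usage "unknown option '$1'"
                ;;
            *)
                break
                ;;
        esac
        shift
    done
}
```

The shell accepts either layout; the indentation only makes it easier to see where each pattern's commands start and end.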
Attachment #705130 - Flags: feedback?(nthomas) → feedback+
This has been running successfully on gd3 for a while, and incorporates :nthomas's previous feedback.
Attachment #705130 - Attachment is obsolete: true
Attachment #709202 - Flags: review?(nthomas)
Comment on attachment 709202 [details] [diff] [review]
automatically try to fix a hung socket

>             # likely i/o to NFS slowing things down. Notify, but may not
>-            # be error (unsubscripted array access is element 0)
>+            # be error

Nit: "be an error".
Attachment #709202 - Flags: review?(nthomas) → review+
Comment on attachment 709202 [details] [diff] [review]
automatically try to fix a hung socket

http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-tools/rev/0986499abadc

and deployed in production
Attachment #709202 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: Tools → General