Closed
Bug 739361
Opened 12 years ago
Closed 12 years ago
Please make etl-run.sh script for Weave/Sync Metrics process more intelligent
Categories
(Cloud Services :: Operations: Miscellaneous, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dre, Assigned: Atoll)
Details
(Whiteboard: [qa-])
Here are some general ideas based on today's failure:
1. Reasonable timeout on SCP
2. Log interpolated commands and the results of running them (i.e. bash -xv)
3. E-mail logs on failure
4. Retry logic?
Assignee: server-ops → nobody
Group: infra
Component: Server Operations → Operations
Product: mozilla.org → Mozilla Services
QA Contact: phong → operations
Version: other → unspecified
Specifically, the one-off script adm1.phx1.svc:/opt/etl-run.sh, which we should probably puppetize together with /opt/weave_2/writeFile/.
Updated•12 years ago
Whiteboard: [qa-]
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #0)
> 1. Reasonable timeout on SCP
The real cause is the kernel, which allows up to 2 days before a dead session is dropped when the routers in between don't send any ICMP indicating otherwise. Switching to rsync will allow proper timeout handling.
> 2. Log interpolated commands and the results of running them (i.e. bash -xv)
I don't quite understand "interpolated commands", but set -x; set -v is certainly reasonable.
> 3. E-mail logs on failure
Can do.
> 4. Retry logic?
Can do.
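A minimal sketch of the logging idea in point 2, assuming bash and a hypothetical log location (not the path used on adm1). `set -x` prints each command after variable expansion, which is what makes the "interpolated" commands visible, and `set -v` echoes each line as it is read. Sending stdout and stderr to one file keeps the trace next to the commands' own output, so the whole file can later be mailed on failure. This is illustrative only, not the script that was deployed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: LOG is an illustrative location, not the real one.
LOG="${LOG:-$(mktemp /tmp/etl-run.XXXXXX)}"

# Append stdout and stderr to the log. The xtrace output goes to stderr,
# so the trace and the commands' own output end up in the same file.
exec >>"$LOG" 2>&1

set -v  # echo each script line as it is read, before expansion
set -x  # print each command after expansion, prefixed with PS4 ("+ ")

SRC_DIR=/tmp              # hypothetical variable, only to show expansion in the trace
ls "$SRC_DIR" >/dev/null  # appears in the log as: + ls /tmp
```

Because the trace shows expanded values, a failing transfer command would appear in the log with its real host and path arguments rather than the variable names.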
Assignee: nobody → rsoderberg
Status: NEW → ASSIGNED
Implemented: rsync instead of scp, with --timeout=300 (seconds), --verbose, and --perms --chmod=Fa+w to replace the extra chmod step, plus 1 retry after a 60-240 second sleep in case of failure.
If something goes wrong, the full output of etl-run.sh will be sent to <cron-weave> and to <metrics-alerts>, From: <cron+sync-etl-run>.
The etl-run.sh output does NOT include the actual kettle job output, because kettle job output contains plaintext passwords. Instead, it lists the pathname of the job log, so Metrics can ask Svcops to look at it.
Pushed the new etl-run.sh script to adm1.phx1.svc (wp-adm01); we'll see how this morning's results look.
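The transfer-and-retry behaviour described in that comment can be sketched roughly as follows. This is a hypothetical reconstruction, not the deployed etl-run.sh: `SRC`, `DEST` and `LOG` are placeholders, the function names are invented for illustration, and the alert recipients are left for the caller because the real addresses are elided in the bug. The rsync flags are the ones named in the comment; `mail -s` is assumed to be available on the host.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: SRC, DEST, LOG and all function names are illustrative.

# Random delay in the range 60..240 seconds (RANDOM % 181 yields 0..180).
retry_delay() {
    echo $(( 60 + RANDOM % 181 ))
}

# Run a command; if it fails, sleep a random 60-240 s and try exactly once more.
# The exit status of the last attempt is returned.
run_with_one_retry() {
    "$@" && return 0
    sleep "$(retry_delay)"
    "$@"
}

# --timeout=300 makes rsync give up after 300 s without I/O, unlike scp,
# which can hang until the kernel drops the dead TCP session.
# --perms --chmod=Fa+w makes copied files world-writable (F = files only),
# replacing a separate chmod step.
fetch_files() {
    rsync --timeout=300 --verbose --perms --chmod=Fa+w "$SRC" "$DEST"
}

# Mail the full script log to every recipient given as an argument.
notify_failure() {
    mail -s "etl-run.sh failed on $(hostname)" "$@" < "$LOG"
}

# Entry point: transfer with one retry, and alert only if both attempts fail.
main() {
    if ! run_with_one_retry fetch_files; then
        notify_failure "$@"
        return 1
    fi
}
```

A script built this way would end with `main` followed by the alert addresses. The sketch does not set the From: header, because the option for that differs between mail implementations.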
Results were fine. Closing this as resolved, filing a separate bug to puppetize it.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED