Closed Bug 739361 Opened 12 years ago Closed 12 years ago

Please make etl-run.sh script for Weave/Sync Metrics process more intelligent

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: dre, Assigned: Atoll)

Details

(Whiteboard: [qa-])

Daniel Einspanjer [:dre] [:deinspanjer]

Reporter

Description

•

12 years ago

Here are some general ideas based on the failure observed from today:

1. Reasonable timeout on SCP
2. Log interpolated commands and the results of running them (i.e. bash -xv)
3. E-mail logs on failure
4. Retry logic?

:Atoll

Assignee

Updated

•

12 years ago

Assignee: server-ops → nobody

Group: infra

Component: Server Operations → Operations

Product: mozilla.org → Mozilla Services

QA Contact: phong → operations

Version: other → unspecified

:Atoll

Assignee

Comment 1

•

12 years ago

Specifically, this is the one-off script adm1.phx1.svc:/opt/etl-run.sh, which we should probably puppetize along with /opt/weave_2/writeFile/ at the same time.

James Bonacci [:jbonacci]

Updated

•

12 years ago

Whiteboard: [qa-]

:Atoll

Assignee

Comment 2

•

12 years ago

(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #0)
> 1. Reasonable timeout on SCP

The hang is actually the kernel permitting up to 2 days before a TCP session dies when the routers in between don't send any ICMP indicating otherwise. Switching to rsync will permit proper timeout handling.

> 2. Log interpolated commands and the results of running them (i.e. bash -xv)

I don't quite understand "interpolated commands", but set -x; set -v is certainly reasonable.
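As a rough illustration of what command tracing gives you, here is a minimal sketch. The helper name `run_traced` and its log-file argument are hypothetical and not part of etl-run.sh. With `set -x`, bash prints each command after variable expansion, which is presumably what "interpolated commands" refers to. `set -v` is left out of the helper because it echoes script source lines as they are read, which only matters at script level.

```shell
#!/usr/bin/env bash

# Run a command with xtrace enabled, appending both the trace and the
# command's own output to a log file. The subshell keeps `set -x` and
# the redirection from leaking into the caller.
run_traced() {
    local log_file=$1
    shift
    (
        exec >>"$log_file" 2>&1   # stdout and stderr both go to the log
        set -x                    # trace each command after expansion
        "$@"
    )
}
```

For a whole script, running it as `bash -xv etl-run.sh`, or putting `set -x; set -v` near the top, has a similar effect without a helper.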

> 3. E-mail logs on failure

Can do.

> 4. Retry logic?

Can do.

Assignee: nobody → rsoderberg

Status: NEW → ASSIGNED

:Atoll

Assignee

Comment 3

•

12 years ago

Implemented:

rsync instead of scp, with --timeout=300 (seconds), --verbose, and --perms --chmod=Fa+w to replace the extra chmod step, plus 1 retry after a 60-240 second sleep in case of failure.
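The transfer-and-retry behaviour described above could look roughly like the following sketch. This is not the actual etl-run.sh. The function names, the environment variables for the sleep bounds, and the source and destination arguments are all assumptions made for illustration. Only the rsync flags and the 60-240 second range come from the comment.

```shell
#!/usr/bin/env bash

# Sleep bounds in seconds. The 60-240 range is from the comment above;
# making them overridable is an assumption of this sketch.
RETRY_SLEEP_MIN=${RETRY_SLEEP_MIN:-60}
RETRY_SLEEP_MAX=${RETRY_SLEEP_MAX:-240}

# Run a command; if it fails, sleep a random interval within the bounds
# and try exactly once more. Returns the status of the last attempt.
retry_once() {
    "$@" && return 0
    local span=$(( RETRY_SLEEP_MAX - RETRY_SLEEP_MIN + 1 ))
    local delay=$(( RETRY_SLEEP_MIN + RANDOM % span ))
    echo "command failed; retrying in ${delay}s: $*" >&2
    sleep "$delay"
    "$@"
}

# Hypothetical wrapper: src and dest are placeholders, not real paths.
push_files() {
    local src=$1 dest=$2
    retry_once rsync --timeout=300 --verbose --perms --chmod=Fa+w "$src" "$dest"
}
```

The random delay spreads the retry out so that a brief network blip has time to clear before the second attempt.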

If something goes wrong, the full output of etl-run.sh will be sent to <cron-weave> and to <metrics-alerts>, From: <cron+sync-etl-run>.
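One way to mail output only on failure is sketched below. It assumes a mailx-compatible `mail` command is installed. The recipient addresses are made-up placeholders, since the real ones are not spelled out in this bug, and the function names are hypothetical. Setting the From: header is omitted because the option for it differs between mail implementations.

```shell
#!/usr/bin/env bash

# Placeholder recipients (assumption); substitute the real alert addresses.
ALERT_RECIPIENTS=${ALERT_RECIPIENTS:-"ops@example.invalid metrics@example.invalid"}

# Deliver a report read from stdin. Assumes a mailx-compatible `mail`.
send_report() {
    local subject=$1
    # Recipients are intentionally word-split into separate arguments.
    # shellcheck disable=SC2086
    mail -s "$subject" $ALERT_RECIPIENTS
}

# Run a command, capturing stdout and stderr together. Nothing is sent
# on success; on failure the captured output is mailed.
run_and_report() {
    local output status
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" | send_report "etl-run.sh failed (exit $status)"
    fi
    return "$status"
}
```

Declaring `output` and `status` on a separate line from the assignment matters here: `local output=$(...)` would replace the command's exit status with that of `local`.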

The etl-run.sh output does NOT include the actual kettle job output, because kettle job output contains plaintext passwords.  Instead, it will list the pathname to the job log, so Metrics can ask Svcops to look.
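Keeping the job output out of the mailed report while still pointing at it might be done along these lines. This is a sketch only: the function name and the log path argument are hypothetical, and the restrictive umask is an added precaution that the comment does not mention.

```shell
#!/usr/bin/env bash

# Run a job with all of its output redirected to a private log file,
# and print only the exit status and the log's pathname. The job output
# itself never reaches stdout, so it cannot end up in a mailed report.
run_job_quietly() {
    local log_file=$1
    shift
    # umask 077 makes a newly created log readable by its owner only.
    ( umask 077; "$@" >"$log_file" 2>&1 )
    local status=$?
    echo "job exited with status $status; full output is in $log_file"
    return "$status"
}
```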

Pushed the new etl-run.sh script to adm1.phx1.svc (wp-adm01) and we'll see how this morning's results look.

:Atoll

Assignee

Comment 4

•

12 years ago

Results were fine. Closing this as resolved and filing a separate bug to puppetize it.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED


Categories

(Cloud Services :: Operations: Miscellaneous, task)
