Closed Bug 1263959 Opened 8 years ago Closed 8 years ago

Presto master is dead

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Unassigned)

Details

(Whiteboard: [SvcOps])

The Presto master is dead and the cluster is in an unusable state. Wesley, could you please have a look at it?
Flags: needinfo?(whd)
Severity: normal → blocker
Priority: -- → P1
How we fixed the issue, at least temporarily:

1. Free enough memory for Presto to start.
2. Run sudo /etc/lib/presto/bin/launcher start --config /etc/presto/conf/config.properties
3. Check /var/log/presto/launcher.log to confirm it worked.
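
For reference, the three steps above could be wrapped in a small shell helper along these lines. This is only a sketch: the function names and the idea of a memory threshold are assumptions for illustration and were not part of the original fix. Only the launcher command and the log path come from this comment. It relies on the MemAvailable field in /proc/meminfo, which is present on reasonably recent Linux kernels.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the threshold argument are hypothetical.

# Print MemAvailable in MB from a meminfo-format file (default: /proc/meminfo).
mem_available_mb() {
    local meminfo="${1:-/proc/meminfo}"
    # The MemAvailable value is reported in kB; convert to whole MB.
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# Start Presto only if at least $1 MB of memory is available.
# The caller chooses the threshold; no specific value is known from this bug.
start_presto_if_memory() {
    local required_mb="$1"
    local available_mb
    available_mb="$(mem_available_mb)"
    if [ "${available_mb:-0}" -lt "$required_mb" ]; then
        echo "only ${available_mb:-0} MB available, need ${required_mb} MB" >&2
        return 1
    fi
    # Step 2: launcher command exactly as given in the comment above.
    sudo /etc/lib/presto/bin/launcher start --config /etc/presto/conf/config.properties
    # Step 3: show the end of the launcher log for manual inspection.
    tail -n 20 /var/log/presto/launcher.log
}
```

Calling start_presto_if_memory with a threshold in MB either starts Presto and prints the end of the launcher log, or refuses and reports how much memory is actually available.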
Priority: P1 → --
Whiteboard: [SvcOps]
(In reply to Anthony Zhang [:azhang] from comment #1)
> How we fixed the issue, at least temporarily:
> 
> 1. Free enough memory for Presto to start.
> 2. Run sudo /etc/lib/presto/bin/launcher start --config /etc/presto/conf/config.properties
> 3. Check /var/log/presto/launcher.log to confirm it worked.

I think what happened here is the following:
1. I killed the Presto service, expecting it to restart on its own as it usually does.
2. Presto couldn't restart because redash_celery was using a large fraction of the available memory.
3. I restarted redash_celery with supervisorctl, which freed up memory.
4. Presto then started up correctly on its own.

In conclusion, I don't think we need any manual steps to start Presto the next time this happens.
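
If this happens again, a quick way to confirm which process is holding the memory might look like the sketch below. The helper name rss_by_command is hypothetical, and how redash_celery shows up in the ps output (for example as celery or python) is not known from this bug, so treat the usage lines as an assumption to adapt.

```shell
#!/usr/bin/env bash
# Sketch only: rss_by_command is a hypothetical helper, not an existing tool.

# Read "rss_kb command" lines on stdin, sum resident memory per command name,
# and print "total_mb command" lines with the largest consumer first.
# Only the first word of the command name is used as the grouping key.
rss_by_command() {
    awk '{ rss[$2] += $1 }
         END { for (c in rss) printf "%d %s\n", rss[c] / 1024, c }' |
        sort -rn
}

# Usage on a live host (the trailing "=" suppresses the ps header line):
#   ps -eo rss=,comm= | rss_by_command | head -n 5
# If the Celery workers top the list, restart them as described above:
#   sudo supervisorctl restart redash_celery
```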
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(whd)
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard