Closed Bug 1382296 Opened 7 years ago Closed 7 years ago

Extending Spark Cluster Lifetime From ATMO Dashboard Not Working

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bencase, Assigned: bugzilla)

Details

Benton Case

Reporter

Description

7 years ago

I ran a Spark cluster on ATMO with an initial lifetime of 24 hours. About 12 hours in, I used the "extend lifetime" button a few times to push the cluster's termination time out by 3 days. However, the cluster was terminated at the end of the initial 24-hour lifetime.

Steps leading to issue:
1) started a Spark cluster with 1 node, 24-hour lifetime
2) 12 hours in, used "extend lifetime" button three times (added 72 hours)
3) cluster terminated after only the initial 24-hour period
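
For reference, the lifetime expected from the steps above can be worked out with a few lines of Python. This is only a sketch of the arithmetic, not ATMO's code: the function name `expected_expiry`, the start time, and the assumption that each press of the button adds 24 hours (three presses, 72 hours in total) are illustrative.

```python
from datetime import datetime, timedelta, timezone


def expected_expiry(created_at, initial_hours, extensions_hours):
    """Return the termination time after applying each lifetime extension."""
    expires_at = created_at + timedelta(hours=initial_hours)
    for hours in extensions_hours:
        # Each extension pushes the termination time further out.
        expires_at += timedelta(hours=hours)
    return expires_at


# Hypothetical start time; the report does not give the real one.
created = datetime(2017, 7, 18, 9, 0, tzinfo=timezone.utc)

# Initial 24-hour lifetime, then three extensions assumed to be 24 hours each.
expiry = expected_expiry(created, 24, [24, 24, 24])
lifetime_hours = (expiry - created).total_seconds() / 3600
print(lifetime_hours)  # 96.0
```

Under these assumptions the cluster should have lived for 96 hours, not the 24 hours observed.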

Rob Miller [:rmiller]

Comment 1

7 years ago

Transferred to https://github.com/mozilla/telemetry-analysis-service/issues/627 for tracking.

Rob Hudson [:robhudson]

Comment 2

7 years ago

I looked into this. I manually tested the same steps and also wrote a unit test, which passes. I wasn't able to reproduce the problem.

Have you experienced this again, or was it perhaps a fluke? Are there any more details you can provide to help reproduce this? When you extended the cluster, did you happen to notice whether the termination date/time on the page got updated?

Flags: needinfo?(bcase)

Josh Gaunt [:jgaunt]

Comment 3

7 years ago

I have similar cases to report as bcase, but only the most recent is fresh enough in my memory to report with accuracy. I've had it happen with 3-, 5-, and 10-node clusters as well.

Steps leading to issue:
1) started a Spark cluster with 30 nodes, 24-hour lifetime
2) ~12 hours in, used "extend lifetime" button once (added 24 hours)
3) cluster terminated after only the initial 24-hour period

Frank Bertsch [:frank]

Comment 4

7 years ago

Note that Josh's cluster was terminated with "The master node was terminated by user", see [0]. This indicates to me that ATMO is shutting down these clusters.

[0] https://screenshots.firefox.com/LiWdmoYKUxPUaXvM/us-west-2.console.aws.amazon.com

Benton Case

Reporter

Comment 5

7 years ago

Rob, 

I've experienced this issue more than once, always with 1-node clusters where I use the "extend lifetime" button about 12 hours in. The termination date/time on the page did update properly after using the button.

Is there anything else I can help with for reproduction? I'm not sure if there are any other details that would be helpful.

Flags: needinfo?(bcase)

bugzilla

Assignee

Comment 6

7 years ago

So, according to CloudTrail, it's *not* the prod ATMO instance that's killing the clusters -- the kill request is coming from an old dev instance of ATMO. I'm going to kill that environment (I don't think it's being used anymore -- the code hasn't been updated in nearly a year) and hopefully that fixes the problem.
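
As a rough illustration of that kind of check, the sketch below counts terminate requests per caller in CloudTrail-style event records that have already been parsed into dicts. It assumes the clusters are EMR job flows, so the relevant events have `eventSource` `elasticmapreduce.amazonaws.com` and `eventName` `TerminateJobFlows`. The function name `terminate_callers`, the role ARNs, and the IP addresses are made up for the example and are not taken from the actual logs.

```python
from collections import Counter


def terminate_callers(events):
    """Count EMR terminate requests per (caller ARN, source IP) pair."""
    callers = Counter()
    for event in events:
        # Skip anything that is not an EMR cluster-termination call.
        if event.get("eventSource") != "elasticmapreduce.amazonaws.com":
            continue
        if event.get("eventName") != "TerminateJobFlows":
            continue
        identity = event.get("userIdentity", {})
        caller = identity.get("arn", "unknown")
        source_ip = event.get("sourceIPAddress", "unknown")
        callers[(caller, source_ip)] += 1
    return callers


# Hypothetical records for illustration only.
sample_events = [
    {
        "eventSource": "elasticmapreduce.amazonaws.com",
        "eventName": "TerminateJobFlows",
        "userIdentity": {"arn": "arn:aws:sts::000000000000:assumed-role/atmo-dev/i-0"},
        "sourceIPAddress": "192.0.2.10",
    },
    {
        "eventSource": "elasticmapreduce.amazonaws.com",
        "eventName": "RunJobFlow",
        "userIdentity": {"arn": "arn:aws:sts::000000000000:assumed-role/atmo-prod/i-1"},
        "sourceIPAddress": "192.0.2.20",
    },
]

for (caller, source_ip), count in terminate_callers(sample_events).items():
    print(caller, source_ip, count)
```

If the terminate calls all come from an identity other than the production service, that points to a second environment issuing the kill requests.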

Assignee: nobody → ssuh

bugzilla

Assignee

Updated

7 years ago

Flags: needinfo?(mreid)

Rob Miller [:rmiller]

Comment 7

7 years ago

Closing because this seems to be resolved; we can reopen if the issue comes back.

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(mreid)

Resolution: --- → FIXED

BMO Automation

Updated

4 years ago

Product: Data Platform and Tools → Data Platform and Tools Graveyard

Categories

(Data Platform and Tools Graveyard :: Telemetry Analysis Service (ATMO), defect)
