Closed Bug 1484777 Opened 7 years ago Closed 6 years ago

Set spark.dynamicAllocation.enabled to false for all airflow job clusters

Categories

(Data Platform and Tools :: General, enhancement, P2)

Points:
2

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bugzilla, Unassigned)

Details

(Whiteboard: [DataPlatform])

Apparently EMR sets `spark.dynamicAllocation.enabled` to `true` by default (ref: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html; Spark's own docs for this setting: https://spark.apache.org/docs/latest/configuration.html#execution-behavior). Since all of our Airflow jobs use a dedicated cluster, releasing and re-requesting executors as the workload scales ends up wasting resources. We should test setting this config to `false` on these jobs, and potentially on ATMO clusters as well (although on ad-hoc clusters there's more of a chance that folks are running multiple Spark contexts at once, and not sharing resources nicely would cause issues).
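As a sketch of what the change would involve: per the AWS docs linked above, EMR accepts configuration classifications at cluster creation, so disabling dynamic allocation cluster-wide would look roughly like this (the surrounding cluster-launch details are omitted; this is just the relevant classification):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "false"
    }
  }
]
```

Note that with dynamic allocation off, the executor count is fixed by `spark.executor.instances` (or `--num-executors`), so we'd want to make sure each job's cluster is sized appropriately up front.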
Bumping to 2 points since it can be a bit tricky to test this kind of change.
Points: 1 → 2
Priority: -- → P2
A few more notes from me:
- I noticed this happening (the executor count dropping all the way down to 1 and then scaling back up to 50, several times) while I was running a month-long longitudinal job by hand.
- I then had to run a 3-month longitudinal, turned off dynamicAllocation at the command line, and it ran just fine (the entire job took 2.7 hours).

I think we should roll this change out to a few major jobs of various types, one at a time in the job code, and then switch it on for all the Airflow jobs once we're fairly confident it works and is an improvement.
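For reference, the command-line override mentioned above is just a `--conf` flag on `spark-submit`; something like the following (the executor count and script name here are hypothetical, not from this bug):

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 50 \
  run_longitudinal.py
```

This makes it easy to trial the setting per-job before baking it into the cluster configuration for all Airflow jobs.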
Assignee: nobody → ssuh
Assignee: ssuh → nobody
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Spark → General