Closed
Bug 1484777
Opened 7 years ago
Closed 6 years ago
Set spark.dynamicAllocation.enabled to false for all airflow job clusters
Categories
(Data Platform and Tools :: General, enhancement, P2)
Data Platform and Tools
General
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bugzilla, Unassigned)
Details
(Whiteboard: [DataPlatform])
Apparently by default, EMR sets `spark.dynamicAllocation.enabled` to `true` (ref:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html)
And here are the spark docs for this config: https://spark.apache.org/docs/latest/configuration.html#execution-behavior
Since all of our airflow jobs run on a dedicated cluster, releasing and re-requesting executors as the workload scales up and down ends up wasting resources. We should test setting this config to false on these jobs, and potentially on ATMO clusters as well (although on ad-hoc clusters there's more of a chance that folks are running multiple Spark contexts at once, and not sharing resources nicely will cause issues).
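For reference, a sketch of what the change could look like as an EMR configuration classification (the `spark-defaults` classification shape is from the EMR docs linked above; applying it at cluster creation is an assumption about how our clusters are provisioned):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "false"
    }
  }
]
```

With dynamic allocation off, `spark.executor.instances` (or EMR's `maximizeResourceAllocation`) would determine the fixed executor count for the life of the job.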
Comment 1•7 years ago
Bumping to 2 points since it can be a bit tricky to test this kind of change.
Points: 1 → 2
Priority: -- → P2
So a few more notes from me:
- I noticed this happening (executor count dropping all the way down to 1 and then scaling back up to 50, several times) when I was running a month-long longitudinal by hand
- I had to run a 3-month longitudinal, and when I turned off dynamicAllocation at the command line it ran just fine (the entire job took 2.7 hours)
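The command-line override mentioned above would look something like this (a hedged sketch: the `--conf` flags follow the Spark configuration docs linked in the description, but the script name and executor count are illustrative, not from this bug):

```shell
# Disable dynamic allocation for a single run and pin the executor count.
# With spark.dynamicAllocation.enabled=false, Spark keeps a fixed pool of
# executors instead of releasing and re-requesting them as load changes.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.executor.instances=50 \
  longitudinal_job.py
```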
I think we should consider rolling this change out for a few major jobs of various types, one at a time in the job code, and then switch it on for all the airflow jobs once we're fairly confident it works and is an improvement.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Updated•3 years ago
Component: Spark → General