Closed
Bug 1344020
Opened 8 years ago
Closed 8 years ago
Drop support for EMR 4 series in analysis tools
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: whd, Assigned: rvitillo)
Attachments
(3 files)
The motivation here is the desire to use a centralized metastore, which is not supported on 4.X. We could maintain multiple configurations to keep supporting the old version, but we should be moving to Spark 2 anyway.
Some things we need to do (not necessarily in order):
1. Announce that we're doing this to the appropriate lists.
2. Check and migrate Airflow jobs using 4.X. Here's a list of DAG nodes that use the default release label, and thus need to be checked:
android_addons.py:t0 = EMRSparkOperator(task_id="android_addons", job_name="Update android addons"...)
android_clients.py:t0 = EMRSparkOperator(task_id="android_clients", job_name="Update android clients"...)
android_events.py:t0 = EMRSparkOperator(task_id="android_events", job_name="Update android events"...)
bugzilla_dataset.py:t0 = EMRSparkOperator( task_id="update_bugs", job_name="Bugzilla Dataset Update"...)
example.py:t0 = EMRSparkOperator(task_id = "spark", job_name = "Spark Example Job"...)
example.py:t1 = EMRSparkOperator(task_id = "bash", job_name = "Bash Example Job"...)
longitudinal.py:t1 = EMRSparkOperator(task_id="update_orphaning", job_name="Update Orphaning View"...)
longitudinal.py:t3 = EMRSparkOperator(task_id="game_hw_survey", job_name="Game Hardware Survey"...)
main_summary.py:t2 = EMRSparkOperator(task_id="engagement_ratio", job_name="Update Engagement Ratio"...)
main_summary.py:t5 = EMRSparkOperator(task_id="daily_search_rollup", job_name="Daily Search Rollup"...)
mobile_clients.py:t0 = EMRSparkOperator(task_id="mobile_clients", job_name="Update mobile clients"...)
telemetry_aggregates_fennec_backfill.py:t0 = EMRSparkOperator(task_id = "telemetry_aggregate_fennec_backfill", job_name = "Telemetry Aggregate Fennec Backfill"...)
telemetry_aggregates.py:t0 = EMRSparkOperator(task_id = "telemetry_aggregate_view", job_name = "Telemetry Aggregate View"...)
3. Change the default release_label in telemetry-airflow (most jobs override the default to use the 5.X series anyway; none set it explicitly).
4. Remove 4.X series from selectable EMR releases on ATMO.
5. Migrate the Churn scheduled ATMO job. This might be a dupe of a different churn job in Airflow, so maybe we can just remove it. It's owned by :Dexter but references an :mreid S3 path in the code, and lives at s3://telemetry-analysis-code-2/jobs/telemetry-churn-atmov2/Churn.ipynb.
It might also make sense to make release_label a required argument with no default value for EMRSparkOperator, so that we're always forced to migrate jobs piecemeal when we deprecate old EMR versions, as opposed to just bumping the default and accidentally breaking something.
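To illustrate the required-argument idea, here is a minimal Python 3 sketch. `EMRSparkOperatorSketch` is a hypothetical stand-in, not the real telemetry-airflow operator, and the sample label `emr-5.2.1` is only an example value. Declaring `release_label` as keyword-only with no default means a DAG that omits it fails with a `TypeError` as soon as the operator is constructed, instead of silently picking up a bumped default.

```python
class EMRSparkOperatorSketch:
    """Hypothetical stand-in for EMRSparkOperator; not the telemetry-airflow class."""

    def __init__(self, task_id, job_name, *, release_label):
        # release_label is keyword-only and has no default, so every job
        # must state which EMR release it runs on.
        self.task_id = task_id
        self.job_name = job_name
        self.release_label = release_label


def builds_without_release_label():
    """Return True if the operator can be constructed without release_label."""
    try:
        EMRSparkOperatorSketch(task_id="android_addons",
                               job_name="Update android addons")
    except TypeError:
        return False
    return True
```

With this shape, deprecating an EMR series means editing each job's `release_label` explicitly, which is the piecemeal migration described above.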
It might be better to have separate bugs for each of these things, making this a meta bug, but I filed it as-is and people can split it out if needed.
Comment 1 • 8 years ago (Assignee)
Attachment #8844435
Flags: review?(mreid)
Updated • 8 years ago
Attachment #8844435
Flags: review?(mreid) → review+
Comment 2 • 8 years ago (Assignee)
Attachment #8844438
Flags: review?(jezdez)
Comment 3 • 8 years ago (Assignee)
Attachment #8844445
Flags: review?(mreid)
Updated • 8 years ago (Assignee)
Assignee: nobody → rvitillo
Points: --- → 2
Priority: -- → P1
Updated • 8 years ago
Attachment #8844445
Flags: review?(mreid) → review+
Updated • 8 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated • 7 years ago
Attachment #8844438
Flags: review?(jezdez) → review+
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard