Support realtime logging of EMR clusters in Airflow

RESOLVED FIXED

Status

Cloud Services
Metrics: Pipeline
P4
normal
RESOLVED FIXED
2 years ago
10 months ago

People

(Reporter: rvitillo, Assigned: whd)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [SvcOps])

User Story

Airflow doesn't currently show EMR job logs in real time. Because the logs are uploaded to S3 in semi-real time while a job is running, Airflow could fetch them and display them in the GUI.

Updated

2 years ago
Points: --- → 3
Priority: -- → P4
(Assignee)

Comment 1

10 months ago
To clarify, Airflow does show real-time logging of EMR cluster status via our EMR operator, but not the EMR logs themselves. The task here would be to modify our EMR operator so that it also fetches the EMR logs, which are uploaded to S3 in semi-real time, and adds them to the logging output already available.

Depending on whether we move to the upstream EMR operator (bug #1325393), this would require either maintaining a separate version of that operator or making such logic generically useful.
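
For illustration only, here is a rough sketch of what the log-fetching part could look like. It is not the change that resolved this bug. The class name S3LogTailer, the logger name, and the bucket and prefix in the usage note are all hypothetical. It also assumes that each log object in S3 is rewritten with appended content while the job runs, so lines already shown can be skipped by count; that assumption would need checking against how EMR actually uploads its logs. The client is passed in and is expected to behave like a boto3 S3 client (list_objects_v2 and get_object).

```python
import gzip
import logging

log = logging.getLogger("emr_logs")  # hypothetical logger name


class S3LogTailer:
    """Emit lines that appeared under an S3 prefix since the previous poll.

    Assumption: a log object is rewritten with appended content while the
    job runs, so already-emitted lines can be skipped by line count.
    """

    def __init__(self, s3_client, bucket, prefix):
        self.s3 = s3_client   # e.g. boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix
        self._etags = {}      # key -> ETag seen at the last poll
        self._emitted = {}    # key -> number of lines already emitted

    def _list_objects(self):
        # Follow continuation tokens so prefixes with many keys are covered.
        kwargs = {"Bucket": self.bucket, "Prefix": self.prefix}
        while True:
            page = self.s3.list_objects_v2(**kwargs)
            yield from page.get("Contents", [])
            if not page.get("IsTruncated"):
                return
            kwargs["ContinuationToken"] = page["NextContinuationToken"]

    def poll(self):
        """Fetch changed objects, log their new lines, and return them."""
        new_lines = []
        for obj in self._list_objects():
            key, etag = obj["Key"], obj["ETag"]
            if self._etags.get(key) == etag:
                continue  # object unchanged since the last poll
            body = self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()
            if key.endswith(".gz"):
                body = gzip.decompress(body)
            lines = body.decode("utf-8", errors="replace").splitlines()
            for line in lines[self._emitted.get(key, 0):]:
                log.info("[%s] %s", key, line)
                new_lines.append((key, line))
            self._etags[key] = etag
            self._emitted[key] = len(lines)
        return new_lines
```

An operator could call poll() from the same loop that already polls the cluster status, for example with S3LogTailer(boto3.client("s3"), "my-log-bucket", "logs/j-XXXX/"), where both names are placeholders. Re-downloading whole objects is wasteful for large logs, but it sidesteps byte-range reads on gzip-compressed files.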
Assignee: nobody → whd
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
Whiteboard: [SvcOps]