Closed Bug 1309876 Opened 8 years ago Closed 8 years ago

Mobile jobs are failing

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: mdoglio)

References

Details

User Story

The following atmo v1 jobs have been failing for a while:

- mobile-android-addons-v1
- mobile-android-events-v1
- mobile-android-clients-v1

Mauro, are those jobs still being used?
Blocks: 1255755
Summary: Mobl → Mobile jobs are failing
Flags: needinfo?(mdoglio)
I suspect nobody is using them, but I don't know tbh. I meant to add them to airflow in bug 1305423 but I haven't done it yet. Let's ask bbermes if those datasets are still needed.
Flags: needinfo?(mdoglio) → needinfo?(bbermes)
Points: --- → 2
Priority: -- → P2
When did they stop working?

Most of my presto queries query android_events_v1, mobile_events_v1, android_clients_v1, android_addons_v1, and mobile_clients_v1, so I think we should try to figure out what the issue is.

Thanks to you both for following up.
Flags: needinfo?(rvitillo)
Flags: needinfo?(mdoglio)
Flags: needinfo?(bbermes)
They seem to have failed on random days since Sept 29th. I'll move them to airflow (in bug 1305423) as soon as possible, because atmo v1 doesn't help me much with monitoring these jobs. I'll take care of any backfill that turns out to be needed as well.
Flags: needinfo?(mdoglio)
Flags: needinfo?(rvitillo)
Thanks Mauro,

Please let us know when we can expect this to be fixed. 

We are currently waiting for some Activity Stream data to come into re:dash for Android...
Assignee: nobody → mdoglio
Status: NEW → ASSIGNED
Priority: P2 → P1
I'm migrating the files right now. It shouldn't take more than a day to backfill the missing data, so I would say EOD tomorrow.
:barbara running the backfill is taking longer than expected. I'll give you an update by EOD today or earlier.
After some investigation, it turns out that the filter on build_id between 20100101000000 and 99999999999999 is slowing down the job A LOT. On a 20-node cluster it takes about 30 minutes to run an android_addons job on 1% of the data with the filter set. Without the filter it takes about 10 minutes to run the same job on 100% of the data. I'm wondering whether the filter is actually needed. :barbara do you have an opinion on that? In the meantime I'll change the notebooks to apply the build_id filter once the data is in Spark. That should make the backfill extremely fast.
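For illustration, applying the build_id filter after the data is loaded could look roughly like the sketch below. It is not the actual notebook code: the record layout (dicts with a string `build_id` field) and the names `has_valid_build_id` and `filter_after_load` are assumptions made up for this example. Only the range bounds come from the comment above.

```python
# Hypothetical sketch: apply the build_id range check to records that are
# already loaded, instead of passing it as a filter to the data loader.
# The bounds are the ones quoted in the comment; everything else is assumed.
MIN_BUILD_ID = "20100101000000"
MAX_BUILD_ID = "99999999999999"


def has_valid_build_id(record):
    """Return True if the record has a 14-digit build_id within the range."""
    build_id = record.get("build_id")
    return (
        isinstance(build_id, str)
        and len(build_id) == 14
        and build_id.isdigit()
        # Equal-length digit strings compare the same way as the numbers.
        and MIN_BUILD_ID <= build_id <= MAX_BUILD_ID
    )


def filter_after_load(records):
    """Keep only the records whose build_id passes the range check."""
    return [r for r in records if has_valid_build_id(r)]


# Made-up sample records, for illustration only.
sample = [
    {"client_id": "a", "build_id": "20161020030211"},
    {"client_id": "b", "build_id": "20091231235959"},  # before the lower bound
    {"client_id": "c", "build_id": None},              # missing build_id
]
kept = filter_after_load(sample)
```

In a Spark notebook the same predicate could presumably be passed to an RDD's `filter` method once the dataset has been loaded, which is the approach the comment describes.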
I found (and fixed) bug 1315243, which explains why the jobs were taking so long. I'm backfilling the last month of data for android_clients and android_events; android_addons was already backfilled last Friday. :barbara can you please confirm that the generated numbers make sense to you?
Flags: needinfo?(bbermes)
I think it looks good now, thanks.

Was there also an issue with mobile_clients?
Flags: needinfo?(bbermes)
No, mobile_clients wasn't affected by that bug.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard