Closed Bug 1450243 Opened 7 years ago Closed 7 years ago

missing ADI data for 2018-03-29 in -new-prod

Categories

(Socorro :: Database, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Unassigned)

Details

Yesterday we did a migration and during that we copied the db from -prod to -new-prod. That's fine--everything is super there. However, -prod is unique among all our environments in that FetchADIFromHiveCronApp running on SCL3 updates the crontabber bookkeeping in the -prod db. So when we copied it over, the crontabber bookkeeping was such that in -new-prod, RawADIMoverCronApp didn't run right today:

    "fetch-adi-from-hive": {
        "next_run": "2018-03-31T08:20:00+00:00",
        "first_run": "2014-08-29T21:45:05.775446+00:00",
        "last_error": {},
        "last_run": "2018-03-30T08:13:17.852078+00:00",
        "last_success": "2018-03-29T08:00:00+00:00",
        "error_count": 0,
        "depends_on": [],
        "ongoing": null
    },

The next-run is set for tomorrow, but the last_success is from yesterday.

Anyhow, so the result of that is that build-adu-matview and adu-matview both failed because the ADI data for 2018-03-29 is missing in -new-prod:

https://sentry.prod.mozaws.net/operations/socorro-new-prod/issues/3419995/

This bug covers fixing all that.
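For anyone wanting to spot this kind of copied-over state mechanically, here is a minimal Python sketch of the mismatch described above. It only compares the timestamps in the quoted record; `looks_stale` is a hypothetical helper and the heuristic is an illustration, not how crontabber itself decides what to run.

```python
from datetime import datetime


def looks_stale(record):
    """Heuristic (an assumption, not crontabber logic): the job's last run is
    on a later calendar day than its last success, yet no error was recorded."""
    last_run = datetime.fromisoformat(record["last_run"])
    last_success = datetime.fromisoformat(record["last_success"])
    return (
        last_run.date() > last_success.date()
        and record["error_count"] == 0
    )


# Fields copied from the "fetch-adi-from-hive" record quoted in this bug.
record = {
    "next_run": "2018-03-31T08:20:00+00:00",
    "last_run": "2018-03-30T08:13:17.852078+00:00",
    "last_success": "2018-03-29T08:00:00+00:00",
    "error_count": 0,
}

print(looks_stale(record))
```

A record whose last_run and last_success fall on the same day would not be flagged by this check.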
I verified the data is in the -prod db, so I think the best course of action is to fix the crontabber record in -new-prod causing the RawADIMoverCronApp to run and copy the data over. After that, I think everything will recover just fine.

Given that fetch-adi-from-hive is a different job in -new-prod from -prod, I vote we delete the record altogether today and let the job "start fresh". We need to run this in the db in -new-prod:

    DELETE FROM crontabber WHERE app_name = 'fetch-adi-from-hive';

That'll delete the record. Next time crontabber runs, it'll run RawADIMoverCronApp for yesterday and copy the 2018-03-29 data over.

If my theory is horribly wrong, we'll be able to see that it ran and either we'll have some errors in Sentry or it'll do nothing. It won't cause the issue to be worse.

Miles, Brian: Can one of you run that db statement as soon as you get in?
Flags: needinfo?(miles)
Flags: needinfo?(bpitts)
Priority: -- → P1
An alternative approach is that we dump the data we need from -prod as SQL insert statements, then run that script in -new-prod. I'm pretty sure crontabber will run RawADIMoverCronApp correctly tomorrow morning. I'm game either way, but deleting the crontabber record, and thus coercing crontabber into running RawADIMoverCronApp, seems like the easiest and safest approach.
Here's an /api/ADI/ query against -new-stage:

https://crash-stats.allizom.org/api/ADI/?end_date=2018-03-29&platforms=Windows&product=Firefox&start_date=2018-03-29&versions=59.0.2

    {
        "hits": [
            {
                "date": "2018-03-29",
                "adi_count": 44243224,
                "version": "59.0.2",
                "build_type": "release"
            }
        ],
        "total": 1
    }

Same query against -new-prod:

https://crash-stats.mozilla.com/api/ADI/?end_date=2018-03-29&platforms=Windows&product=Firefox&start_date=2018-03-29&versions=59.0.2

    {
        "hits": [],
        "total": 0
    }

After we copy the data and the ADI jobs run, these two API queries should produce the same results.
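The comparison above can be sketched in Python as follows. This is only an illustration: `adi_url` and `same_hits` are hypothetical helpers, the query parameters are the ones visible in the URLs above, and the two responses are pasted from this comment rather than fetched.

```python
from urllib.parse import urlencode


def adi_url(base, date, product, platform, version):
    """Build a single-day /api/ADI/ query URL using the parameters seen above."""
    params = {
        "end_date": date,
        "platforms": platform,
        "product": product,
        "start_date": date,
        "versions": version,
    }
    return f"{base}/api/ADI/?{urlencode(params)}"


def same_hits(a, b):
    """True when two /api/ADI/ responses report the same totals and hits."""
    def key(hit):
        return (hit["date"], hit["version"], hit["build_type"])
    return (
        a["total"] == b["total"]
        and sorted(a["hits"], key=key) == sorted(b["hits"], key=key)
    )


# Responses copied from the comment above (not live data).
stage_response = {
    "hits": [
        {
            "date": "2018-03-29",
            "adi_count": 44243224,
            "version": "59.0.2",
            "build_type": "release",
        }
    ],
    "total": 1,
}
prod_response = {"hits": [], "total": 0}

url = adi_url("https://crash-stats.mozilla.com", "2018-03-29",
              "Firefox", "Windows", "59.0.2")
print(url)
print(same_hits(stage_response, prod_response))
```

Once the data is copied and the ADI jobs have run, the same check against fresh responses would be expected to pass.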
    breakpad=> select * FROM crontabber WHERE app_name = 'fetch-adi-from-hive';
          app_name       |        next_run        |           first_run           |           last_run            |      last_success      | error_count | depends_on | last_error | ongoing
    ---------------------+------------------------+-------------------------------+-------------------------------+------------------------+-------------+------------+------------+---------
     fetch-adi-from-hive | 2018-03-31 08:20:00+00 | 2014-08-29 21:45:05.775446+00 | 2018-03-30 08:13:17.852078+00 | 2018-03-29 08:00:00+00 |           0 | {}         | {}         |
    (1 row)

    breakpad=> DELETE FROM crontabber WHERE app_name = 'fetch-adi-from-hive';
    DELETE 1

    breakpad=> select * FROM crontabber WHERE app_name = 'fetch-adi-from-hive';
     app_name | next_run | first_run | last_run | last_success | error_count | depends_on | last_error | ongoing
    ----------+----------+-----------+----------+--------------+-------------+------------+------------+---------
    (0 rows)
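The check-delete-check sequence above could also be wrapped in a transaction that only commits when exactly one row was removed. The sketch below is a rough illustration under loud assumptions: it uses Python's built-in sqlite3 with an in-memory, two-column stand-in table so it runs without a server, whereas the real table lives in PostgreSQL and has more columns. `delete_bookkeeping` is a hypothetical helper.

```python
import sqlite3


def delete_bookkeeping(conn, app_name):
    """Delete one crontabber row; roll back unless exactly one row matched."""
    cur = conn.execute(
        "DELETE FROM crontabber WHERE app_name = ?", (app_name,)
    )
    if cur.rowcount != 1:
        conn.rollback()
        return False
    conn.commit()
    return True


# Stand-in table (assumption: simplified schema, SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crontabber (app_name TEXT PRIMARY KEY, next_run TEXT)")
conn.execute(
    "INSERT INTO crontabber VALUES (?, ?)",
    ("fetch-adi-from-hive", "2018-03-31 08:20:00+00"),
)
conn.commit()

deleted = delete_bookkeeping(conn, "fetch-adi-from-hive")
remaining = conn.execute(
    "SELECT COUNT(*) FROM crontabber WHERE app_name = ?",
    ("fetch-adi-from-hive",),
).fetchone()[0]
print(deleted, remaining)
```

The row-count guard mirrors the "DELETE 1" and "(0 rows)" confirmations in the psql session.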
Flags: needinfo?(bpitts)
I saw RawADIMoverCronApp kick off and run--took 5 minutes. Then ftpscraper ran for like 15 minutes because it's so sloooooowwww. Then the two ADU jobs ran. I checked the /api/ADI/ query above, but it still returned 0 results. Pretty sure that's cached, so I switched the version to 59.0.1 for the same date and that returned results. So I think we're good here. Marking as FIXED.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(miles)
Resolution: --- → FIXED