Closed Bug 1579083 Opened 5 years ago Closed 5 years ago

Backfill 1% of data from telemetry-sample as an early step

Categories

(Data Platform and Tools :: General, task, P1)

Points: 5

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: klukas)

This unblocks migration of things currently dependent on the Longitudinal dataset, as well as being a low-cost test of our import pathway.

I started the initial import yesterday via GCS transfer service. It is about 40% done, so it may finish over the weekend. Data is being written to gs://moz-fx-data-prod-data/telemetry-sample-2/.

The import job finished in 52 hours. Extrapolated to 100% of the data by simple multiplication, the full import would take about 10 times longer than we're expecting, so I'm going to run a separate test on a single day of telemetry-3, to be detailed elsewhere. At any rate, the sample data is available now for developing the backfill procedure. I'm assuming this bug encompasses more than the AWS->GCP import, so I'm leaving it open.
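For reference, the extrapolation above works out as in the rough sketch below. It assumes import time scales linearly with data volume, which is the same simplification as "simple multiplication"; the function name is only illustrative, and the "implied expectation" is back-derived from the "about 10 times" estimate rather than a measured or planned figure.

```python
# Rough extrapolation of the 1% import time to the full dataset.
# Assumes linear scaling with data volume (an assumption, not a measurement).

SAMPLE_PERCENT = 1   # telemetry-sample holds 1% of the data
SAMPLE_HOURS = 52    # observed duration of the sample import


def extrapolate_hours(sample_hours, sample_percent):
    """Scale an observed import time up to 100% of the data."""
    return sample_hours * 100 / sample_percent


full_hours = extrapolate_hours(SAMPLE_HOURS, SAMPLE_PERCENT)
full_days = full_hours / 24

# "About 10 times longer than expected" implies an expectation near this value.
implied_expected_hours = full_hours / 10

print(f"full import: {full_hours:.0f} hours (~{full_days:.0f} days)")
print(f"implied expectation: ~{implied_expected_hours:.0f} hours")
```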

Late last week, we ran a series of jobs for this and populated backfill-test-252723.test_ingestion_1pct.telemetry__main_v4. We now need to validate that output, which should already be fully deduplicated per day and should reach back to 2018-11-01.

Errors are in backfill-test-252723.test_ingestion_1pct.error.
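One way to spot-check the "fully deduplicated per day" expectation on an exported sample of rows is sketched below. The field names submission_date and document_id are assumptions made for illustration and may not match the actual table schema; in practice this check would more likely be a query against the table itself.

```python
from collections import Counter


def duplicate_ids_per_day(rows):
    """Return (submission_date, document_id) pairs that occur more than once.

    `rows` is an iterable of dicts; the keys used here are hypothetical
    column names, not confirmed against the real table.
    """
    counts = Counter((row["submission_date"], row["document_id"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Small made-up sample: the same ID on two different days is not a per-day duplicate.
sample_rows = [
    {"submission_date": "2019-08-30", "document_id": "aaa"},
    {"submission_date": "2019-08-30", "document_id": "aaa"},
    {"submission_date": "2019-08-30", "document_id": "bbb"},
    {"submission_date": "2019-08-31", "document_id": "aaa"},
]
duplicates = duplicate_ids_per_day(sample_rows)
```

An empty result for a given day is consistent with that day being fully deduplicated.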

:relud has been validating a day of this data vs. what's in main_summary.

He found 217 messages in main_summary missing from the heka import, but these all show reasonable validation errors:

216 have negative sessionLength and 1 has "timezoneOffset":539.55, so it seems correct to have thrown them out.
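The two rejection reasons above can be illustrated with a small check like the following. This is only a sketch of the two conditions mentioned here, not the pipeline's actual validation logic; the flat dict layout and the function name are assumptions made for illustration.

```python
def validation_errors(info):
    """Return the reasons a record would be rejected under the two checks above.

    `info` is assumed to be a flat dict holding sessionLength and
    timezoneOffset; the real ping structure may nest these differently.
    """
    errors = []

    session_length = info.get("sessionLength")
    if isinstance(session_length, (int, float)) and session_length < 0:
        errors.append("negative sessionLength")

    timezone_offset = info.get("timezoneOffset")
    if isinstance(timezone_offset, float) and not timezone_offset.is_integer():
        errors.append("non-integer timezoneOffset")

    return errors


negative_session = validation_errors({"sessionLength": -5, "timezoneOffset": 60})
fractional_offset = validation_errors({"sessionLength": 10, "timezoneOffset": 539.55})
clean_record = validation_errors({"sessionLength": 10, "timezoneOffset": 60})
```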

There is one anomaly not yet explained:

13578523-d76c-43f0-963b-2e7d9a903e0b on 2019-08-30 doesn't exist in main_summary, but does exist in the 1pct table.
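A comparison of this kind can be sketched as a two-way set difference over the document IDs for one day. The function name and the placeholder IDs below are hypothetical; only the anomalous ID is taken from this bug, and the real comparison was presumably run against the tables directly.

```python
def compare_document_ids(main_summary_ids, backfill_ids):
    """Report IDs present on only one side of the comparison."""
    main_summary_set = set(main_summary_ids)
    backfill_set = set(backfill_ids)
    return {
        "missing_from_backfill": sorted(main_summary_set - backfill_set),
        "missing_from_main_summary": sorted(backfill_set - main_summary_set),
    }


# Placeholder IDs plus the one anomaly reported above.
ANOMALY_ID = "13578523-d76c-43f0-963b-2e7d9a903e0b"
result = compare_document_ids(
    main_summary_ids=["id-1", "id-2", "id-3"],
    backfill_ids=["id-1", "id-2", ANOMALY_ID],
)
```

Here "missing_from_backfill" corresponds to records like the 217 rejected messages, and "missing_from_main_summary" corresponds to the unexplained anomaly.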

Assignee: nobody → jklukas
Points: --- → 5
Priority: -- → P1

The 1% backfill table now exists as: moz-fx-data-shared-prod:static.main_1pct_backfill

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED