Closed Bug 1158175 Opened 9 years ago Closed 9 years ago

Add build-id dimension to v4 filenames.

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: whd)

V2 telemetry files contained the build-id in their filenames, which allowed users to query the indexing service by build-id range. Since this is a common query for the performance team, and more generally for users trying to correlate a regression or performance improvement with a specific build-id, v4 filenames should contain the build-id as well.

A typical query might look like this:
query(appName="Firefox", appUpdateChannel="nightly", appBuildID=("20150101000000", "20150110999999"))

Note that the submission date is not part of the query.
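
To illustrate the range semantics of the query above, here is a minimal Python sketch of how a build-id range condition could be evaluated against a file's dimensions. The `matches` function and the dict-of-dimensions representation are hypothetical and are not the indexing service's actual API; the sketch only assumes that build IDs are fixed-width 14-digit strings, so that plain string comparison orders them chronologically.

```python
from typing import Dict, Tuple, Union

# A condition is either an exact value or an inclusive (low, high) range.
Condition = Union[str, Tuple[str, str]]


def matches(dimensions: Dict[str, str], **conditions: Condition) -> bool:
    """Return True if every condition holds for the given dimensions.

    Hypothetical helper for illustration only, not the real query() API.
    """
    for name, expected in conditions.items():
        value = dimensions.get(name)
        if value is None:
            return False
        if isinstance(expected, tuple):
            low, high = expected
            # Fixed-width digit strings compare in chronological order,
            # so no date parsing is needed for the range check.
            if not (low <= value <= high):
                return False
        elif value != expected:
            return False
    return True
```

With this shape, a file whose dimensions include a build-id inside the range matches regardless of its submission date, since the submission date is simply not one of the conditions.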
Blocks: 1134217
Priority: -- → P2
Assignee: nobody → whd
Roberto, how will this impact the S3 filter service? 

The plan is to backfill the majority of the data into a new bucket prefix using the new schema, then cut over the data loader to use the new schema, then backfill the small gap in the middle.

We will then delete the data from the old prefix and use the new one.
Flags: needinfo?(rvitillo)
The following is required to transition to the new bucket:

- Change the v4 bucket in the batch filter service
- Change the v4 bucket in the lambda function
- Backfill submissions in the SimpleDB index

Where is the new schema definition going to be stored? Does telemetry_schema.py support it? When will the transition to the new bucket happen?
Flags: needinfo?(mreid)
Flags: needinfo?(rvitillo)
The new schema definition will be stored in the metadata bucket.  It will be very similar to the current schema, with the addition of one more field for appBuildId, so it will definitely be supported by telemetry_schema.py (you will have to set "dirs_only=True" similar to the current v4 schema).

Wes, do you have a particular timeline when you expect to be ready to transition?
Flags: needinfo?(mreid) → needinfo?(whd)
Probably Wednesday. As we surmised, the cardinality increase caused the backfill process to hit open file descriptor limits, but it's finally running and should complete in about a day. The production switch-over and the single day of backfill will take another day.
Flags: needinfo?(whd)
Depends on: 1164174
Blocks: 1164174
No longer depends on: 1164174
Update on this: we ran into another possibly related issue where the heka process was being killed by SIGPIPE mid-backfill. I've worked around it by processing data in chunks of one month and the cutover should happen this weekend.
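
For reference, the month-at-a-time chunking can be sketched as below. This is only an illustration: the `month_prefixes` helper is hypothetical, and it assumes the chunks are keyed by a "YYYYMM" prefix. It does not reflect how the heka backfill is actually invoked.

```python
from datetime import date
from typing import Iterator


def month_prefixes(start: date, end: date) -> Iterator[str]:
    """Yield one "YYYYMM" prefix per calendar month from start to end, inclusive.

    Hypothetical helper: each prefix would be handed to a separate backfill run.
    """
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}{month:02d}"
        month += 1
        if month > 12:
            # Roll over into January of the following year.
            year, month = year + 1, 1
```

Running each prefix as its own job keeps a single failure from taking down the whole backfill, at the cost of having to check each month's output separately.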
Uh oh, did the backfilled data get snappy-encoded?
There are still a few lingering snappy-encoded records on the following days:
20150514
20150515
20150516

I'm checking prior history and will update here if there are any other affected days.
2015051[456] have all been reprocessed.
I also notice that there is only one file for 20150430, no files for 20150429 and 20150428, and fewer than expected for 20150427 and 20150426.
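
A check like this could be automated along the following lines. This is a rough sketch under stated assumptions: the per-day file counts are assumed to have been collected already (for example from a bucket listing), and the `sparse_days` helper and the `minimum` threshold are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def sparse_days(counts: Dict[str, int], start: date, end: date, minimum: int) -> List[str]:
    """Return the "YYYYMMDD" days in [start, end] with fewer than `minimum` files.

    Days that are absent from `counts` are treated as having zero files.
    """
    flagged = []
    day = start
    while day <= end:
        key = day.strftime("%Y%m%d")
        if counts.get(key, 0) < minimum:
            flagged.append(key)
        day += timedelta(days=1)
    return flagged
```

Walking the full date range, rather than only the keys present in `counts`, is what lets days with no files at all show up in the result.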
(In reply to Mark Reid [:mreid] from comment #9)
> I also notice that there is only one file for 20150430, no files for
> 20150429 and 20150428, and fewer than expected for 20150427 and 20150426

At least there was a simple explanation for this: the SIGPIPE of death affected the backfill for the "201504" prefix, and heka died while processing the final days of that month. I re-ran the backfill for the affected days.
Was it just the days I mentioned that you backfilled?
Flags: needinfo?(whd)
(In reply to Mark Reid [:mreid] from comment #11)
> Was it just the days I mentioned that you backfilled?

Yeah, it looked from the logs like 20150425 had been entirely processed and 201505 was processed in a different heka run. Only 201504 had the SIGPIPE issue because it's the only full month of files in landfill.
Flags: needinfo?(whd)
Backfill is complete and metadata has been updated at s3://net-mozaws-prod-us-west-2-pipeline-metadata/sources.json, so we're done here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
20150430 is still empty :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Really this time.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard