Closed Bug 1333533 Opened 7 years ago Closed 7 years ago

Process stub installer pings with funnelcake id 98, 99, 100

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agashlin, Assigned: robotblake)

References

(Blocks 1 open bug)

Details

(Whiteboard: [SvcOps])

Currently dsmo_load expects a stub ping to have a version field equal to "v6" or "v7". When the stub is built for a funnelcake, for instance the upcoming 98, this becomes "v7-98", so these pings will be dumped in the download_stats_errors table. Besides the version number the request is formatted the same as a normal v7 ping.

The upcoming funnelcake (tracked in meta bug 1322718) will involve funnelcake ids 98, 99, and 100, so we need to be able to analyze pings coming in with version field "v7-98", "v7-99", or "v7-100". This is a one-off experiment running for three weeks from February 7, as well as testing the week before.

It is probably best for these pings to not go into the main download_stats table, but we need them processed in the same way for analysis. It seems like this should involve a new table with the same schema.

Also, these requests will include attribution data, which may be too long for the download_stats_errors table which currently uses a VARCHAR 256 for the request, while download_stats provides a full 256 for the attribution data alone (looking at some examples these seem to run around 120 chars). If the pings will be processed by re-examining the errors this should be increased.

The attribution field will be used to determine which funnelcake id the ping is associated with, but it would be helpful if a field could be added to save the ping version, as well.
Assignee: nobody → bimsland
Points: --- → 2
Priority: -- → P1
Adam, should this block the funnelcake launch? If this is not done in time could we still use the attribution field to track the data in the download_stats_errors table?
Flags: needinfo?(agashlin)
Due to bug 1333868 we won't see the data yet in download_stats_errors, but I think we can still process the logs later. I know we've had backfills in the past.

Regarding attribution, is that being set on the stubs? We do have the ping version to rely on for funnelcake identification even if attribution isn't present.
Flags: needinfo?(agashlin)
> Regarding attribution, is that being set on the stubs? We do have the ping
> version to rely on for funnelcake identification even if attribution isn't
> present.
I was assuming from previous e-mail conversations that we'd be using the attribution fields on the stub installer for funnelcake identification but it sounds like the ping version gives us the same level of information so that's all good!
I'm just wondering if the attribution is being set on the stub, that's something done by the server and I don't know what the progress is on deploying that. We can use the stub ping version to identify pings, but for telemetry from Firefox itself I don't know if we have anything besides attribution to tell what funnelcake someone is in.
NIing cmore - Chris do you know if we need to use the attribution field for this purpose. Perhaps the build_id is being used?
Flags: needinfo?(chrismore.bugzilla)
I think distribution_id should work, though I can't get anything through Redash this morning to confirm that's what I think it is.

(sorry for taking the comments of this bug off-topic, probably should follow-up via email)
We don't set anything on the server-side (besides coercing some fields into booleans), the data that ends up in redshift is purely based on the GET url that the installer hits.

Also, looks like you may have gotten an answer on IRC, but we do have all the raw data so can reprocess later. With that said, how would you like the funnelcake pings to show up? Table per version? Or just a separate funnelcake table with a field with the version number?
The server-side processing I meant was off-topic, inserting attribution data into the stub binary, sorry for the confusion.

Thanks for the confirmation that reprocessing is possible.

I prefer a single table with a field for the version number. Thanks!
Funnelcakes are identified via distribution_id as mozillaXXX where XXX = funnelcake ID (98|99|100)
Flags: needinfo?(chrismore.bugzilla)
If we don't have this ability, will we lose the ability to segment stub installer metrics by funnelcake?
The funnelcake ID is recorded in the ping version, so even if this is not finished yet we will be able to go back and process the pings to segment by funnelcake ID. I don't think it should block the experiment, though it does block some analysis.
Adam, could we look at build_id in the stub ping data to identify data from the different funnelcakes until this gets fixed? I assume different funnelcakes have different build_ids?
Flags: needinfo?(agashlin)
We can use the ping version, I've put together a draft query that reports 98 vs 99/100 exit codes here:

https://sql.telemetry.mozilla.org/queries/2851/source#5363

You should be able to modify the "SPLIT_PART(request, '/', 1+13)" part to extract other fields from the ping, the code at 
https://github.com/whd/dsmo_load/blob/master/heka/usr/share/heka/lua_filters/nginx_redshift.lua shows what field is what, for instance the exit code is field 13, using the redshift numbering that's 14 (I wrote 13+1 to make it clearer, hopefully).
Flags: needinfo?(agashlin)
That's very useful, thanks!
Whiteboard: [SvcOps]
These now exist in their own tables and can be queried using the "download_stats_funnelcake" view.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.