Closed Bug 1520794 Opened 6 years ago Closed 6 years ago

Investigate attribution value incoherence on DSMO-RS

Categories

(Data Platform and Tools :: General, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RT, Assigned: robotblake)

Details

(Whiteboard: [DataOps])

Per https://sql.telemetry.mozilla.org/queries/60892/source#157103, only 1% of stub installs have attribution (DSMO-RS holds the stub ping data), whereas Su's analysis of telemetry shows 18.8% of new profiles with attribution: https://metrics.mozilla.com/protected/shong/shong-reports/firefox-acquisition/darkfunnel/tiger-team-profile-categorization/profile-categorizations.html

This suggests to me that attribution itself is working as expected, but either the stub ping does not report it (very long strings or strange characters may break it?) or the data pipeline that processes the stub ping filters some of it out and records it as "0" or "null".

This bug is about investigating where the attribution data may break in the data pipeline processing.

Blake, is this something you'd look into?

Flags: needinfo?(bimsland)

I think this'll be me, I'll get it triaged.

Flags: needinfo?(bimsland)
Whiteboard: [DataOps]
Assignee: nobody → bimsland
Priority: -- → P2

A couple drive-by questions:

  1. Is https://github.com/whd/dsmo_load the right place to look?
  2. Is https://github.com/whd/dsmo_load/blob/master/heka/usr/share/heka/lua_filters/nginx_redshift.lua#L173-L176 doing the right thing -- should that conditional be "v7 or greater" rather than "exactly v7"?

ni :whd re: Tim's questions

Flags: needinfo?(whd)

(In reply to Tim Smith 👨‍🔬 [:tdsmith] from comment #3)
> A couple drive-by questions:
>
>   1. Is https://github.com/whd/dsmo_load the right place to look?
>   2. Is https://github.com/whd/dsmo_load/blob/master/heka/usr/share/heka/lua_filters/nginx_redshift.lua#L173-L176 doing the right thing -- should that conditional be "v7 or greater" rather than "exactly v7"?

I think these are both correct (that looks like a code bug), but :robotblake would know better.

Flags: needinfo?(whd)

This has been fixed by https://github.com/whd/dsmo_load/pull/4 and :robotblake is looking into backfilling the data.

Hey all, as :whd said, there was a bug where only v7 pings were getting attribution pushed into the db properly.
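For anyone following along, here is a minimal sketch of the kind of version check being described. It is not the actual Lua filter from dsmo_load: the function names, the constant, and the sample attribution string are hypothetical, and it only assumes what this bug states, namely that v6 pings carry no attribution and that v7 and later pings do.

```python
from typing import Optional

# Assumption for illustration: attribution first appears in v7 stub pings.
ATTRIBUTION_MIN_VERSION = 7


def attribution_exact_match(version: int, raw: Optional[str]) -> Optional[str]:
    """Keep attribution only when the ping is exactly v7 (the buggy behavior)."""
    return raw if version == ATTRIBUTION_MIN_VERSION else None


def attribution_min_version(version: int, raw: Optional[str]) -> Optional[str]:
    """Keep attribution for v7 and every later ping version (the intended behavior)."""
    return raw if version >= ATTRIBUTION_MIN_VERSION else None


if __name__ == "__main__":
    sample = "source=example"  # hypothetical attribution value
    for version in (6, 7, 8):
        print(
            version,
            attribution_exact_match(version, sample),
            attribution_min_version(version, sample),
        )
```

With an exact-match check, every ping newer than v7 loses its attribution, which would be consistent with the very low attribution share seen in the query.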

A fix has been deployed to the DSMO edge servers, so data from 20190130 forward should be good, but I'll need to run a relatively large backfill to fix the older data, potentially all the way back to mid-2017. I'll update this bug as I get that running.

As a follow-up, I'm still processing the data. It's taking quite a bit of time since there is a lot of it; I'm hoping to actually initiate the backfill sometime Friday.

For info, I just re-ran the query and the backfilled data does not show up:
https://sql.telemetry.mozilla.org/queries/60892#157103

The backfill failed over the weekend and never completed. I'm still working on figuring out why. I'm hoping to start a run overnight to see if the changes I've made address the issues.

Priority: P2 → P1

Small update: data processing is still chugging along and appears to have gotten past the failure point. The ETA looks like sometime this afternoon or early evening, and then I can start loading the data into Redshift.

Alright, so the data processing finally completed after several more fits and starts. I'm doing some validation / smoke-testing that the data looks okay, and then I'll be cutting the tables over to the backfilled versions later tonight.

Data has been backfilled to the beginning of June 2017. One side effect is that every ping that had empty or null attribution (normally v6 is null, and >= v7 is an empty string) was loaded as null. I don't think this is an issue in practice, since versioning should be taken into account and, for all intents and purposes, null and blank belong in the same "no attribution" category.
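If it helps anyone querying the data, here is a rough sketch of treating null and blank as the same "no attribution" category when computing an attribution share. The helper names and sample values are made up for illustration and do not come from the pipeline or the Redshift schema.

```python
from typing import Iterable, Optional


def has_attribution(value: Optional[str]) -> bool:
    """Treat both null (None) and the empty string as 'no attribution'."""
    return value is not None and value != ""


def attribution_share(values: Iterable[Optional[str]]) -> float:
    """Fraction of pings that carry a non-empty attribution value."""
    items = list(values)
    if not items:
        return 0.0
    return sum(has_attribution(v) for v in items) / len(items)


if __name__ == "__main__":
    # Hypothetical sample: one null, one blank, two real attribution strings.
    sample = [None, "", "source=example", "source=other"]
    print(attribution_share(sample))
```

Counting this way gives the same answer whether a row was loaded before or after the backfill, since the null/blank distinction no longer matters.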

If there are any other questions about this or issues you see with the data, let me know!

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED