Make Universal Search Telemetry data available in re:dash

RESOLVED FIXED

Status

Cloud Services
Metrics: Product Metrics
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: clouserw, Assigned: Rebecca Weiss)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
This bug is bug 1264049's cousin, but for an experiment instead of Test Pilot itself.

Universal Search is using Telemetry through Test Pilot (using the testpilottest ping type).
The testpilottest type takes an undefined payload, which needs to be defined by each test.  For Universal Search, that payload is:

{
    "test": "universal-search@mozilla.com",  // The em:id field from the add-on
    "agent": "User Agent String",
    "payload": {
        "didNavigate": true,
        "interactionType": "click",
        "recommendationShown": true,
        "recommendationType": "tld",
        "recommendationSelected": true,
        "selectedIndex": -1
    }
}

And a schema:

local schema = {
--   column name                   field type   length  attributes   field name
    {"timestamp",                  "TIMESTAMP", nil,    "SORTKEY",   "Timestamp"},
    {"uuid",                       "VARCHAR",   36,      nil,         get_uuid},

    {"test",                       "VARCHAR",   255,     nil,         "test"},
    {"agent",                      "VARCHAR",   45,      nil,         "agent"},
    {"didNavigate",                "BOOLEAN",   nil,     nil,         "payload[didNavigate]"},
    {"interactionType",            "VARCHAR",   255,     nil,         "payload[interactionType]"},
    {"recommendationShown",        "BOOLEAN",   nil,     nil,         "payload[recommendationShown]"},
    {"recommendationType",         "VARCHAR",   255,     nil,         "payload[recommendationType]"},
    {"recommendationSelected",     "BOOLEAN",   nil,     nil,         "payload[recommendationSelected]"},
    {"selectedIndex",              "INTEGER",   nil,     nil,         "payload[selectedIndex]"}
}
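To illustrate how the "field name" column maps the nested ping onto flat columns, here is a minimal Python sketch. It is not the pipeline's actual code: the helper names (`lookup`, `ping_to_row`) are hypothetical, the timestamp is assumed to be supplied by the caller, and `uuid.uuid4()` merely stands in for whatever `get_uuid` does.

```python
import uuid
from typing import Any, Dict, List, Tuple

# (column name, source path); paths mirror the "field name" column of the schema.
COLUMNS: List[Tuple[str, str]] = [
    ("test", "test"),
    ("agent", "agent"),
    ("didNavigate", "payload[didNavigate]"),
    ("interactionType", "payload[interactionType]"),
    ("recommendationShown", "payload[recommendationShown]"),
    ("recommendationType", "payload[recommendationType]"),
    ("recommendationSelected", "payload[recommendationSelected]"),
    ("selectedIndex", "payload[selectedIndex]"),
]


def lookup(ping: Dict[str, Any], path: str) -> Any:
    """Resolve 'key' or 'payload[key]' against a decoded ping; None if absent."""
    if path.startswith("payload[") and path.endswith("]"):
        key = path[len("payload["):-1]
        return ping.get("payload", {}).get(key)
    return ping.get(path)


def ping_to_row(ping: Dict[str, Any], timestamp: str) -> Dict[str, Any]:
    """Flatten one ping into one row keyed by column name."""
    # Assumption: the caller supplies the timestamp, and a random UUID
    # stands in for the schema's get_uuid function.
    row: Dict[str, Any] = {"timestamp": timestamp, "uuid": str(uuid.uuid4())}
    for column, path in COLUMNS:
        row[column] = lookup(ping, path)
    return row
```

Missing fields come through as `None`, which would correspond to NULL in a tabular store; whether the real filter drops or keeps such pings is not stated in this bug.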

Let us know if there is anything else needed.  Thanks!


Test Pilot metrics docs: https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md

Universal Search metrics docs:  https://github.com/mozilla/universal-search/blob/master/docs/metrics.md
(Reporter)

Updated

2 years ago
Blocks: 1257690

Updated

2 years ago
Blocks: 1270961
Priority: -- → P2

Updated

2 years ago
Component: Metrics: Pipeline → Metrics: Product Metrics
Priority: P2 → P1
Hey Rebecca,

Anything I can do to move this along? I'm happy to do the legwork if you point me in the right direction.

Thanks!
Flags: needinfo?(rweiss)
(Assignee)

Comment 2

2 years ago
duplicate
Talked offline about this with kparlante.  Here was my proposal:

Assertions:
- There is a testpilot data source available in re:dash already.  I believe this is a redshift instance that contains tables consisting of daily server logs
- Universal Search test pilot test pings are event-based, meaning that client actions emit a ping whenever specific events of interest occur.
Flags: needinfo?(rweiss) → needinfo?(kparlante)
(Assignee)

Comment 3

2 years ago
Talked offline about this with kparlante.  Here's the state of this request:

Assertions:
1) There is a testpilot data source available in re:dash already.  I believe this is a redshift instance that contains tables consisting of daily server logs.  
2) Universal Search test pilot test pings are event-based, meaning that client actions emit a ping whenever specific events of interest occur.
3) We need to decide on an ETL approach for these pings such that ultimately each of these events becomes a single row in a tabular data source that is available within re:dash.

For the sake of decision-making, here are my naively suggested proposals for handling ETL of these pings:
A) We could batch process these pings on a schedule as a Spark job, which could look roughly like the following:
1. On a defined interval (e.g. hourly), collect all pings with doctype testpilottest and test label universal search
2. Create a DataFrame from these pings according to the schema described in the Universal Search metrics plan
3. Update some data source available in re:dash with this DataFrame object
B) We could stream process these pings as they arrive using a Heka filter, which could look roughly like the following:
1. As test pilot test pings arrive, filter into separate test pilot test types.
2. If the ping is universal search, transform into the appropriate row structure.
3. Insert the row into some data source as soon as transformation is complete.
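The difference between proposals A and B can be sketched in plain Python. This is only an illustration, not Spark or Heka code: the `docType` key on each record, the `to_row` helper, and the `write_rows`/`write_row` sink callbacks are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, List

Ping = Dict[str, Any]
TEST_ID = "universal-search@mozilla.com"


def is_universal_search(ping: Ping) -> bool:
    # Assumption: each record carries its doctype under a "docType" key.
    return ping.get("docType") == "testpilottest" and ping.get("test") == TEST_ID


def to_row(ping: Ping) -> Dict[str, Any]:
    """Flatten the nested payload into a single row (one event = one row)."""
    row: Dict[str, Any] = {"test": ping["test"], "agent": ping.get("agent")}
    row.update(ping.get("payload", {}))
    return row


def run_batch(pings: Iterable[Ping],
              write_rows: Callable[[List[Dict[str, Any]]], None]) -> int:
    """Proposal A: collect an interval's pings, then write them in one go."""
    rows = [to_row(p) for p in pings if is_universal_search(p)]
    if rows:
        write_rows(rows)
    return len(rows)


def run_stream(pings: Iterable[Ping],
               write_row: Callable[[Dict[str, Any]], None]) -> int:
    """Proposal B: transform and insert each ping as soon as it arrives."""
    count = 0
    for ping in pings:
        if is_universal_search(ping):
            write_row(to_row(ping))
            count += 1
    return count
```

Both paths share the same filter and transform; they differ only in when rows reach the data source, which is the latency trade-off being weighed here.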

I'm voting for B: since the data itself is event-based, we should process it in as close to real time as possible.  When I refer to "some data source," I believe we should follow from assertion (1) above and use Redshift as the endpoint for the datasets; we already know how to hook those up to re:dash, and we're planning to make Redshift data sources available within a.t.m.o for finer-grained individual-level analysis.  We can add more tables to the Testpilot Redshift, or we can create a Universal Search Redshift; I'm not sure which one is preferable.  I suspect that the former is easier for now, but the latter might be superior in the long run.
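If a new table is added to an existing Redshift instance, the column list from the schema in the description could be turned into DDL along these lines. This is a rough sketch only: the table name `universal_search_events` and the helper functions are hypothetical, and identifiers are double-quoted purely as a precaution against keyword clashes.

```python
from typing import List, Optional, Tuple

# (column name, field type, length, attributes), copied from the schema above.
Column = Tuple[str, str, Optional[int], Optional[str]]

SCHEMA: List[Column] = [
    ("timestamp", "TIMESTAMP", None, "SORTKEY"),
    ("uuid", "VARCHAR", 36, None),
    ("test", "VARCHAR", 255, None),
    ("agent", "VARCHAR", 45, None),
    ("didNavigate", "BOOLEAN", None, None),
    ("interactionType", "VARCHAR", 255, None),
    ("recommendationShown", "BOOLEAN", None, None),
    ("recommendationType", "VARCHAR", 255, None),
    ("recommendationSelected", "BOOLEAN", None, None),
    ("selectedIndex", "INTEGER", None, None),
]


def column_ddl(name: str, field_type: str,
               length: Optional[int], attributes: Optional[str]) -> str:
    """Render one column definition, e.g. '"uuid" VARCHAR(36)'."""
    ddl = f'"{name}" {field_type}'
    if length is not None:
        ddl += f"({length})"
    if attributes:
        ddl += f" {attributes}"
    return ddl


def create_table_sql(table: str, schema: List[Column]) -> str:
    """Build a CREATE TABLE statement for the given (hypothetical) table name."""
    columns = ",\n    ".join(column_ddl(*column) for column in schema)
    return f'CREATE TABLE IF NOT EXISTS "{table}" (\n    {columns}\n);'


if __name__ == "__main__":
    print(create_table_sql("universal_search_events", SCHEMA))
```

The same generator would work for either option (a new table in the Testpilot instance or a separate Universal Search instance), since only the connection target differs.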

kparlante suggested rmiller might be able to help tackle the heka filter and ETL process.  Adding them both to this bug while we hash it out.

Updated

2 years ago
Flags: needinfo?(kparlante)
This plan sounds great, thanks for the effort, Rebecca and Katie! Let me know how I can help.

Comment 5

2 years ago
To get this up as quickly as possible, we're taking rweiss's approach A above. We are aiming to get this done by Friday, though a more conservative estimate is Monday.
(Reporter)

Comment 6

2 years ago
Any updates on this?

Comment 7

2 years ago
Sorry, this has been done for a little while. In-progress dash: https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary

Data is available in presto in the table usearch_daily. An update was pushed today to correct a misinstrumented field, and by tomorrow we will have the most up-to-date version.
Hi all,

The most recent dashboard data stops at July 14. This seems like a bug.

I'm also wondering if there's a bug in the data processing code, as the dashboard says that we had 25 users the week of 7/7, and 13 users the week of 7/14. I'm not sure if this means 'new users' or 'total users', but either way, those numbers seem suspect.

Should I file separate bugs for the missing data and the incorrect counts, or keep commenting in this one? This bug is marked 'new', while comment 7 says the dashboard is done, so I'm not sure.

Thanks,

Jared
We have a Universal Search: Executive Summary dashboard, and have had it for a bit now, so I'm considering this issue resolved. We can (and should!) open new bugs for issues that we experience using this data. And I'll be opening one such bug shortly... ;)
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Oh, forgot the link:

https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary

Yes, it's explicitly "in progress", but still I think we've accomplished the goal of "making the data available in re:dash" and should track other issues with new bugs.