1270586 - Make Universal Search Telemetry data available in re:dash

Reporter

Description

8 years ago

This bug is bug 1264049's cousin, but for an experiment instead of Test Pilot itself.

Universal search is using Telemetry through Test Pilot (using the testpilottest type).  
The testpilottest type takes an undefined payload, which needs to be defined by each test.  For Universal Search, that payload is:

{
    "test": "universal-search@mozilla.com",  // The em:id field from the add-on
    "agent": "User Agent String",
    "payload": {
        "didNavigate": true,
        "interactionType": "click",
        "recommendationShown": true,
        "recommendationType": "tld",
        "recommendationSelected": true,
        "selectedIndex": -1
    }
}

And a schema:

local schema = {
--   column name                   field type   length  attributes   field name
    {"timestamp",                  "TIMESTAMP", nil,    "SORTKEY",   "Timestamp"},
    {"uuid",                       "VARCHAR",   36,      nil,         get_uuid},

    {"test",                       "VARCHAR",   255,     nil,         "test"},
    {"agent",                      "VARCHAR",   45,      nil,         "agent"},
    {"didNavigate",                "BOOLEAN",   nil,     nil,         "payload[didNavigate]"},
    {"interactionType",            "VARCHAR",   255,     nil,         "payload[interactionType]"},
    {"recommendationShown",        "BOOLEAN",   nil,     nil,         "payload[recommendationShown]"},
    {"recommendationType",         "VARCHAR",   255,     nil,         "payload[recommendationType]"},
    {"recommendationSelected",     "BOOLEAN",   nil,     nil,         "payload[recommendationSelected]"},
    {"selectedIndex",              "INTEGER",   nil,     nil,         "payload[selectedIndex]"}
}
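To make the mapping between the payload and the schema concrete, here is a minimal Python sketch that flattens one ping into one row keyed by the schema's column names. It is an illustration only, not the pipeline's actual code: `ping_to_row` and `COLUMNS` are hypothetical names, the timestamp is passed in by the caller, and a random UUID stands in for the schema's `get_uuid` helper, whose behavior is not shown in this bug.

```python
import uuid
from datetime import datetime, timezone

# (column name, key inside "payload"), or None when the value is a
# top-level field of the ping. Mirrors the schema above.
COLUMNS = [
    ("test", None),
    ("agent", None),
    ("didNavigate", "didNavigate"),
    ("interactionType", "interactionType"),
    ("recommendationShown", "recommendationShown"),
    ("recommendationType", "recommendationType"),
    ("recommendationSelected", "recommendationSelected"),
    ("selectedIndex", "selectedIndex"),
]


def ping_to_row(ping, received_at):
    """Flatten one testpilottest ping into a dict with one key per column."""
    payload = ping.get("payload", {})
    row = {
        "timestamp": received_at,
        # Assumption: a random UUID stands in for get_uuid.
        "uuid": str(uuid.uuid4()),
    }
    for column, payload_key in COLUMNS:
        source = ping if payload_key is None else payload
        # Missing fields become None rather than raising.
        row[column] = source.get(payload_key or column)
    return row


example_ping = {
    "test": "universal-search@mozilla.com",
    "agent": "User Agent String",
    "payload": {
        "didNavigate": True,
        "interactionType": "click",
        "recommendationShown": True,
        "recommendationType": "tld",
        "recommendationSelected": True,
        "selectedIndex": -1,
    },
}

example_row = ping_to_row(example_ping, datetime(2016, 5, 6, tzinfo=timezone.utc))
```

Each event then becomes exactly one row, with the nested payload fields promoted to top-level columns.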

Let us know if there is anything else needed.  Thanks!


Test Pilot metrics docs: https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md

Universal Search metrics docs:  https://github.com/mozilla/universal-search/blob/master/docs/metrics.md

Wil Clouser [:clouserw]

Reporter

Updated

8 years ago

Blocks: 1257690

Katie Parlante

Updated

8 years ago

Blocks: 1270961

Rob Miller [:rmiller]

Updated

8 years ago

Priority: -- → P2

Katie Parlante

Updated

8 years ago

Component: Metrics: Pipeline → Metrics: Product Metrics

Priority: P2 → P1

Chuck Harmston [:chuck]

Comment 1

8 years ago

Hey Rebecca,

Anything I can do to move this along? I'm happy to do the legwork if you point me in the right direction.

Thanks!

Flags: needinfo?(rweiss)

rweiss@mozilla.com

Assignee

Comment 3

8 years ago

Talked offline about this with kparlante. Here's the state of this request:

Assertions:
1) There is a testpilot data source available in re:dash already. I believe this is a redshift instance that contains tables consisting of daily server logs.
2) Universal Search test pilot test pings are event-based, meaning that client actions emit a ping whenever specific events of interest occur.
3) We need to decide on an ETL approach for these pings such that ultimately each of these events becomes a single row in a tabular data source that is available within re:dash.

For the sake of decision-making, here are my naively suggested proposals for handling ETL of these pings:
A) We could batch process these pings on a schedule as a Spark job, which could look roughly like the following:
1. On a defined interval (e.g. hourly), collect all pings with doctype testpilottest and test label universal search
2. Create a DataFrame from these pings according to the schema described in the Universal Search metrics plan
3. Update some data source available in re:dash with this DataFrame object
B) We could stream process these pings as they arrive using a Heka filter, which could look roughly like the following:
1. As test pilot test pings arrive, filter into separate test pilot test types.
2. If the ping is universal search, transform into the appropriate row structure.
3. Insert the row into some data source as soon as transformation is complete.
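The two proposals share the same filter and transform steps and differ mainly in when rows are produced. The following is a rough plain-Python sketch of that difference, not Spark or Heka code. Everything named here is hypothetical: the `docType` key is an assumed way of carrying the ping's doctype, and `is_universal_search`, `to_row`, `stream_rows`, and `batch_rows` are illustrative helpers. Loading the rows into a data source is left out.

```python
from typing import Iterable, Iterator, List

TEST_ID = "universal-search@mozilla.com"
PAYLOAD_FIELDS = (
    "didNavigate",
    "interactionType",
    "recommendationShown",
    "recommendationType",
    "recommendationSelected",
    "selectedIndex",
)


def is_universal_search(ping: dict) -> bool:
    # Assumption: the doctype is exposed on the ping under "docType".
    return ping.get("docType") == "testpilottest" and ping.get("test") == TEST_ID


def to_row(ping: dict) -> dict:
    """Promote the nested payload fields to top-level columns."""
    payload = ping.get("payload", {})
    row = {"test": ping["test"], "agent": ping.get("agent")}
    row.update({field: payload.get(field) for field in PAYLOAD_FIELDS})
    return row


def stream_rows(pings: Iterable[dict]) -> Iterator[dict]:
    """Option B: lazily yield one row per matching ping as it arrives."""
    for ping in pings:
        if is_universal_search(ping):
            yield to_row(ping)


def batch_rows(pings: Iterable[dict]) -> List[dict]:
    """Option A: take one interval's worth of pings and build every row at once."""
    return list(stream_rows(pings))
```

In this sketch the batch variant is just the streaming variant run to completion over a fixed window, which is why choosing A first does not rule out moving to B later.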

I'm voting for B: since the data itself is event-based, we should process it in as close to real time as possible. And when I refer to "some data source," I believe we should follow from assertion (1) above and use Redshift as the endpoint for the datasets; we already know how to hook those up to re:dash, and we're planning to make Redshift data sources available within a.t.m.o for finer-grained individual-level analysis. We can add more tables to the Test Pilot Redshift, or we can create a Universal Search Redshift; I'm not sure which one is preferable. I suspect that the former is easier for now, but the latter might be superior in the long run.

kparlante suggested rmiller might be able to help tackle the heka filter and ETL process. Adding them both to this bug while we hash it out.

Katie Parlante

Updated

8 years ago

Flags: needinfo?(kparlante)

Chuck Harmston [:chuck]

Comment 4

8 years ago

This plan sounds great, thanks for the effort, Rebecca and Katie! Let me know how I can help.

Ilana

Comment 5

8 years ago

To get this up as quickly as possible, we're taking rweiss's approach A above. We are aiming to get this done by Friday, though a more conservative estimate is Monday.

Wil Clouser [:clouserw]

Reporter

Comment 6

8 years ago

Any updates on this?

Ilana

Comment 7

8 years ago

Sorry, this has been done for a little while. In-progress dash: https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary

Data is available in presto in the table usearch_daily. An update was pushed today to correct a misinstrumented field, and by tomorrow we will have the most up-to-date version.

Jared Hirsch [:jhirsch] (he/him) (Needinfo please)

Comment 8

8 years ago

Hi all,

The most recent dashboard data stops at July 14. This seems like a bug.

I'm also wondering if there's a bug in the data processing code, as the dashboard says that we had 25 users the week of 7/7, and 13 users the week of 7/14. I'm not sure if this means 'new users' or 'total users', but either way, those numbers seem suspect.

Should I file separate bugs for the missing data and the incorrect counts, or keep commenting in this one? This bug is marked 'new', while comment 7 says the dashboard is done, so I'm not sure.

Thanks,

Jared

Rob Miller [:rmiller]

Comment 9

8 years ago

We have a Universal Search: Executive Summary dashboard, and have for a bit now, so I'm considering this issue resolved. We can (and should!) open new bugs for issues that we experience using this data. And I'll be opening one such bug shortly... ;)

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Rob Miller [:rmiller]

Comment 10

8 years ago

Oh, forgot the link:

https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary

Yes, it's explicitly "in progress", but still I think we've accomplished the goal of "making the data available in re:dash" and should track other issues with new bugs.

Categories

(Cloud Services :: Metrics: Product Metrics, defect, P1)

Tracking

(Not tracked)

People

(Reporter: clouserw, Assigned: rweiss)
