Bug 1270586: Make Universal Search Telemetry data available in re:dash
Status: RESOLVED FIXED (opened 9 years ago, closed 9 years ago)
Categories: Cloud Services :: Metrics: Product Metrics (defect, P1)
Tracking: Not tracked
People: Reporter: clouserw; Assignee: rweiss
This bug is bug 1264049's cousin, but for an experiment instead of Test Pilot itself.
Universal Search sends Telemetry through Test Pilot, using the testpilottest ping type.
The testpilottest type carries a payload whose structure is left undefined, so each test has to define its own. For Universal Search, the payload is shown below; the "test" field holds the em:id field from the add-on:
{
  "test": "universal-search@mozilla.com",
  "agent": "User Agent String",
  "payload": {
    "didNavigate": true,
    "interactionType": "click",
    "recommendationShown": true,
    "recommendationType": "tld",
    "recommendationSelected": true,
    "selectedIndex": -1
  }
}
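Because the testpilottest payload is defined per test, a consumer may want to check incoming pings against the Universal Search shape before loading them. A minimal Python sketch, assuming every field in the example is required and has the type shown there (neither assumption is stated in this bug); `EXPECTED_PAYLOAD_TYPES` and `payload_errors` are hypothetical names:

```python
from typing import Any, Dict, List

# Hypothetical type map for the Universal Search payload. Field names come
# from the example ping; treating every field as required is an assumption.
EXPECTED_PAYLOAD_TYPES: Dict[str, type] = {
    "didNavigate": bool,
    "interactionType": str,
    "recommendationShown": bool,
    "recommendationType": str,
    "recommendationSelected": bool,
    "selectedIndex": int,
}


def payload_errors(ping: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a ping's inner payload (empty if valid)."""
    payload = ping.get("payload")
    if not isinstance(payload, dict):
        return ["payload is missing or not an object"]
    errors = []
    for field, expected in EXPECTED_PAYLOAD_TYPES.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject booleans for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


example_ping = {
    "test": "universal-search@mozilla.com",
    "agent": "User Agent String",
    "payload": {
        "didNavigate": True,
        "interactionType": "click",
        "recommendationShown": True,
        "recommendationType": "tld",
        "recommendationSelected": True,
        "selectedIndex": -1,
    },
}

print(payload_errors(example_ping))  # []
```

Rejecting malformed pings up front keeps a single bad client from producing rows with missing or mistyped columns downstream.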
And the proposed table schema:
local schema = {
  -- column name, field type, length, attributes, field name
  {"timestamp", "TIMESTAMP", nil, "SORTKEY", "Timestamp"},
  {"uuid", "VARCHAR", 36, nil, get_uuid},
  {"test", "VARCHAR", 255, nil, "test"},
  {"agent", "VARCHAR", 45, nil, "agent"},
  {"didNavigate", "BOOLEAN", nil, nil, "payload[didNavigate]"},
  {"interactionType", "VARCHAR", 255, nil, "payload[interactionType]"},
  {"recommendationShown", "BOOLEAN", nil, nil, "payload[recommendationShown]"},
  {"recommendationType", "VARCHAR", 255, nil, "payload[recommendationType]"},
  {"recommendationSelected", "BOOLEAN", nil, nil, "payload[recommendationSelected]"},
  {"selectedIndex", "INTEGER", nil, nil, "payload[selectedIndex]"}
}
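To make the mapping from ping to table row concrete, here is a Python sketch that flattens one ping into one row following the schema's column list. It assumes "payload[x]" means "field x of the nested payload object" and that values longer than a VARCHAR length are truncated; whether the real loader truncates or rejects such values is not stated here. `COLUMNS`, `lookup` and `ping_to_row` are hypothetical, and the timestamp and uuid are passed in by the caller rather than derived as the Lua schema does:

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

# (column name, source field, VARCHAR length or None), mirroring the Lua schema.
# "timestamp" and "uuid" are omitted: the caller supplies them (the Lua schema
# presumably takes them from the message Timestamp and its get_uuid helper).
COLUMNS: List[Tuple[str, str, Optional[int]]] = [
    ("test", "test", 255),
    ("agent", "agent", 45),
    ("didNavigate", "payload[didNavigate]", None),
    ("interactionType", "payload[interactionType]", 255),
    ("recommendationShown", "payload[recommendationShown]", None),
    ("recommendationType", "payload[recommendationType]", 255),
    ("recommendationSelected", "payload[recommendationSelected]", None),
    ("selectedIndex", "payload[selectedIndex]", None),
]


def lookup(ping: Dict[str, Any], source: str) -> Any:
    """Resolve a schema source: "payload[x]" reads the nested payload, else a top-level key."""
    if source.startswith("payload[") and source.endswith("]"):
        return ping.get("payload", {}).get(source[len("payload["):-1])
    return ping.get(source)


def ping_to_row(ping: Dict[str, Any], timestamp: datetime, row_uuid: str) -> Dict[str, Any]:
    """Flatten one ping into a single row keyed by the schema's column names."""
    row: Dict[str, Any] = {"timestamp": timestamp, "uuid": row_uuid}
    for column, source, max_len in COLUMNS:
        value = lookup(ping, source)
        if max_len is not None and isinstance(value, str):
            value = value[:max_len]  # assumption: oversize VARCHAR values are truncated
        row[column] = value
    return row


ping = {
    "test": "universal-search@mozilla.com",
    "agent": "User Agent String",
    "payload": {"didNavigate": True, "interactionType": "click", "selectedIndex": -1},
}
row = ping_to_row(ping, datetime.now(timezone.utc), str(uuid.uuid4()))
```

Fields missing from a ping come out as None, which would load as NULL in the corresponding column.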
Let us know if there is anything else needed. Thanks!
Test Pilot metrics docs: https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md
Universal Search metrics docs: https://github.com/mozilla/universal-search/blob/master/docs/metrics.md
Updated 9 years ago
Priority: -- → P2
Updated 9 years ago
Component: Metrics: Pipeline → Metrics: Product Metrics
Priority: P2 → P1
Comment 1 (9 years ago)
Hey Rebecca,
Anything I can do to move this along? I'm happy to do the legwork if you point me in the right direction.
Thanks!
Flags: needinfo?(rweiss)
Comment 3 (Assignee, 9 years ago)
Talked offline about this with kparlante. Here's the state of this request:
Assertions:
1) There is already a testpilot data source available in re:dash. I believe it is a Redshift instance containing tables of daily server logs.
2) Universal Search test pilot test pings are event-based, meaning that client actions emit a ping whenever specific events of interest occur.
3) We need to decide on an ETL approach for these pings such that ultimately each of these events becomes a single row in a tabular data source that is available within re:dash.
For the sake of decision-making, here are two tentative proposals for handling the ETL of these pings:
A) We could batch-process these pings on a schedule as a Spark job, roughly as follows:
1. On a defined interval (e.g. hourly), collect all pings with doctype testpilottest and the Universal Search test label.
2. Create a DataFrame from these pings according to the schema described in the Universal Search metrics plan
3. Update some data source available in re:dash with this DataFrame object
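Steps 1 and 2 of proposal A can be sketched in plain Python as a stand-in for the Spark job; in Spark, the same filter and flatten would be expressed as DataFrame operations. This assumes each full Telemetry ping has a top-level "type" and an ISO-8601 "creationDate", with the testpilottest object from the description nested under "payload". `batch_rows` and `UNIVERSAL_SEARCH_ID` are hypothetical names, and step 3 (loading into a data source) is left out:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List

UNIVERSAL_SEARCH_ID = "universal-search@mozilla.com"


def parse_creation_date(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def batch_rows(
    pings: Iterable[Dict[str, Any]],
    window_start: datetime,
    window: timedelta = timedelta(hours=1),
) -> List[Dict[str, Any]]:
    """Select one interval's Universal Search pings and flatten each into a row."""
    window_end = window_start + window
    rows = []
    for ping in pings:
        if ping.get("type") != "testpilottest":
            continue
        body = ping.get("payload", {})  # the testpilottest object (test, agent, payload)
        if body.get("test") != UNIVERSAL_SEARCH_ID:
            continue
        created = parse_creation_date(ping["creationDate"])
        if not window_start <= created < window_end:
            continue
        rows.append({
            "timestamp": created,
            "test": body["test"],
            "agent": body.get("agent"),
            **body.get("payload", {}),  # one column per event field
        })
    return rows
```

The half-open window [start, start + interval) makes consecutive runs partition the pings, so a ping on an interval boundary is counted exactly once.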
B) We could stream-process these pings as they arrive using Heka filters, roughly as follows:
1. As Test Pilot test pings arrive, route them by Test Pilot test.
2. If the ping comes from Universal Search, transform it into the appropriate row structure.
3. Insert the row into some data source as soon as transformation is complete.
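The control flow of proposal B can be illustrated in Python, although real Heka sandbox filters are written in Lua. This sketch assumes each ping has a top-level "type" and carries the testpilottest object under "payload"; `TRANSFORMS`, `process_stream` and the `insert` callback are hypothetical stand-ins for the filter and its output:

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]


def universal_search_row(body: Dict[str, Any]) -> Row:
    """Step 2: flatten a Universal Search testpilottest object into one row."""
    payload = body.get("payload", {})
    row: Row = {"test": body.get("test"), "agent": body.get("agent")}
    for field in ("didNavigate", "interactionType", "recommendationShown",
                  "recommendationType", "recommendationSelected", "selectedIndex"):
        row[field] = payload.get(field)
    return row


# Step 1: per-test transforms keyed by the add-on's em:id; other tests would register here.
TRANSFORMS: Dict[str, Callable[[Dict[str, Any]], Row]] = {
    "universal-search@mozilla.com": universal_search_row,
}


def process_stream(pings: Iterable[Dict[str, Any]], insert: Callable[[str, Row], None]) -> None:
    """Handle pings one at a time; insert() is called as soon as each row is built (step 3)."""
    for ping in pings:
        if ping.get("type") != "testpilottest":
            continue
        body = ping.get("payload", {})
        transform = TRANSFORMS.get(body.get("test"))
        if transform is None:
            continue  # no transform registered for this test yet
        insert(body["test"], transform(body))
```

Keying transforms by test id means adding another Test Pilot experiment only requires registering one more function, which fits the per-test payload design.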
I'm voting for B: since the data is event-based, we should process it as close to real time as possible. When I say "some data source," I think we should follow assertion (1) above and use Redshift as the endpoint for these datasets; we already know how to hook Redshift up to re:dash, and we're planning to make Redshift data sources available within a.t.m.o for finer-grained, individual-level analysis. We can either add more tables to the Test Pilot Redshift instance or create a separate Universal Search Redshift instance; I'm not sure which is preferable. I suspect the former is easier for now, but the latter might be better in the long run.
kparlante suggested that rmiller might be able to help with the Heka filter and ETL process. Adding them both to this bug while we hash it out.
Updated 9 years ago
Flags: needinfo?(kparlante)
Comment 4 (9 years ago)
This plan sounds great; thanks for the effort, Rebecca and Katie! Let me know how I can help.
To get this up as quickly as possible, we're taking rweiss's approach A above. We're aiming to finish by Friday, though Monday is a more conservative estimate.
Comment 6 (Reporter, 9 years ago)
Any updates on this?
Comment 7
Sorry, this has been done for a little while. In-progress dashboard: https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary
The data is available in Presto in the table usearch_daily. An update was pushed today to correct a misinstrumented field; by tomorrow the table should reflect the corrected, up-to-date version.
Comment 8 (9 years ago)
Hi all,
The most recent dashboard data stops at July 14. This seems like a bug.
I'm also wondering whether there's a bug in the data processing code: the dashboard says we had 25 users the week of 7/7 and 13 users the week of 7/14. I'm not sure whether this means 'new users' or 'total users', but either way those numbers seem suspect.
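The two readings ('new users' versus 'total users') can give quite different numbers from the same events. A small Python sketch, assuming events are (user id, date) pairs and Monday-based weeks; the dashboard's actual definitions and week boundaries are not stated in this bug, and `weekly_user_counts` is a hypothetical helper, not the dashboard's query:

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def week_start(day: date) -> date:
    """Monday of the week containing `day` (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def weekly_user_counts(events: Iterable[Tuple[str, date]]) -> Dict[date, Tuple[int, int]]:
    """Map each week to (total users active that week, users seen for the first time)."""
    users_by_week: Dict[date, Set[str]] = defaultdict(set)
    for user_id, day in events:
        users_by_week[week_start(day)].add(user_id)

    seen: Set[str] = set()
    counts: Dict[date, Tuple[int, int]] = {}
    for week in sorted(users_by_week):
        users = users_by_week[week]
        counts[week] = (len(users), len(users - seen))
        seen |= users
    return counts
```

"New" here is relative to the events supplied: if the data starts partway through the experiment, everyone active in the first week counts as new, so a falling "new users" series is expected even when total usage is steady.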
Should I file separate bugs for the missing data and the incorrect counts, or keep commenting in this one? This bug is still marked NEW, while comment 7 says the dashboard is done, so I'm not sure.
Thanks,
Jared
Comment 9 (9 years ago)
We've had a Universal Search: Executive Summary dashboard for a while now, so I'm considering this issue resolved. We can (and should!) open new bugs for issues we run into while using this data, and I'll be opening one such bug shortly... ;)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 10 (9 years ago)
Oh, forgot the link:
https://sql.telemetry.mozilla.org/dashboard/-in-progress-universal-search-executive-summary
Yes, it's explicitly "in progress", but I still think we've accomplished the goal of "making the data available in re:dash" and should track other issues in new bugs.