Closed Bug 1426170 Opened 8 years ago Closed 8 years ago

Add engine level search counts to clients_daily

Categories

(Data Platform and Tools :: General, enhancement, P2)

x86_64
Linux
enhancement
Points: 2

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: harter, Assigned: mreid)

We want to be able to analyze search_counts in clients_daily. After discussing with Dave in Austin, I suggest we add columns for total SAP search_counts for each of the major engines (Google, Bing, and Yahoo).

This deviates from our original plan of adding `engine` as a key to clients_daily. Adding `engine` as a key would duplicate data for several fields (e.g. URIs), because each client-day row would be repeated once per engine. Adding per-engine columns instead will unblock BD's current analysis needs. Eventually, we'd like to include a nested search_counts structure that covers all engines.

To be clear, let's add five new columns to clients_daily: search_total, search_google, search_bing, search_yahoo, and search_google_nocodes. Google searches should include all searches with engine in ('google', 'google-2018'). We should only include search_counts with a `source` in the whitelist maintained in SEARCH_SOURCE_WHITELIST:
https://github.com/mozilla/python_mozetl/blob/master/mozetl/constants.py#L5

The goal is to have this implemented by the end of January 2018.
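A minimal sketch of the proposed column logic, in plain Python rather than the actual ETL job. Several things here are assumptions and not taken from this bug: the record shape (`engine`, `source`, `count` per entry), the exact engine strings for Bing and Yahoo, the reading of search_total as "all engines with a whitelisted source", and the whitelist contents, which are a placeholder for the real list in mozetl/constants.py. search_google_nocodes is left out because its matching rule is not spelled out above.

```python
# Placeholder only: the real values live in mozetl/constants.py.
SEARCH_SOURCE_WHITELIST = {"searchbar", "urlbar"}

# Engine string -> output column. The Google pair comes from the bug text;
# the "bing" and "yahoo" strings are assumptions for illustration.
ENGINE_COLUMNS = {
    "google": "search_google",
    "google-2018": "search_google",
    "bing": "search_bing",
    "yahoo": "search_yahoo",
}


def summarize_search_counts(search_counts, whitelist=SEARCH_SOURCE_WHITELIST):
    """Collapse one client-day's search_counts entries into flat columns."""
    totals = {
        "search_total": 0,
        "search_google": 0,
        "search_bing": 0,
        "search_yahoo": 0,
    }
    for entry in search_counts or []:
        # Skip entries whose source is not whitelisted.
        if entry.get("source") not in whitelist:
            continue
        count = entry.get("count") or 0
        # Assumption: search_total covers every engine, not just the big three.
        totals["search_total"] += count
        column = ENGINE_COLUMNS.get(entry.get("engine"))
        if column is not None:
            totals[column] += count
    return totals
```

Because every entry for a client-day folds into a single row, fields such as URIs stay unduplicated, which is the point of preferring columns over an `engine` key.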
This is no longer necessary. I talked with arana today, and it sounds like BD does not need the additional metrics in clients_daily. Instead, it seems it would be more useful to add client_id as a column to search_aggregates. This has the advantage of including all search providers without duplicating data (as noted above).
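To illustrate the alternative, here is a rough sketch of what keying the aggregates by client could look like. The row shape and the grouping key (`client_id`, `engine`, `source`) are hypothetical; the real search_aggregates schema is not described in this bug.

```python
from collections import defaultdict


def aggregate_by_client(rows):
    """Sum search counts per (client_id, engine, source).

    Each input row is assumed to be a dict with the hypothetical keys
    client_id, engine, source, and count.
    """
    totals = defaultdict(int)
    for row in rows:
        key = (row["client_id"], row["engine"], row["source"])
        totals[key] += row["count"]
    return dict(totals)
```

Since engine is part of the key, every search provider is kept, and no unrelated per-client fields are carried along to be repeated.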
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
See Also: → 1426437
Component: Datasets: General → General