Closed Bug 1308705 Opened 8 years ago Closed 8 years ago

Validate the URI loads triggered by search (Nightly Channel)

Categories

(Toolkit :: Telemetry, defect, P1)

Points: 2

Tracking


RESOLVED FIXED
Tracking Status
firefox52 --- affected

People

(Reporter: Dexter, Assigned: Dexter)

References

Details

(Whiteboard: [measurement:client])

In bug 1303333 we landed a new set of scalar probes to count the number of URIs loaded by the user, breaking down the numbers by the action which triggered the loads.

We should make sure that the probes are behaving correctly by performing a server-side analysis on the Nightly channel as soon as the first week of data is available.
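
For context, the probes from bug 1303333 are keyed scalars of the form browser.engagement.navigation.<source>, keyed by the kind of action that triggered the load, while searches are also recorded in the SEARCH_COUNTS keyed histogram. A made-up fragment of a "main" ping payload carrying such data might look like the sketch below (the paths and the specific key names are illustrative assumptions, not a definitive description of the probes):

# Illustrative only: roughly where the new keyed scalars and the
# SEARCH_COUNTS keyed histogram live inside a "main" ping payload.
example_payload = {
    "processes": {
        "parent": {
            "keyedScalars": {
                "browser.engagement.navigation.urlbar": {"search_enter": 2},
                "browser.engagement.navigation.searchbar": {"search_oneoff": 1},
            }
        }
    },
    "keyedHistograms": {
        "SEARCH_COUNTS": {
            # keys are assumed to be "<engine>.<source>"; only the "sum"
            # field of each count histogram is sketched here
            "google.urlbar": {"sum": 2},
            "google.searchbar": {"sum": 1},
        }
    },
}
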
Points: --- → 2
Depends on: 1303333
Whiteboard: [measurement:client]
Priority: -- → P3
Priority: P3 → P1
We should also validate the probe from bug 1304647 here.
Assignee: nobody → alessio.placitelli
I spent some time validating the data coming in from the probes, and I found that about 73% of Nightly "main" pings from the 31st of October to the 6th of November are not reporting the scalar search probes (see cell 55 of [1]), as the "keyedScalars" section of those pings is empty.

I drilled into the issue a bit more, trying to compare the new probe data against the counts stored in the SEARCH_COUNTS histogram. What I found is even more surprising: it looks like about 98% of Nightly pings are not reporting SEARCH_COUNTS (cell 56).

Fearing a regression, I checked the trends on Nightly for the previous weeks, and again most of the pings do not contain SEARCH_COUNTS. I created a query on sql.t.m.o [2] to check this behaviour, and it looks like the number of reported SEARCH_COUNTS on Nightly is consistently low. (A rough sketch of this kind of per-ping check is included after the references below.)

To recap, the findings:

- 73% of pings on Nightly do not report the new search probes;
- 98% of pings on Nightly do not report the SEARCH_COUNTS histogram;
- no regression was introduced by bug 1303333, as the low SEARCH_COUNTS availability is consistent with the other data from previous weeks;
- 25% of the pings contain the new search probes, but no SEARCH_COUNTS histogram (see cell 59); unfortunately, this isn't due to one-off searches not being counted in SEARCH_COUNTS, as we're seeing entries other than 'one-off' in cell 60;
- the data from the new search probe doesn't look bad or surprising (see cell 66 onward).


[1] - https://gist.github.com/Dexterp37/078e254d3e12d57bf45d5417504d9b71
[2] - https://sql.telemetry.mozilla.org/queries/1612
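
As referenced above, the per-ping check behind those percentages boils down to something like the following sketch (not the notebook's actual code; the ping paths and the |pings| RDD of raw Nightly "main" pings for the week are assumptions):

def has_search_scalars(ping):
    # Assumption: parent-process keyed scalars live under
    # payload/processes/parent/keyedScalars in the raw ping.
    scalars = (ping.get("payload", {})
                   .get("processes", {})
                   .get("parent", {})
                   .get("keyedScalars", {}) or {})
    return any(name.startswith("browser.engagement.navigation.")
               for name in scalars)

def has_search_counts(ping):
    # Assumption: SEARCH_COUNTS is a keyed histogram under payload/keyedHistograms.
    keyed = ping.get("payload", {}).get("keyedHistograms", {}) or {}
    return bool(keyed.get("SEARCH_COUNTS"))

total = float(pings.count())
missing_scalars = 1.0 - pings.filter(has_search_scalars).count() / total
missing_search_counts = 1.0 - pings.filter(has_search_counts).count() / total
print("missing search scalars: %.3f, missing SEARCH_COUNTS: %.3f"
      % (missing_scalars, missing_search_counts))
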
Status: NEW → ASSIGNED
Brendan, comment 2 contains the results from the analysis of the data coming from Nightly.
Would you mind taking a look at the notebook to check if the search data that we have for the new probes looks reasonable?

The key findings are highlighted in the comment above.
Flags: needinfo?(bcolloran)
I am once again impressed with this thorough QA, thanks Alessio!

My first question is about the query in [2] -- why would we see the number of pings with search_counts *increasing* on more recent dates? In basically every FHR-based analysis I've ever seen, counts of things fall as you get close to the query or snapshot date (since there is latency in ping submission).

Also wrt the query in [2]-- can we see the % of pings with search_counts over time rather than the actual count of pings with search_counts?

(more soon, just had these quick comments off the top of my head)
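
For what it's worth, that percentage-over-time view could also be derived straight from the pings, along the lines of the sketch below (field names are assumptions; it reuses the hypothetical has_search_counts helper from the sketch in comment 2):

def daily_search_counts_ratio(pings_rdd):
    # Assumption: the submission date is available as meta/submissionDate
    # ("YYYYMMDD") on the raw ping.
    def to_pair(ping):
        date = ping.get("meta", {}).get("submissionDate", "unknown")
        return (date, (1, 1 if has_search_counts(ping) else 0))
    totals = (pings_rdd.map(to_pair)
                       .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                       .collectAsMap())
    return {date: with_sc / float(total)
            for date, (total, with_sc) in totals.items()}

for date, ratio in sorted(daily_search_counts_ratio(pings).items()):
    print("%s %.3f" % (date, ratio))
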
(In reply to Alessio Placitelli [:Dexter] from comment #2)
> - 25% of the pings contain the new search probes, but no SEARCH_COUNTS
> histogram (see cell 59); unfortunately, this isn't due to one-off searches
> not being counted in SEARCH_COUNTS, as we're seeing entries other than
> 'one-off' in cell 60;

I'm concerned about this part. To be 100% clear: are you saying that the new probe is showing searches that are never recorded in the SEARCH_COUNTS histogram?

If this is the case, and the new probe has not introduced a regression that broke SEARCH_COUNTS (which does not seem to have happened, based on what you've looked at so far), then we'd seem to have two options:
(1) for many months (years?) we've been undercounting searches that users have been performing, by failing to record them in SEARCH_COUNTS; or
(2) the new probe is somehow overcounting searches when no search event has happened.

Needinfo-ing Sam and Dave, who have worked a lot more on search than I have.
Flags: needinfo?(spenrose)
Flags: needinfo?(dzeber)
Flags: needinfo?(bcolloran)
Flags: needinfo?(alessio.placitelli)
I believe Alessio's results are an artifact of either the longitudinal data source or his query of it. When I aggregate search_counts across all channels between 1 September and 1 November via main_summary and my rollup code, I get very stable values. I have not disaggregated by channel, but the steep climb this query finds for release is not present in my source.

If you would like me to perform my aggregation for nightly only, I can do that. 1-2 day turnaround depending on how wide a net I cast. Contact behind LDAP for code.
Flags: needinfo?(spenrose)
Flags: needinfo?(dzeber)
Here are my intermediate findings.

Using Dataset.records() in https://gist.github.com/SamPenrose/307920f9004a14efd7923939069cb3dd :

1) There were 925,836 nightly main pings between 31 Oct and 6 Nov inclusive (7 days).
2) Of those, 419,081 did not fall into the correct build range.
3) Of the remainder, 2,531 (~0.5%) were duplicates.
4) Of the 504,224 left, 136,815 had non-empty searches.
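
That counting flow is roughly the following sketch (field names are assumptions, not the notebook's exact code):

def in_build_range(ping, lo="20161031", hi="20161107"):
    # Assumption: "correct build range" means the build date (first 8 digits
    # of application/buildId) falls within the submission week.
    build_id = (ping.get("application", {}) or {}).get("buildId", "")
    return lo <= build_id[:8] < hi

in_range = pings.filter(in_build_range)                    # 2) drop the wrong build range
deduped = (in_range
           .map(lambda p: (p.get("meta", {}).get("documentId"), p))
           .reduceByKey(lambda a, b: a)                    # 3) one ping per documentId
           .values())
with_searches = deduped.filter(
    lambda p: bool((p.get("payload", {}).get("keyedHistograms", {}) or {})
                   .get("SEARCH_COUNTS")))                 # 4) non-empty SEARCH_COUNTS
print("%d -> %d in build range -> %d deduped -> %d with searches"
      % (pings.count(), in_range.count(), deduped.count(), with_searches.count()))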

Using main_summary in a different notebook, I found that the 7 days from the 31st through the 6th (whoops) had 421,148 nightly rows. I did not filter by buildid, which would lead me to expect a value close to (925,836 - 20,000 duplicates) * 0.85 -> 800,000. Of the 420K I did find, 105,791 were not NULL, or very close to the fraction in 4). I did not test that array field for having 1+ search_counts structs in it, but I bet they all or almost all do. So: two forays by me finding plausible ratios, one by Alessio finding an absurd ratio.

In cell [4] of Alessio's notebook, he rebinds the builtin name* filter to his own function. In cell [13], he calls filter(get_pings(...)). If the cell numbering and contents are accurate, that will work as expected. If he manually edited the notebook, it might not.

I believe the next step is to examine the inner workings of these functions:

1) get_pings_properties()
2) rdd.map(lambda p: (p["meta/documentId"], p)).reduceByKey(lambda a, b: a).values()
3) get_pings()

I would also be curious to explore why my main_summary.filter("channel='nightly'") resulted in ~45% as many rows as Dataset.records().

* Tsk, tsk.
These words were a mistake; please ignore.

(In reply to Sam Penrose from comment #7)
> ... (whoops) had 421,148 nightly rows. I did not filter by
> buildid, which would lead me to expect a value close to (925,836 - 20,000
> duplicates) * 0.85 -> 800,000.
(In reply to Sam Penrose from comment #7)
> Here are my intermediate findings.

First of all, thanks Sam! I finally found the problem... yuck!

> Using Dataset.records() in
> https://gist.github.com/SamPenrose/307920f9004a14efd7923939069cb3dd 

Thanks for the notebook. If you slightly change it to filter for the right appName, source and source version, then the record count perfectly matches the ping count from |get_pings|.

> I believe the next step is to examine the inner workings of these functions:
> 
> 1) get_pings_properties()

Bingo, that was the problem. For some reason, |get_pings_properties| is filtering out some keyed histogram data. I've filed bug 1318326 about that.

To work around the issue, I've written my own mapping function for this notebook. The results are in [1] and it seems like there's only one odd thing.
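
The mapping function is roughly of this shape (a sketch with assumed ping paths, not the notebook's exact code): it pulls the keyed histogram and the navigation scalars straight off the raw ping instead of going through |get_pings_properties|.

def extract_search_fields(ping):
    # Assumption: same "main" ping layout as in the earlier sketches.
    payload = ping.get("payload", {}) or {}
    keyed_scalars = (payload.get("processes", {})
                            .get("parent", {})
                            .get("keyedScalars", {}) or {})
    return {
        "client_id": ping.get("clientId"),
        "search_counts": (payload.get("keyedHistograms", {}) or {})
                         .get("SEARCH_COUNTS", {}),
        "nav_scalars": {name: counts
                        for name, counts in keyed_scalars.items()
                        if name.startswith("browser.engagement.navigation.")},
    }

search_data = pings.map(extract_search_fields)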

By looking at cell [40], it seems like we're not reporting one-off engines in SEARCH_COUNTS any more, and I've probably regressed it (it was lacking test coverage...).

However, this has a super-low impact, as it appears to affect just 0.3% of the total "main" pings in one week's worth of Nightly data.
I've filed bug 1318333 for that.

@Brendan, does the rest look sane enough? Can we close off this bug and move on?

> I would also be curious to explore why my
> main_summary.filter("channel='nightly'") resulted in ~45% as many rows as
> Dataset.records().

I'm not sure about that, but be sure to check that you're filtering for the correct appName, source and source version.


[1] - https://gist.github.com/Dexterp37/078e254d3e12d57bf45d5417504d9b71
Flags: needinfo?(alessio.placitelli) → needinfo?(bcolloran)
Sam and Alessio, thanks for digging in to this.

Regarding closing the bug: I'm going to again defer to Sam and Dave -- they know this specific area a lot better than I do, so they'll have better priors about what we should be expecting to see in this data. When they're happy I'm happy. (I can't recall, is there a new search analyst? If you two think he should take a look, feel free to ping him too)
Flags: needinfo?(spenrose)
Flags: needinfo?(dzeber)
Flags: needinfo?(bcolloran)
(In reply to brendan c from comment #10)
> Sam and Alessio, thanks for digging in to this.
> 
> Regarding closing the bug: I'm going to again defer to Sam and Dave -- they
> know this specific area a lot better than I do, so they'll have better
> priors about what we should be expecting to see in this data. When they're
> happy I'm happy. (I can't recall, is there a new search analyst? If you two
> think he should take a look, feel free to ping him too)

I believe Alessio's last notebook has demonstrated that we do not have a problem with SEARCH_COUNTS upstream of get_pings_properties(), and therefore this bug should be closed. I defer to Alessio on the scalars. Alessio, if you want me to tackle debugging get_pings_properties() vs payload/keyedHistograms/SEARCH_COUNTS, I could work on that early next week; just assign bug 1318326 to me.
Flags: needinfo?(spenrose)
Flags: needinfo?(dzeber)
One more thing I'd like to see, Alessio -- wrt cells [40] and [41], it looks like we're just counting the presence/absence of the engagement and SEARCH_COUNTS fields, but we should double-check that the same totals are being reported; e.g. for each client, the total of SEARCH_COUNTS["*.searchbar"] should be the same as the total of browser.engagement.navigation.searchbar["*"], right? Let's double-check that, and then we're probably good. Thanks!
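
For reference, that per-ping totals check could look something like the sketch below (hedged: it reuses the hypothetical extract_search_fields records sketched in comment 9 and assumes the recorded count of a SEARCH_COUNTS entry is its "sum" field):

def searchbar_totals_match(record):
    # SEARCH_COUNTS keys are assumed to look like "<engine>.searchbar".
    hist_total = sum(entry.get("sum", 0)
                     for key, entry in record["search_counts"].items()
                     if key.endswith(".searchbar"))
    scalar_total = sum(record["nav_scalars"]
                       .get("browser.engagement.navigation.searchbar", {})
                       .values())
    return hist_total == scalar_total

with_any_search = search_data.filter(
    lambda r: r["search_counts"] or r["nav_scalars"])
print(with_any_search.filter(searchbar_totals_match).count()
      / float(with_any_search.count()))
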
(In reply to Sam Penrose from comment #11)
> [...] Alessio, if you want me to tackle debugging get_pings_properties() vs
> payload/keyedHistograms/SEARCH_COUNTS, I could work on that early next week;
> just assign bug 1318326 to me.

Mh, I *personally* don't know who should fix that; I guess that will be addressed in the Pipeline's triage meeting next week. However, if you think you can devote some of your cycles to it, feel free to -- I doubt anybody will object :-D

(In reply to brendan c from comment #12)
> One more thing I'd like to see, Alessio -- wrt cells [40] and [41], it looks
> like we're just counting the presence/absence of the engagement and
> SEARCH_COUNTS fields, but we should double-check that the same totals are
> being reported; e.g. for each client, the total of
> SEARCH_COUNTS["*.searchbar"] should be the same as the total of
> browser.engagement.navigation.searchbar["*"], right? Let's double-check
> that, and then we're probably good. Thanks!

I've updated the notebook [1] to check that; the results are in cell 15 of the updated notebook.
I'm reporting them here for reference:

> ok - Ratio 0.983
> searchbar wrong due to One-off Searches (bug 1318333) - Ratio 0.012
> urlbar wrong due to One-off Searches (bug 1318333) - Ratio 0.002
> Sum wrong due to One-off Searches (bug 1318333) - Ratio 0.002
> system is not being record by scalars. - Ratio 0.000
> urlbar doesn't match. - Ratio 0.000
> urlbar is not being record by scalars. - Ratio 0.000

This is saying that 98.3% of the pings with search data show a perfect match between the search scalars and SEARCH_COUNTS. If we account for the problem of one-off searches not being counted any more (bug 1318333), the perfect-match percentage grows to 99.9% of the pings.

The remaining pings could just be outliers -- people with weird developer builds or corrupted memory. I guess we could see how that holds up on Beta.

Let me know if we're good to close this and move on!


[1] - https://gist.github.com/Dexterp37/078e254d3e12d57bf45d5417504d9b71
Flags: needinfo?(bcolloran)
Sounds good Alessio -- if we're at a 99.9% match (once accounting for the known bug), then I'm happy. I think we're good to close this and move forward! Thank you for all of your great work on this!
Flags: needinfo?(bcolloran)
(In reply to brendan c from comment #14)
> Sounds good Alessio -- if we're at a 99.9% match (once accounting for the
> known bug), then I'm happy. I think we're good to close this and move
> forward! Thank you for all of your great work on this!

Thanks Brendan! :-)
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED