Unusual telemetry from "welcome back" onboarding screen
Categories
(Firefox :: Messaging System, defect, P2)
Tracking
()
People
(Reporter: aminomancer, Unassigned)
References
(Blocks 2 open bugs)
Details
I made a SQL query to check what's going on with the attributed about:welcome flow, and the data seems strange. The number of primary/secondary clicks is vastly in excess of the impression count.
Not sure what to make of it. Nothing in the source code really stands out as a potential cause. Maybe it's somehow an artifact of testing (although the query includes only release channel) or the very low sample size.
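For illustration, here is a minimal sketch of the comparison described above: counting pings per event type and computing clicks per impression. The rows and the field names ("event", "message_id") are made up for the example and are not taken from the actual query or telemetry schema.

```python
from collections import Counter

# Made-up sample rows; field names and values are assumptions for illustration.
pings = [
    {"event": "IMPRESSION", "message_id": "AW_WELCOME_BACK"},
    {"event": "CLICK_BUTTON", "message_id": "AW_WELCOME_BACK"},
    {"event": "CLICK_BUTTON", "message_id": "AW_WELCOME_BACK"},
    {"event": "CLICK_BUTTON", "message_id": "AW_WELCOME_BACK"},
    {"event": "FXA_SIGNIN_FLOW", "message_id": "AW_WELCOME_BACK"},
]


def event_counts(rows):
    """Count pings per event type."""
    return Counter(row["event"] for row in rows)


def clicks_per_impression(rows):
    """Return clicks divided by impressions, or None if there are no impressions."""
    counts = event_counts(rows)
    if counts["IMPRESSION"] == 0:
        return None
    return counts["CLICK_BUTTON"] / counts["IMPRESSION"]


counts = event_counts(pings)
ratio = clicks_per_impression(pings)
print(dict(counts), ratio)
```

A ratio well above 1 is the pattern that looks suspicious here, since a user would normally have to see the screen before clicking anything on it.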
I did another SQL query to figure out whether these pings are coming from users who should actually see the screen, or not. The campaign value is supposed to be migration for this screen to even appear. Yet, for CLICK_BUTTON and FXA_SIGNIN_FLOW events, we see all kinds of other campaigns. Both should be impossible since the screen uses this targeting attribute.
Also, for IMPRESSION events, sometimes campaign is empty (presumably no attribution data at all) and the screen shows anyway. This doesn't seem like as big of an issue; it could just be due to testing unusual messages, as there are way fewer impression pings (one of the reasons this all seems abnormal).
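The campaign check can be sketched roughly as follows: for each event type, count the pings whose campaign is anything other than migration, including empty ones. The field names and the non-migration campaign values are placeholders invented for the example.

```python
from collections import Counter

EXPECTED_CAMPAIGN = "migration"

# Made-up sample rows; "campaign" stands in for the attribution campaign value.
pings = [
    {"event": "IMPRESSION", "campaign": "migration"},
    {"event": "IMPRESSION", "campaign": None},
    {"event": "CLICK_BUTTON", "campaign": "migration"},
    {"event": "CLICK_BUTTON", "campaign": "some_other_campaign"},
    {"event": "CLICK_BUTTON", "campaign": "another_campaign"},
    {"event": "FXA_SIGNIN_FLOW", "campaign": "some_other_campaign"},
]


def unexpected_by_event(rows):
    """Count, per event type, the pings whose campaign is not the expected one."""
    counts = Counter()
    for row in rows:
        if row["campaign"] != EXPECTED_CAMPAIGN:
            counts[row["event"]] += 1
    return counts


unexpected = unexpected_by_event(pings)
print(dict(unexpected))
```

If targeting worked as intended and every ping came from a user who saw the screen, all of these counts would be zero.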
So it seems like the greatly disproportionate number of click and (to a lesser degree) signin events is somehow due to users without migration attribution data somehow firing them. I think the minuscule impression ping count is closer to the true number of users who've seen this screen. But where are all those click and signin pings coming from? All the pings in question send message IDs that clearly identify that they came from this screen.
It's a real mystery because it would be one thing if we had an equal number of impression and event pings, and the only issue was that many of those pings showed incorrect attribution data. That would tell us users who aren't supposed to see the screen are erroneously seeing it. And that would imply that something is wrong with the isDeviceMigration targeting evaluation.
But in this case, only the event pings have that huge variety of attribution codes. Having incorrect attribution data seems to prevent users from seeing the screen, since if they saw it, they should send an impression ping. And they clearly don't since the impression ping count is so low. But having incorrect attribution data doesn't seem to prevent a click ping from being sent with this screen ID. So somehow, thousands of users are sending event pings for a screen they haven't seen and can't possibly be on.
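One way to make "event pings for a screen they haven't seen" concrete is to look for clients that sent a click or signin ping but never an impression ping. The sketch below assumes each row carries a client identifier; the "client_id" field name and the sample values are hypothetical.

```python
# Made-up sample rows; "client_id" is an assumed per-client identifier.
pings = [
    {"client_id": "a", "event": "IMPRESSION"},
    {"client_id": "a", "event": "CLICK_BUTTON"},
    {"client_id": "b", "event": "CLICK_BUTTON"},
    {"client_id": "c", "event": "FXA_SIGNIN_FLOW"},
    {"client_id": "c", "event": "CLICK_BUTTON"},
]


def clients_without_impression(rows):
    """Return clients that sent non-impression events but no IMPRESSION ping."""
    saw_screen = set()
    sent_event = set()
    for row in rows:
        if row["event"] == "IMPRESSION":
            saw_screen.add(row["client_id"])
        else:
            sent_event.add(row["client_id"])
    return sent_event - saw_screen


orphans = clients_without_impression(pings)
print(sorted(orphans))
```

A large set here would point at the click and signin code paths firing independently of the screen rendering, rather than at the targeting evaluation.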
Make it make sense!
Comment 1 (Reporter)•1 year ago
Also, one of my hypotheses while investigating was that this was all just an artifact of the testing that happens during development. You test the screen with no targeting so you can actually see it, and it sends a ping, and so on. But that should all be constrained to a very short window 2 months ago. But these SQL queries show a constant high rate of event pings and a constant low rate of impression pings. I just updated the data to see what happened over the weekend, and the pattern I noticed last week has evidently continued, unabated. So for some reason, users are still sending event pings from a screen that they aren't sending impression pings for and that shouldn't even render for them.
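To tell a short burst of development testing apart from a steady stream, the pings can be bucketed by day. This is a minimal sketch with invented dates and an assumed "day" field, not the query used above.

```python
from collections import defaultdict

# Made-up sample rows; the dates and the "day" field are illustrative only.
pings = [
    {"day": "2023-06-23", "event": "IMPRESSION"},
    {"day": "2023-06-23", "event": "CLICK_BUTTON"},
    {"day": "2023-06-23", "event": "CLICK_BUTTON"},
    {"day": "2023-06-24", "event": "CLICK_BUTTON"},
    {"day": "2023-06-24", "event": "FXA_SIGNIN_FLOW"},
    {"day": "2023-06-25", "event": "CLICK_BUTTON"},
    {"day": "2023-06-25", "event": "CLICK_BUTTON"},
]


def daily_counts(rows):
    """Return {day: {event: count}} with days in ascending order."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row["day"]][row["event"]] += 1
    return {day: dict(table[day]) for day in sorted(table)}


def days_with_events_but_no_impressions(rows):
    """List days on which event pings arrived without any IMPRESSION ping."""
    return [
        day
        for day, counts in daily_counts(rows).items()
        if counts.get("IMPRESSION", 0) == 0
    ]


by_day = daily_counts(pings)
quiet_days = days_with_events_but_no_impressions(pings)
print(by_day, quiet_days)
```

If the anomaly were a testing artifact, the event counts should cluster in the development window and then drop off instead of continuing day after day.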
Comment 2•1 year ago
Hi Shane, it would be good to update the queries in the description of this bug to include the Fx114 experiment, e.g. Mobile Screen Improvement, which enrolled for 2 weeks at 100% (June 12 - June 26), which is more than MR_WELCOME_DEFAULT (June 6 - June 12). It is probably simpler to search on just message_id as %AW_WELCOME_BACK%
https://experimenter.services.mozilla.com/nimbus/mobile-screen-improvements/summary
Also, it would be good to have a query that categorizes by platform (Win 7/8) to see whether the unusual telemetry pattern is specific to low-end machines.
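A rough sketch of such a breakdown: keep only rows whose message_id contains AW_WELCOME_BACK, then count by platform and event. The "os" and "os_version" fields and the sample message IDs are assumptions; the version mapping uses the Windows NT version numbers (6.1 for Windows 7, 6.2 for Windows 8, 6.3 for Windows 8.1, 10.0 for Windows 10 and later).

```python
from collections import Counter

# Windows NT version numbers mapped to marketing names.
WINDOWS_NAMES = {
    "6.1": "Windows 7",
    "6.2": "Windows 8",
    "6.3": "Windows 8.1",
    "10.0": "Windows 10+",
}

# Made-up sample rows; field names and message IDs are illustrative only.
pings = [
    {"message_id": "EXAMPLE_AW_WELCOME_BACK", "event": "CLICK_BUTTON", "os": "Windows", "os_version": "6.1"},
    {"message_id": "EXAMPLE_AW_WELCOME_BACK", "event": "CLICK_BUTTON", "os": "Windows", "os_version": "10.0"},
    {"message_id": "EXAMPLE_AW_WELCOME_BACK", "event": "IMPRESSION", "os": "Windows", "os_version": "10.0"},
    {"message_id": "EXAMPLE_AW_WELCOME_BACK", "event": "CLICK_BUTTON", "os": "Linux", "os_version": "6.2"},
    {"message_id": "EXAMPLE_OTHER_SCREEN", "event": "CLICK_BUTTON", "os": "Windows", "os_version": "6.1"},
]


def platform_label(row):
    """Map a row to a readable platform name; only Windows versions are decoded."""
    if row["os"] == "Windows":
        return WINDOWS_NAMES.get(row["os_version"], "Windows (other)")
    return row["os"]


def counts_by_platform(rows, needle="AW_WELCOME_BACK"):
    """Count (platform, event) pairs for rows whose message_id contains needle."""
    counts = Counter()
    for row in rows:
        if needle in row["message_id"]:
            counts[(platform_label(row), row["event"])] += 1
    return counts


by_platform = counts_by_platform(pings)
print(dict(by_platform))
```

The substring match mirrors the %AW_WELCOME_BACK% pattern suggested above, so it picks up both the default message and experiment variants that share that screen ID.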
Comment 3 (Reporter)•1 year ago
Hey Punam - I actually did include the experiment pings in the original query. I figured it might be an experiment with different targeting that was throwing off the numbers, so I narrowed it down to just pings from the default message, but the proportions are actually the same. So I think we can confidently say the experiment message is behaving just like the default message, which makes sense.
Good thinking about the platform. I didn't consider what OS the pings might be coming from. Since I'm still learning what's available in the telemetry environment, I probably missed some other things as well. attribution.campaign was just a lucky guess 😂 Anyway, I'll see if I can make a query with more detail.
Comment 4 (Reporter)•1 year ago
Per this query, it seems like some Windows 7 and 8 users are sending pings from this screen, and even a few Mac/Linux users (that may be due to testing, but the pings are still coming months after development). I wonder if my query makes sense. Daniel, if you have any free cycles could you take a look at it sometime and correct my errors?
The message is supposed to be shown to users who are setting up a new device, so we don't exactly want Win7/8 users seeing it. But I think there's nothing exactly stopping them. In particular, since Win7/8 users should be automatically moved to ESR when upgrading from Fx114 to Fx115, there might be an issue there. For example, I'm not sure what exactly happens if a user...
- gets the download link on Fx114 on win7
- 115 gets released
- uses the download link while still on win7
- so installs 115 with the special download link.
I think on Windows that would result in ESR being installed right off the bat. But then they'd still have an attributed install on first run, so they'd still see the special about:welcome flow even though they didn't actually set up a new device.
Marius, how feasible would it be to test something like this? When 115 ships, can we test what the download link installs and whether the special signin screen appears in the about:welcome flow and works correctly?
It might also be worthwhile to test the about:welcome flow on a low-end Windows 7/8 machine, just since I'm not certain we've tested with those conditions yet.
Comment 5 (Reporter)•1 year ago
And as part of any QA testing, can you also confirm that telemetry is collected? Everything might look and work fine from the user's perspective, but perhaps there's something going wrong in the telemetry code path under the hood.
- There should be an IMPRESSION ping when the screen renders
- There should be a CLICK_BUTTON ping when the primary button is clicked
- Same when the secondary button is clicked
- There should be an FXA_SIGNIN_FLOW ping when the primary button is clicked and the tab is either closed or the user signs in.
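The expectations in this list can be turned into a small check for a QA session. This is only a sketch: it assumes the tester has a list of event names observed during one run (for example, copied from the Browser Console), and the helper names are hypothetical.

```python
from collections import Counter


def expected_events(primary_clicks, secondary_clicks, signin_flows_finished):
    """Expected ping counts for one render of the screen.

    signin_flows_finished counts primary clicks after which the tab was
    closed or the user signed in.
    """
    return Counter(
        {
            "IMPRESSION": 1,
            "CLICK_BUTTON": primary_clicks + secondary_clicks,
            "FXA_SIGNIN_FLOW": signin_flows_finished,
        }
    )


def missing_events(observed, expected):
    """Return {event: shortfall} for events seen fewer times than expected."""
    seen = Counter(observed)
    return {
        event: count - seen[event]
        for event, count in expected.items()
        if seen[event] < count
    }


# Hypothetical run: primary button clicked once, then the tab was closed,
# but no FXA_SIGNIN_FLOW ping was observed.
observed = ["IMPRESSION", "CLICK_BUTTON"]
expected = expected_events(primary_clicks=1, secondary_clicks=0, signin_flows_finished=1)
missing = missing_events(observed, expected)
print(missing)
```

An empty result means every expected ping showed up at least the expected number of times; extra pings are not flagged by this check.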
Comment 6•1 year ago
Hi, Shane!
I have investigated this behavior using the latest Firefox Release 114.0.2 (Build ID: 20230619081400) on Windows 10 x64, Windows 11 x64, Windows 8 x64, and Windows 7 x64, using the following steps:
[Prerequisites]:
- Have any browser except Firefox installed.
[Steps to reproduce]:
- Open the browser from prerequisites and navigate to "https://mzl.la/newdevice".
- Download and install Firefox.
- Observe the Onboarding flow.
After following the steps from above I can confirm the following:
Windows 11 x64:
- In 3 out of 5 attempts the "Sign In" screen was not displayed, even though I installed Firefox using the steps from above and the "campaign":"migration" attribute was present in the telemetry pings and on the "about:telemetry" page.
- You can find a screen recording of this behavior here.
Windows 10 x64, Windows 8 x64, and Windows 7 x64:
- The "Sign In" screen is correctly displayed each time as the first screen of the Onboarding flow.
- The IMPRESSION, CLICK_BUTTON, CLICK_BUTTON for the secondary button, and FXA_SIGNIN_FLOW telemetry pings are successfully generated in the "Browser Console".
You can find a list of the received Telemetry Pings here.
Also, as soon as Firefox 115 is released I will perform a spotcheck on Windows 7 and 8 and will leave a comment here with the results.
If you need any other information please don't hesitate to ping me.
Comment 7•1 year ago
Windows 11 x64:
- In 3 out of 5 attempts the "Sign In" screen was not displayed, even though I installed Firefox using the steps from above and the "campaign":"migration" attribute was present in the telemetry pings and on the "about:telemetry" page.
- You can find a screen recording of this behavior here.
It seems that in 3 of the 5 attempts the user got enrolled in the embedded-import-wizard experiment (treatment-a or treatment-b branches), which went live yesterday and is enrolling at 100% for a week (enrollment ends July 3rd, before Fx115 goes live) for the latest Windows 10+ users (Need Default and Has Pin)
https://experimenter.services.mozilla.com/nimbus/embedded-import-wizard/summary
Ideally the recipe should have included the AW_WELCOME_BACK screen in the treatment branches. Considering the experiment is enrolling a subset of the population, out of which AW_WELCOME_BACK with campaign migration is a very small number (~ < 100), we should let the experiment run for a week and account for it in the analysis.
Comment 8 (Reporter)•1 year ago
Awesome, thank you Marius for the thorough investigation, and Punam, that makes sense to me.
Shane and I talked over Slack and we agree that there are unexpected events sent by users who don't appear to be in the migration campaign. Unsure of the cause at this time.
Comment 10•1 year ago
The severity field is not set for this bug.
:lsmith, could you have a look please?
For more information, please visit BugBot documentation.
Comment 12•1 year ago
I don't expect to get to this soon, so I'm removing myself.