Open Bug 1840413 Opened 1 year ago Updated 10 months ago

Unusual telemetry from "welcome back" onboarding screen

Categories

(Firefox :: Messaging System, defect, P2)

defect

Tracking

()

People

(Reporter: aminomancer, Unassigned)

References

(Blocks 2 open bugs)

Details

I made a SQL query to check what's going on with the attributed about:welcome flow, and the data seems strange. The number of primary/secondary clicks is vastly in excess of the impression count.
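The comparison the query makes can be sketched in a few lines of Python. This is a minimal sketch, assuming the pings for this screen have already been exported as a plain list of event names ("IMPRESSION", "CLICK_BUTTON", "FXA_SIGNIN_FLOW"); it is not the actual query. Ratios far above 1 are the anomaly described above.

```python
from collections import Counter

def event_to_impression_ratios(events):
    """events: iterable of event names from this screen's pings
    (assumed already exported from the telemetry tables).
    Returns {event: count / IMPRESSION count} for every non-impression event,
    or {} when there are no impressions to divide by."""
    counts = Counter(events)
    impressions = counts["IMPRESSION"]
    if impressions == 0:
        return {}
    return {ev: n / impressions for ev, n in counts.items() if ev != "IMPRESSION"}
```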

Not sure what to make of it. Nothing in the source code really stands out as a potential cause. Maybe it's somehow an artifact of testing (although the query includes only release channel) or the very low sample size.

I ran another SQL query to figure out whether these pings are coming from users who should actually see the screen. The campaign value is supposed to be "migration" for this screen to appear at all. Yet for CLICK_BUTTON and FXA_SIGNIN_FLOW events, we see all kinds of other campaigns. Both should be impossible, since the screen targets on this attribute.
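The campaign check can be sketched the same way: group pings by event type and tally every campaign value other than "migration". A minimal sketch, assuming each ping is reduced to an (event, campaign) pair with an empty string when no attribution data was sent; "other-campaign" in the usage below is a made-up placeholder value.

```python
from collections import Counter, defaultdict

# Per this bug, the screen should only appear when attribution campaign == "migration".
EXPECTED_CAMPAIGN = "migration"

def unexpected_campaigns(pings):
    """pings: iterable of (event, campaign) pairs; campaign is "" when the ping
    carried no attribution data (assumed representation).
    Returns {event: Counter of campaign values other than "migration"}."""
    result = defaultdict(Counter)
    for event, campaign in pings:
        if campaign != EXPECTED_CAMPAIGN:
            result[event][campaign or "(empty)"] += 1
    return dict(result)
```

For example, `unexpected_campaigns([("IMPRESSION", ""), ("CLICK_BUTTON", "other-campaign")])` reports one empty campaign for IMPRESSION and one "other-campaign" for CLICK_BUTTON.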

Also, for IMPRESSION events, sometimes campaign is empty (presumably no attribution data at all) and the screen shows anyway. This doesn't seem like as big an issue; it could just be due to testing unusual messages, since there are far fewer impression pings (one of the reasons this all seems abnormal).

So it seems like the greatly disproportionate number of click and (to a lesser degree) sign-in events is somehow due to users without migration attribution data firing them. I think the minuscule impression ping count is closer to the true number of users who've seen this screen. But where are all those click and sign-in pings coming from? All the pings in question send message IDs that clearly identify that they came from this screen.

It's a real mystery because it would be one thing if we had an equal number of impression and event pings, and the only issue was that many of those pings showed incorrect attribution data. That would tell us users who aren't supposed to see the screen are erroneously seeing it. And that would imply that something is wrong with the isDeviceMigration targeting evaluation.

But in this case, only the event pings have that huge variety of attribution codes. Having incorrect attribution data seems to prevent users from seeing the screen, since if they saw it, they should send an impression ping. And they clearly don't since the impression ping count is so low. But having incorrect attribution data doesn't seem to prevent a click ping from being sent with this screen ID. So somehow, thousands of users are sending event pings for a screen they haven't seen and can't possibly be on.
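The invariant being violated here is "every client that sends an event ping from this screen has also sent an impression ping for it." A minimal sketch of that per-client check, assuming each ping carries a per-client identifier (not confirmed in this bug) and is reduced to a (client_id, event) pair:

```python
from collections import defaultdict

def clients_with_orphan_events(pings):
    """pings: iterable of (client_id, event) pairs (assumed to have a client ID).
    Returns the sorted client IDs that sent at least one event ping from the
    screen but never an IMPRESSION ping for it."""
    events_by_client = defaultdict(set)
    for client_id, event in pings:
        events_by_client[client_id].add(event)
    # Every client in the map sent at least one ping, so a client without
    # IMPRESSION necessarily sent only click or sign-in events.
    return sorted(
        cid for cid, events in events_by_client.items()
        if "IMPRESSION" not in events
    )
```

A large result relative to the number of clients with impressions would match what the queries show.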

Make it make sense!

Flags: needinfo?(mcoman)
Flags: needinfo?(dmosedale)

Also, one of my hypotheses while investigating was that this was all just an artifact of the testing that happens during development. You test the screen with no targeting so you can actually see it, and it sends a ping, and so on. But that should all be constrained to a very short window 2 months ago. But these SQL queries show a constant high rate of event pings and a constant low rate of impression pings. I just updated the data to see what happened over the weekend, and the pattern I noticed last week has evidently continued, unabated. So for some reason, users are still sending event pings from a screen that they aren't sending impression pings for and that shouldn't even render for them.

Hi Shane, it would be good to update the queries in the description of this bug to include the Fx114 experiment, e.g. Mobile Screen Improvement, which enrolled at 100% for 2 weeks (June 12 to June 26), a longer window than MR_WELCOME_DEFAULT (June 6 to June 12). It's probably simpler to search on just message_id as %AW_WELCOME_BACK%.

https://experimenter.services.mozilla.com/nimbus/mobile-screen-improvements/summary

Also, it would be good to have a query that categorizes by platform (Win 7/8) to see whether the unusual telemetry pattern is specific to low-end machines.
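A platform breakdown can be sketched by mapping the OS name and version to a label and counting pings per (platform, event). This assumes the environment's OS name is reported as "Windows_NT", "Darwin", or "Linux" alongside an NT kernel version string; on Windows, 6.1 is Windows 7, 6.2 is Windows 8, 6.3 is Windows 8.1, and both Windows 10 and 11 report 10.0.

```python
from collections import Counter

# NT kernel versions as reported by Windows (assumed to be the telemetry value).
# Windows 10 and 11 both report 10.0, so they cannot be told apart here.
WINDOWS_VERSIONS = {"6.1": "Win7", "6.2": "Win8", "6.3": "Win8.1", "10.0": "Win10/11"}

def platform_label(os_name, os_version):
    """Map an (os_name, os_version) pair to a readable platform label."""
    if os_name == "Windows_NT":
        return WINDOWS_VERSIONS.get(os_version, f"Windows {os_version}")
    return os_name  # e.g. "Darwin" or "Linux"

def pings_by_platform(pings):
    """pings: iterable of (os_name, os_version, event) triples (assumed shape).
    Returns a Counter keyed by (platform_label, event)."""
    return Counter((platform_label(name, ver), event) for name, ver, event in pings)
```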

Hey Punam - I actually did include the experiment pings in the original query. I figured it might be an experiment with different targeting that was throwing off the numbers, so I narrowed it down to just pings from the default message, but the proportions are actually the same. So I think we can confidently say the experiment message is behaving just like the default message, which makes sense.
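Comparing the default and experiment messages amounts to filtering on the screen ID (the Python equivalent of message_id LIKE '%AW_WELCOME_BACK%') and computing the click-to-impression ratio per message ID. A minimal sketch, assuming pings reduced to (message_id, event) pairs; the message IDs in the usage below are hypothetical placeholders, not the real IDs.

```python
from collections import Counter, defaultdict

def click_ratio_by_message(pings, screen="AW_WELCOME_BACK"):
    """pings: iterable of (message_id, event) pairs (assumed shape).
    Keeps message IDs containing `screen` and returns
    {message_id: CLICK_BUTTON count / IMPRESSION count},
    skipping message IDs with no impressions."""
    counts = defaultdict(Counter)
    for message_id, event in pings:
        if screen in message_id:
            counts[message_id][event] += 1
    return {
        mid: c["CLICK_BUTTON"] / c["IMPRESSION"]
        for mid, c in counts.items()
        if c["IMPRESSION"]
    }
```

Similar ratios for the default and experiment message IDs would be consistent with the experiment message behaving like the default one.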

Good thinking about the platform. I didn't consider what OS the pings might be coming from. Since I'm still learning what's available in the telemetry environment, I probably missed some other things as well. attribution.campaign was just a lucky guess 😂 Anyway, I'll see if I can make a query with more detail.

Per this query, it seems like some Windows 7 and 8 users are sending pings from this screen, and even a few Mac/Linux users (that may be due to testing, but the pings are still coming months after development). I wonder if my query makes sense. Daniel, if you have any free cycles could you take a look at it sometime and correct my errors?


The message is supposed to be shown to users who are setting up a new device, so we don't exactly want Win7/8 users seeing it. But I think there's nothing exactly stopping them. In particular, since Win7/8 users should be automatically moved to ESR when upgrading from Fx114 to Fx115, there might be an issue there. For example, I'm not sure what exactly happens if a user...

  1. gets the download link on Fx114 on win7
  2. 115 gets released
  3. uses the download link while still on win7
  4. so installs 115 with the special download link.

I think on Windows that would result in ESR being installed right off the bat. But then they'd still have an attributed install on first run, so they would still see the special about:welcome flow even though they didn't actually set up a new device.

Marius, how feasible would it be to test something like this? When 115 ships, can we test what the download link installs and whether the special signin screen appears in the about:welcome flow and works correctly?

It might also be worthwhile to test the about:welcome flow on a low-end Windows 7/8 machine, just since I'm not certain we've tested with those conditions yet.

Flags: needinfo?(dberry)

And as part of any QA testing, can you also confirm that telemetry is collected? Everything might look and work fine from the user's perspective, but perhaps there's something going wrong in the telemetry code path under the hood.

  1. There should be an IMPRESSION ping when the screen renders
  2. There should be a CLICK_BUTTON ping when the primary button is clicked
  3. Same when the secondary button is clicked
  4. There should be an FXA_SIGNIN_FLOW ping when the primary button is clicked and the tab is either closed or the user signs in.
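The four expectations above can be written down as a small ordering check over the events captured in one QA session. A minimal sketch, assuming each event is recorded as an (event, detail) pair where detail is a hypothetical "primary" or "secondary" marker for CLICK_BUTTON and is ignored otherwise:

```python
def check_session(events):
    """events: ordered list of (event, detail) pairs for one screen session.
    detail is "primary"/"secondary" for CLICK_BUTTON (assumed labels).
    Returns a list of problems; an empty list means the order looks right."""
    problems = []
    impressed = False
    primary_clicked = False
    for i, (event, detail) in enumerate(events):
        # Rule 1: nothing should be sent before the screen's IMPRESSION.
        if event == "IMPRESSION":
            impressed = True
        elif not impressed:
            problems.append(f"{event} at index {i} before any IMPRESSION")
        # Rule 4: FXA_SIGNIN_FLOW only follows a primary button click.
        if event == "CLICK_BUTTON" and detail == "primary":
            primary_clicked = True
        elif event == "FXA_SIGNIN_FLOW" and not primary_clicked:
            problems.append(f"FXA_SIGNIN_FLOW at index {i} without a primary click")
    return problems
```

For example, a session of IMPRESSION, primary CLICK_BUTTON, then FXA_SIGNIN_FLOW returns an empty list.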

Hi, Shane!

I have investigated this behavior using the latest Firefox Release 114.0.2 (Build ID: 20230619081400) on Windows 10 x64, Windows 11 x64, Windows 8 x64, and Windows 7 x64, using the following steps:

[Prerequisites]:

  • Have any browser except Firefox installed.

[Steps to reproduce]:

  1. Open the browser from prerequisites and navigate to "https://mzl.la/newdevice".
  2. Download and install Firefox.
  3. Observe the Onboarding flow.

After following the steps from above I can confirm the following:

Windows 11 x64:

  • In 3 out of 5 attempts, the "Sign In" screen was not displayed, even though I installed Firefox using the steps above and the "campaign":"migration" attribute was present in the telemetry pings and on the "about:telemetry" page.
  • You can find a screen recording of this behavior here.

Windows 10 x64, Windows 8 x64, and Windows 7 x64:

  • The "Sign In" screen is correctly displayed each time as the first screen of the Onboarding flow.
  • The IMPRESSION, CLICK_BUTTON, CLICK_BUTTON for the secondary button, and FXA_SIGNIN_FLOW telemetry pings are successfully generated in the "Browser Console".

You can find a list of the received Telemetry Pings here.

Also, as soon as Firefox 115 is released I will perform a spotcheck on Windows 7 and 8 and will leave a comment here with the results.

If you need any other information, please don't hesitate to ping me.

Flags: needinfo?(mcoman)


It seems that in 3 of the 5 attempts, the user was enrolled in the embedded-import-wizard experiment (treatment-a or treatment-b branches), which went live yesterday and is enrolling at 100% for a week (enrollment ends July 3, before Fx115 goes live) for latest-version Windows 10+ users (Need Default and Has Pin).

https://experimenter.services.mozilla.com/nimbus/embedded-import-wizard/summary

Ideally, the recipe should have included the AW_WELCOME_BACK screen in the treatment branches. Considering that the experiment enrolls a subset of the population in which users who would see AW_WELCOME_BACK with the migration campaign are a very small number (roughly fewer than 100), we should let the experiment run for its week and account for it in the analysis.

Awesome, thank you Marius for the thorough investigation, and Punam, that makes sense to me.

Shane and I talked over Slack and we agree that there are unexpected events sent by users who don't appear to be in the migration campaign. Unsure of the cause at this time.

Flags: needinfo?(dberry)
Assignee: nobody → dmosedale
Flags: needinfo?(dmosedale)
Iteration: --- → 117.1 - July 3 - July 14
Priority: -- → P1

The severity field is not set for this bug.
:lsmith, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsmith)

I'll be digging into this.

Flags: needinfo?(lsmith) → needinfo?(dmosedale)
Iteration: 117.1 - July 3 - July 14 → 117.2 - July 17 - July 28
Iteration: 117.2 - July 17 - July 28 → 118.1 - July 31 - Aug 11
No longer blocks: fxms-infra
Flags: needinfo?(dmosedale)
Iteration: 118.1 - July 31 - Aug 11 → 118.2 - Aug 14 - Aug 25
Iteration: 118.2 - Aug 14 - Aug 25 → 119.1 - Aug 28 - Sept 8

I don't expect to get to this soon, so I'm removing myself.

Priority: P1 → P2
Iteration: 119.1 - Aug 28 - Sept 8 → 119.2 - Sept 11 - Sept 22
Iteration: 119.2 - Sept 11 - Sept 22 → ---
Severity: -- → S3
Assignee: dmosedale → nobody