Closed Bug 1490826 Opened 1 year ago Closed 1 year ago

about:welcome fetch from FxA /metrics-flow failing after browser start-up

Categories

(Firefox :: General, defect, P3)

defect

Tracking

()

VERIFIED FIXED
Firefox 64
Iteration:
64.3 - Oct 12
Tracking Status
firefox62 --- wontfix
firefox63 + verified
firefox64 --- verified

People

(Reporter: loines, Assigned: vladikoff)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

With https://bugzilla.mozilla.org/show_bug.cgi?id=1446023
and https://github.com/mozilla/bedrock/pull/5713 a fetch to the fxa metrics endpoint was implemented so that fxa form views on about:welcome could be measured. If the fetch is successful, the form should contain non-default `flow_id` and `flow_begin_time` values (see https://github.com/mozilla/bedrock/blob/5a947ec09c76ed4541e97a06e7fc37d4e3768c92/media/js/base/mozilla-fxa-form.js#L57-L64)

However `flow_id` and `flow_begin_time` appear to have default (null) values when about:welcome loads on a new profile. 

STR:
1. create a new profile
2. wait for about:welcome to load
3. inspect source on email entry form

Expected result: 
`flow_id` and `flow_begin_time` should have non-default values.

Actual result:
```
<input name="flow_id" value="" type="hidden">
<input name="flow_begin_time" value="0" type="hidden">
```

HOWEVER: if you then open a new tab, navigate directly to about:welcome, and inspect source, the relevant parameters do appear to have actual values, e.g.

```
<input name="flow_id" value="87bdff375a6b80b883f6835b84ffcc613b910d648f72fc63ee22cd1ebab2487b" type="hidden">
<input name="flow_begin_time" value="1536784405695" type="hidden">
```

So it appears that the call to the fxa metrics endpoint is failing after about:welcome loads on startup, but not generally.


From the metrics point of view, the above would explain why we are receiving *some* about:welcome form views, but not at the volume that we would expect. see for example this chart:

https://analytics.amplitude.com/mozilla-corp/chart/s3o1wyz

The `fxa_email_first - view` event (tracks the unique form views of the email input form) should always have higher volume than `fxa_reg - view` and `fxa_login - view` since they are only viewable by users after they submit their email through the first form. So the chart above implies we are missing a majority of email form views on about:welcome.
Erica: can you review the bug above and see if it is something can be resolved and how much effort it would take to do?

Thanks!
Chris
Flags: needinfo?(ewright)
Blocks: 1448918
Sorry about the misleading links in the first post, the github links should actually be pointing here: 
https://github.com/mozilla/activity-stream/blob/8033ab78b7ec55ca9d1d3a4a3e9ec799183ddc22/content-src/components/StartupOverlay/StartupOverlay.jsx#L16-L39
FYI I tried pulling down data on the telemetry event that is associated with this line https://github.com/mozilla/activity-stream/blob/a62ae043853bdcb38e4daa94e6fb4495da0805b2/content-src/components/StartupOverlay/StartupOverlay.jsx#L36 on re:dash (data source = tiles, table = assa_masga_daily) but since the `value` field for that event is the string "FXA_METRICS_ERROR" and the `value` column for the re:dash dataset is integer type, I have a feeling that those events are being dropped on the floor during import to redshift/redash. 

If someone knows the s3 bucket for activity stream data I may be able to pull the events w/ spark to know how frequently they are happening.
:cmore looking into it - I'll let you know
Flags: needinfo?(ewright)
Priority: -- → P3
(In reply to Erica Wright [:ewright] from comment #4)
> :cmore looking into it - I'll let you know

Thanks, Erica. Let us know roughly the level of complexity of this fix.
Flags: needinfo?(ewright)
We spent some time on our side trying to figure out how the email form could be submit with default values. It seems like the two most likely scenarios are:

1. The call to /metrics-context has not yet completed when the user submits the form
2. The call to /metrics-context completed and failed.

We are part of the fortunate set of people running fast computers on fast connections. While testing however, I was on the train w/ a terrible connection and I was able to submit the form before the call to /metrics-context completed. I'm not sure if the call to /metrics-context is blocked for a bit for users of slower computers while Firefox is initially loading.

Vlad noticed in at least North America, the request to /metrics-context is done *after* 3 requests to get Pocket tiles. The call to /metrics-context was blocked by the requests to fetch Pocket tiles. Since the FxA screen is displayed as soon as the browser loads, could the call to /metrics-context be done before fetching the Pocket info?
I have only been able to duplicate this on release. I haven't managed on a local build of release, so it's proving very difficult to debug. However, I can add on to what Shane mentioned. When the values are successfully populated there are 4 calls, 3 to pocket for images, then 1 to `metrics-flow`. When it fails to populate the values, it seems it's not a failed fetch, the fetch just doesn't happen at all, there are only 3 requests showing up.
Flags: needinfo?(ewright)
Pasting :vladikoff's comment here from https://github.com/mozilla/fxa-amplitude-send/issues/74#issuecomment-422852943:

> 7.6% of users who loaded about:welcome reported a "FXA_METRICS_ERROR" (from https://github.com/mozilla/activity-stream/blob/847d2292b5aac0a1a35a122c523b2fc46a344dd7/content-src/components/StartupOverlay/StartupOverlay.jsx#L36) in release channel on Sep 13th

That seems pretty high to me, any theories what'd cause that failure?
After chatting with Erica about potential fixes for this we decided to gather more information about these errors.

Our proposal is to add several telemetry probes and uplift them into stable or beta:

- Probe to track a submit with empty params
- Probe to track a submit with good params
- Probe to track a skip button press with empty params
- Probe to track a skip button press with good params

:loines, can you think of anything else that will be useful here? Another related question, is there a way we can track down how Activity Stream is submitting the `screen.enter-email.view` Amplitude event in Firefox 62? (The main flow doesn't send a `form_type` right now, so the event is coming from elsewhere?)
Assignee: nobody → vlad
Flags: needinfo?(loines)
I think those are good events to have. 

In addition, relevant to shane's comment, would it be possible to log the response code as an event? We could log it including cases where its 200, or only in cases when its not 200. Whatever's best from a technical perspective. https://github.com/mozilla/activity-stream/blob/8033ab78b7ec55ca9d1d3a4a3e9ec799183ddc22/content-src/components/StartupOverlay/StartupOverlay.jsx#L31

That way if the fetch fails, we might have a better idea why.

As for your second question, I'm looking into it and will add here if i find anything
Flags: needinfo?(loines)
So, according to re:dash, which I trust more than amplitude, most of mysterious `screen.enter-email.view` form views are actually coming from 63, which I believe included the fix? https://sql.telemetry.mozilla.org/queries/59083/source#152984
:vladikoff asked for a side-by-side comparison of the about:welcome email-first form views for re:dash and amplitude, segmented by version number:

re:dash: https://sql.telemetry.mozilla.org/queries/59083/source#152984
amplitude: https://analytics.amplitude.com/mozilla-corp/chart/fnygbue

As noted above, I believe re:dash has the more accurate numbers, which indicates that the form views are coming from beta 63 and not release 62, which is consistent with the form_type parameter having been added for 63 and not 62. I've done some investigating, but it is still not clear what is causing the version number in amplitude to be inaccurately reported. 

Note also that if we assume re:dash-63 = amplitude-62, then the absolute numbers don't line up exactly. I'm less worried about this because the identifier being counted here (`flow_id` for re:dash, and the amplitude user id for amplitude) isn't *always* the same (though in many cases they will be).
Hey Francois, Firefox Account folks are looking to add some telemetry to the about:welcome page in Activity Stream. Could you take a data review scan, please?

The change is documented at https://github.com/mozilla/activity-stream/pull/4452/files#diff-f9d20d6688eef2754298456e5d5b836b

Thanks!
Attachment #9012624 - Flags: review?(francois)
Comment on attachment 9012624 [details]
data_review_request.txt

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, in https://github.com/mozilla/activity-stream/blob/master/docs/v2-system-addon/data_events.md and https://github.com/mozilla/activity-stream/blob/master/docs/v2-system-addon/data_dictionary.md

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, either the telemetry setting or the Activity Stream telemetry setting in about:config.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Yes, Alex Davis (as per IRC conversation with rfkelly).

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 2.

5) Is the data collection request for default-on or default-off?

Default ON, all channels.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, permanent.
Attachment #9012624 - Flags: review?(francois) → review+
Nan, you requested release management tracking for the Firefox 63 release, is that because you will need to uplift a patch in beta?
Flags: needinfo?(najiang)
Yes, per :valdikoff, we'd like to get this patch uplifted to 63.

Since this patch is for Activity Stream, I believe we will bundle it with other 63 uplifts in A-S.
Flags: needinfo?(najiang)
Depends on: 1494427
Blocks: 1494427
No longer depends on: 1494427
https://hg.mozilla.org/mozilla-central/rev/410ea4b30b37
Iteration: --- → 64.3 (Oct 12)
Target Milestone: Firefox 62 → Firefox 64
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Attached file Uplift patch
Comment on attachment 9013837 [details]
Uplift patch

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1490826

User impact if declined: We will not be able to measure the error rate and the proper funnel cake conversion of the form. We cannot test this in Nightly because it does not use the `about:welcome` page directly.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: Yes

If yes, steps to reproduce: * Go to "about:welcome"
* Fill out the Firefox Accounts email form
* You should be redirected to the Firefox Accounts signup / signin form
* Check the telemetry events, it should now have a property `has_flow_params` in the form submit event.

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): It just modifies telemetry probes and adds some new logging. Half of the changes are improving the existing telemetry events. The rest are logging to errors that are thrown as part of the about:welcome form

String changes made/needed: None
Attachment #9013837 - Flags: approval-mozilla-beta?
Comment on attachment 9013837 [details]
Uplift patch

Uplift approved for 63 beta 12, thanks.
Attachment #9013837 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
QA Contact: dolske
QA Contact: dolske
I have verified this bug fix with the latest nightly (Build ID 	20181015100128) and beta 63.0b13 (Build ID 20181008155858)on Windows 10, MacOS 10.14 and Ubuntu 16.04.

Telemetry events have a property `has_flow_params` in the form submit event.

TELEMETRY PING: {"locale":"en-US","topic":"activity-stream","client_id":"efdbdb11-24eb-442a-abc6-7f673474a874","version":"63.0","release_channel":"beta","addon_version":"20181008155858","user_prefs":63,"session_id":"{fde69e24-7dda-437f-b82e-d1d79d5b9702}","page":"about:welcome","event":"SUBMIT_EMAIL","value":{"has_flow_params":true},"action":"activity_stream_user_event","shield_id":"pref-flip-activity-stream-cfr-1491379:Control;","profile_creation_date":17819,"region":"US"}
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.