Closed Bug 1739463 Opened 3 years ago Closed 3 years ago

Permanent Android services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 143] parent must be in snapshot. Has []. - false == true when Firefox 95 merges to Release on 2021-11-29

Categories

(Firefox :: Remote Settings Client, defect)

defect

Tracking

()

VERIFIED FIXED
96 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox93 --- unaffected
firefox94 --- unaffected
firefox95 + verified
firefox96 --- fixed

People

(Reporter: aryx, Assigned: robwu)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

[task 2021-11-04T15:41:55.627Z] 15:41:55  WARNING -  TEST-UNEXPECTED-FAIL | services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 143] parent must be in snapshot. Has []. - false == true
Flags: needinfo?(rob)

Test fails at https://searchfox.org/mozilla-central/rev/88cd13997fb0747cdcd78638fc762ff2d75e1fc5/services/settings/test/unit/test_remote_settings_utils_telemetry.js#99-110
The telemetry is expected to be reported via the implementation at https://searchfox.org/mozilla-central/rev/a12c2c2e59c92d8f969d8f3f290ab16919449c9d/services/settings/Utils.jsm#126
That part is expected to be reached, because there is another test that exercises that code path, which succeeds at https://searchfox.org/mozilla-central/rev/88cd13997fb0747cdcd78638fc762ff2d75e1fc5/services/settings/test/unit/test_remote_settings_utils.js#127-135

The test does something unusual: it simulates the registration of "builtin" telemetry events at https://searchfox.org/mozilla-central/rev/88cd13997fb0747cdcd78638fc762ff2d75e1fc5/services/settings/test/unit/test_remote_settings_utils_telemetry.js#43-69, because the actual telemetry events are expected to be registered as part of a different patch in bug 1733481.

Since this works on Nightly and Beta, I guess that there is something special preventing this from working on release. Possibly an issue with the test itself, or an issue with the telemetry implementation. Either way, it would probably be good to fix that issue, and also re-enable the unit tests for Telemetry.registerBuiltinEvents that are currently disabled on Android, at https://searchfox.org/mozilla-central/rev/a12c2c2e59c92d8f969d8f3f290ab16919449c9d/toolkit/components/telemetry/tests/unit/xpcshell.ini#85-86,89-90

:chutten, do you know why this telemetry test is failing on release on Android, and what I should do to fix this?

INFO - services/settings/test/unit/test_remote_settings_utils_telemetry.js | Starting setup
INFO - (xpcshell/head.js) | test setup pending (2)
INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
INFO - (xpcshell/head.js) | test run_next_test 1 pending (2)
INFO - (xpcshell/head.js) | test setup finished (2)
INFO - services/settings/test/unit/test_remote_settings_utils_telemetry.js | Starting test_telemetry
INFO - (xpcshell/head.js) | test test_telemetry pending (2)
INFO - (xpcshell/head.js) | test run_next_test 1 finished (2)
INFO - TEST-PASS | services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 77] fetch without proxy succeeded - 412 == 412
INFO - TEST-PASS | services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 88] fetch with broken proxy succeeded - 412 == 412
INFO - TEST-PASS | services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 97] fetch succeeded with bypassProxy feature - 412 == 412
WARNING - TEST-UNEXPECTED-FAIL | services/settings/test/unit/test_remote_settings_utils_telemetry.js | test_telemetry - [test_telemetry : 143] parent must be in snapshot. Has []. - false == true
INFO - resource://testing-common/TelemetryTestUtils.jsm:assertEvents:143
INFO - test_remote_settings_utils_telemetry.js:assertTelemetryEvents:32
INFO - test_remote_settings_utils_telemetry.js:test_telemetry:99
INFO - /data/local/tmp/test_root/xpc/head.js:_do_main:240
INFO - /data/local/tmp/test_root/xpc/head.js:_execute_test:603
INFO - -e:null:1
INFO - exiting test
INFO - Unexpected exception NS_ERROR_ABORT:
INFO - _abort_failed_test@/data/local/tmp/test_root/xpc/head.js:874:20
INFO - do_report_result@/data/local/tmp/test_root/xpc/head.js:975:5
INFO - Assert<@/data/local/tmp/test_root/xpc/head.js:75:21
INFO - proto.report@resource://testing-common/Assert.jsm:228:10
INFO - proto.ok@resource://testing-common/Assert.jsm:254:10
INFO - assertEvents@resource://testing-common/TelemetryTestUtils.jsm:143:12
INFO - assertTelemetryEvents@test_remote_settings_utils_telemetry.js:32:28
INFO - test_telemetry@test_remote_settings_utils_telemetry.js:99:9
INFO - _do_main@/data/local/tmp/test_root/xpc/head.js:240:6
INFO - _execute_test@/data/local/tmp/test_root/xpc/head.js:603:5
INFO - @-e:1:1
INFO - exiting test

Flags: needinfo?(rob) → needinfo?(chutten)

registerBuiltinEvents is

only meant to be used to support the "artifact build"/
"build faster" developers by allowing to add new events without rebuilding
the C++ components including the headers files.

so isn't expected to be used outside of making it so that developers can test instrumentation without a compile. I'm a little surprised it works outside of developer builds (ie on Nightly and Beta). To fake the Events.yaml entry for your test, I'd recommend adding an Events.yaml entry for your test.

Also, Firefox Telemetry as a whole doesn't work in Android post-Fennec: Fenix doesn't do anything with Telemetry data unless it is one of a few special scalars or histograms. So Android-only test failures aren't high on our priority list unless they're to do with Scalars or Histograms in Gecko.

But to your specific request, I don't see any mechanism by which RegisterBuiltinEvents would fail to work on release. Usually, in cases where there's release-specific behaviour in Telemetry, there'd be a check against canRecordExtended/canRecordPrereleaseData. But even were there something like that, in Telemetry non-data APIs are allowed (and encouraged) to log and throw on failures. (data APIs need to NOOP on failures so we don't break things)

Which means RegisterBuiltinEvents thinks it succeeded. Which means it's the data API call that's failing. You can check the Telemetry histogram TELEMETRY_EVENT_RECORDING_ERROR to see what error we're getting from this. And you can check the event summary keyed scalars to see how often they were called.

And a random thought, perhaps the snapshot's in the dynamic "process" instead of the parent one, for some reason. It shouldn't be. That's the whole difference between registerEvents and registerBuiltinEvents after all. But we're already two degrees out in the wilds here, so it might be worth rechecking some assumptions.

Flags: needinfo?(chutten)

To fake the Events.yaml entry for your test, I'd recommend adding an Events.yaml entry for your test.

That's part of bug 1733481. The purpose of the fake registration and stub implementation is to get the test to pass until that bug lands.

Also, Firefox Telemetry as a whole doesn't work in Android post-Fennec: Fenix doesn't do anything with Telemetry data unless it is one of a few special scalars or histograms.

At the bottom of that doc, there is the statement that events are not supported.
Does that mean that it is not possible at all to report event telemetry at all?
(Looks like I asked the same question without answer before, when you posted a detailed reply back then to my questions about scalars at https://bugzilla.mozilla.org/show_bug.cgi?id=1712990#c5)

So Android-only test failures aren't high on our priority list unless they're to do with Scalars or Histograms in Gecko.

I'm inclined to just disable the test on Android, especially since the test is not that important right now.
For the future I'd like to know the best practices to "report event telemetry" for gecko features that ought to be platform-independent.

Assignee: nobody → rob
Status: NEW → ASSIGNED

(In reply to Rob Wu [:robwu] from comment #3)

At the bottom of that doc, there is the statement that events are not supported.
Does that mean that it is not possible at all to report event telemetry at all?
(Looks like I asked the same question without answer before, when you posted a detailed reply back then to my questions about scalars at https://bugzilla.mozilla.org/show_bug.cgi?id=1712990#c5)

Ack, I'm sorry I missed that. That particular subsystem will not be augmented to support Telemetry Events, no.

For the future I'd like to know the best practices to "report event telemetry" for gecko features that ought to be platform-independent.

That'd be Glean.

In fact, we have an adapter to mirror Glean event records to Telemetry events called the Glean Interface For Firefox Telemetry (GIFFT) which might be helpful depending on your use case. It allows you to record in a Glean way (reported by Firefox Desktop and Fenix) and in Firefox Telemetry (reported by Firefox Desktop) in case your analyses are dependent on information only in Telemetry for now.

The reason you haven't been hearing about this is because all my announcements of the New Way To Do Things have been contingent on some higher-level communications which have, uh, been taking a while. That's why I still haven't felt right removing the notice at the top of this doc... but take this as a Telemetry Module Peer giving you permission.

So, if you'd like, you can follow the docs and add a metrics.yaml in services/settings that has something like this in it

service_request:
  bypass_proxy:
    type: event
    description: | 
      A verbose markdown description.
    bugs:
      - each bug url
    data_reviews:
      - each data review url
    notification_emails:
      - yours
    expires: "whatever version number that was, or never"
    extra_keys:
      source:
        description: Some sort of source, isn't it?
        type: string
      type:
        description: Some sort of type?
        type: string
      service:
        description: Oh, you're using 'value' to shuttle a service too
        type: string
    telemetry_mirror: ServiceRequest_Bypass_ProxyInfo

And then from JS you can import the Glean global and call Glean.serviceRequest.bypassProxy.record({source: 'a source', type: 'a type', service: 'a service'); and it'll record to both

(( Actually, it'll probably error since service isn't an extra_key in the Telemetry Event (which also doesn't exist), but we could comment out the telemetry_mirror until that part lands if you'd like ))

(( It'll require a full build since it depends on C++ having the updated mapping from Glean event to Telemetry event ))

Testing's a sight easier, too. Glean.serviceRequest.bypassProxy.testGetValue() will get you an array of all the times you called record to it since the last time it was sent in a ping. No need to filter full event snapshots any more (you can see test_fog_event_works in test_Glean.js for how it looks.).

Come find me on #telemetry:mozilla.org or Slack#data-help if you'd like some synchronous help. Always happy to be of assistance.

Pushed by rob@robwu.nl:
https://hg.mozilla.org/integration/autoland/rev/bb2eca98ee83
Skip test_remote_settings_utils_telemetry.js on Android release r=mixedpuppy
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch

Verified fixed with yesterday's beta-as-release simulation.

Status: RESOLVED → VERIFIED
Depends on: 1747454
Has Regression Range: --- → yes
See Also: → 1820155
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: