Closed Bug 1974095 Opened 10 months ago Closed 10 months ago

test_ThirdPartyModulesPing.js perma-orange on (Windows) artifact builds

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
142 Branch
Tracking Status
firefox-esr128 --- unaffected
firefox-esr140 --- unaffected
firefox140 --- unaffected
firefox141 --- fixed
firefox142 --- fixed

People

(Reporter: Gijs, Assigned: chutten)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

See e.g. https://treeherder.mozilla.org/jobs?repo=try&test_paths=toolkit%2Fcomponents%2Ftelemetry%2Ftests%2Funit&revision=ba93c30a707d849d5670e3c0b01628dda431ba88

Probably a regression from bug 1963853 because I didn't see this a week or two ago and that bug is a pretty recent change to the test and (by the sounds of the commit message) the implementation.

:chutten... I hate to ask, but, any idea what's up here?

Flags: needinfo?(chutten)

(The test only runs on Windows, btw, so it's orange everywhere it runs...)

Failure message isn't terribly helpful:

TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_ThirdPartyModulesPing.js | xpcshell return code: 0 

structure log link to the later-dumped full log which is maybe more helpful? https://treeherder.mozilla.org/logviewer?job_id=514779775&repo=try&lineNumber=4205

Set release status flags based on info from the regressing bug 1963853

Well that's not good:

[task 2025-06-25T15:39:02.539+00:00] 15:39:02 INFO - Unexpected exception Waiting for Glean ping - timed out after 50 tries.

Lemme give this a go locally and see what I can turn up.

Assignee: nobody → chutten
Severity: -- → S4
Status: NEW → ASSIGNED
Flags: needinfo?(chutten)
Priority: -- → P1
See Also: → 1974314

Oh. Oh no. I've committed one of the cardinal sins of JOG: testing using JS something instrumented in C++.

Potentially-Unnecessary Details

C++ and JS use the same infrastructure for testBeforeNextSubmit and testSubmission -- when a ping is about to be submitted in either language, a central registry is queried to see if there's any test callbacks to run first. This registry is keyed by the ping's id. This is how this test works in full builds despite the callback being registered in JS and the ping being submitted in C++.

When JOG registers new pings to ensure JS has up-to-date definitions to operate on, it supplies those new pings with new, non-overlapping ids. (This is to avoid the same id referencing two different ping instances.) This means that the C++ code is submitting "third-party-modules" with id, I dunno, let's say 35, but our JS test registered its callback on GleanPings.thirdPartyModules only after JOG logs: 0:08.76 pid:34720 2025-06-26 20:41:04.180000 UTC - [Parent 34720: Main Thread]: V/jog Registering ping third-party-modules id 32803.

The ids no longer match, so no registered callback can be triggered.


The solution I've chosen is to skip this test on artifact builds and document in the JOG docs this particular failure case. I've filed bug 1974314 to consider a design for testBeforeNextSubmit/testSubmission that will allow us to unskip it in the future.

Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 142 Branch

The patch landed in nightly and beta is affected.
:chutten, is this bug important enough to require an uplift?

For more information, please visit BugBot documentation.

Flags: needinfo?(chutten)

Sure, seems reasonable.

Flags: needinfo?(chutten)
Attachment #9497627 - Flags: approval-mozilla-beta?

firefox-beta Uplift Approval Request

  • User impact if declined: None
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: n/a
  • Risk associated with taking this patch: low/none
  • Explanation of risk level: test-only change
  • String changes made/needed: none
  • Is Android affected?: no

Chris, could you rebase your patch to the beta branch? It doesn't apply cleanly to the branch. Thanks!

Flags: needinfo?(chutten)

Sure thing, I've updated https://phabricator.services.mozilla.com/D255505 to be based on origin/beta

Flags: needinfo?(chutten)
Attachment #9497627 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I'm so sorry but... I was on PTO when this landed. Just now I did another artifact trypush off this m-c rev from this morning, and push health shows 6 failures (at time of writing) in this test. The failures look like:

TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_ThirdPartyModulesPing.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_ThirdPartyModulesPing.js | test_new_old_instances - [test_new_old_instances : 300] Missing expected exception New instances should not exist!

That's from this test assertion:

await Assert.rejects(
  Telemetry.getUntrustedModuleLoadEvents(),
  e => e.result == Cr.NS_ERROR_NOT_AVAILABLE,
  "New instances should not exist!"
);

The patch skipped the first task, but not the others. Is this correct or should the whole test be skipped in the manifest?

Flags: needinfo?(chutten)

That test doesn't exercise Glean code and thus should be as valid now as it ever was (aSubmitGleanPing is false and aFlags is non-0)...unless it relied on test_send_ping running first in order to work?

On a cost-benefit, I expect investigating this to cost more than the benefit of running down exactly what's happening in this one test, so I'd favourably regard a patch that disables the file on artifact.

Flags: needinfo?(chutten)
Blocks: 1982870
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: