Intermittent toolkit/components/nimbus/test/browser/browser_experimentapi_child.js | Uncaught exception in test bound testGetExperimentFromChildExistingEnrollment - at chrome://mochitests/content/browser/toolkit/components/nimbus/test/browser/
Categories
(Firefox :: Nimbus Desktop Client, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr128 | --- | unaffected |
firefox136 | --- | unaffected |
firefox137 | --- | unaffected |
firefox138 | --- | fixed |
firefox139 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: beth)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, test-verify-fail)
Attachments
(2 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
Filed by: chorotan [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=498879246&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DBGAtAZUQAKV0w_t-P4S-w/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DBGAtAZUQAKV0w_t-P4S-w/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - TEST-PASS | toolkit/components/nimbus/test/browser/browser_experimentapi_child.js | Experiment is null - null == null -
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - Leaving test bound testGetExperimentFromChildNewEnrollment
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - Entering test bound testGetExperimentFromChildExistingEnrollment
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - when calling ExperimentAPI.getExperiment with slug:
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - Buffered messages finished
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/nimbus/test/browser/browser_experimentapi_child.js | Uncaught exception in test bound testGetExperimentFromChildExistingEnrollment - at chrome://mochitests/content/browser/toolkit/components/nimbus/test/browser/browser_experimentapi_child.js:212 - TypeError: can't access property "slug", rv is null
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - Stack trace:
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - @chrome://mochitests/content/browser/toolkit/components/nimbus/test/browser/browser_experimentapi_child.js:212:20
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - async*execute@resource://testing-common/SpecialPowersSandbox.sys.mjs:139:12
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - _spawnTask@resource://testing-common/SpecialPowersChild.sys.mjs:1626:15
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - receiveMessage@resource://testing-common/SpecialPowersChild.sys.mjs:257:21
[task 2025-03-13T00:48:46.076Z] 00:48:46 INFO - JSActor query*receiveMessage@resource://testing-common/SpecialPowersParent.sys.mjs:1384:14
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - JSActor query*spawn@resource://testing-common/SpecialPowersChild.sys.mjs:1555:17
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - testGetExperimentFromChildExistingEnrollment@chrome://mochitests/content/browser/toolkit/components/nimbus/test/browser/browser_experimentapi_child.js:192:23
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - async*handleTask@chrome://mochikit/content/browser-test.js:1170:26
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - _runTaskBasedTest@chrome://mochikit/content/browser-test.js:1242:18
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - async*Tester_execTest@chrome://mochikit/content/browser-test.js:1383:14
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - nextTest/<@chrome://mochikit/content/browser-test.js:1153:14
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - SimpleTest.waitForFocus/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:1058:13
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - Leaving test bound testGetExperimentFromChildExistingEnrollment
[task 2025-03-13T00:48:46.077Z] 00:48:46 INFO - GECKO(12708) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /tmp/tmp7vri4om8.mozrunner/runtests_leaks_tab_pid13166.log
[task 2025-03-13T00:48:46.113Z] 00:48:46 INFO - GECKO(12708) | MEMORY STAT | vsize 11606MB | residentFast 515MB | heapAllocated 252MB
[task 2025-03-13T00:48:46.114Z] 00:48:46 INFO - TEST-OK | toolkit/components/nimbus/test/browser/browser_experimentapi_child.js | took 1874ms
[task 2025-03-13T00:48:46.115Z] 00:48:46 INFO - Not taking screenshot here: see the one that was previously logged
[task 2025-03-13T00:48:46.115Z] 00:48:46 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/nimbus/test/browser/browser_experimentapi_child.js | Found an unexpected tab at the end of test run: https://example.com/ -
[task 2025-03-13T00:48:46.191Z] 00:48:46 INFO - checking window state
[task 2025-03-13T00:48:46.321Z] 00:48:46 INFO - GECKO(12708) | Completed ShutdownLeaks collections in process 12708
[task 2025-03-13T00:48:46.322Z] 00:48:46 INFO - TEST-START | Shutdown
[task 2025-03-13T00:48:46.322Z] 00:48:46 INFO - Browser Chrome Test Summary
[task 2025-03-13T00:48:46.323Z] 00:48:46 INFO - Passed: 31
[task 2025-03-13T00:48:46.323Z] 00:48:46 INFO - Failed: 2
[task 2025-03-13T00:48:46.324Z] 00:48:46 INFO - Todo: 0
[task 2025-03-13T00:48:46.324Z] 00:48:46 INFO - Mode: e10s
[task 2025-03-13T00:48:46.324Z] 00:48:46 INFO - *** End BrowserChrome Test Results ***
[task 2025-03-13T00:48:46.348Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.350Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.352Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.354Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.355Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.359Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.361Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.363Z] 00:48:46 INFO - GECKO(12708) | Exiting due to channel error.
[task 2025-03-13T00:48:46.369Z] 00:48:46 INFO - TEST-INFO | Main app process: exit 0
Assignee | ||
Updated•2 months ago
|
Comment 1•2 months ago
|
||
Set release status flags based on info from the regressing bug 1943422
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 4•1 month ago
|
||
See this phab comment:
I think updating the data from the parent to already existing children is just intrinsically racy (and always was) and that is what we see here.
testGetExperimentFromChildExistingEnrollment
relies on ExperimentAPI.ready but if we happen to run in a pre-existing content process where we already did ExperimentAPI.ready()
, this becomes basically a no-op and the data written by the parent must arrive via a "change" notification to the ExperimentStore
in the child, IIUC. And that is racy.
I assume this is not a big deal for normal usage, as the data will eventually arrive, but this test definitely needs some re-thinking.
Assignee | ||
Updated•1 month ago
|
Comment 5•1 month ago
•
|
||
So using forceNewProcess
alone seems to not solve the problem locally to me using
MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_SHUTDOWN=/home/jens/src/mozilla-unified/profile10.dat MOZ_CHAOSMODE=true ./mach mochitest toolkit/components/nimbus/test/browser --repeat 10
on an opt build. But at least from local profiling it definitely seems to ensure, that each execution gets a new process (note that this profile is running the below change without failure).
It appears to me that ExperimentStore.addEnrollment()
does not ensure that _syncToChildren
is called with flush: true
. I ignore the details of the underlying Services.ppmm.sharedData
but it sounds likely that we need to force a flush even before we want to poll the data from the child.
Locally this seems to do the trick on my machine, a try push is coming. There might be better places to enforce the flush, maybe directly inside ExperimentStore.addEnrollment
. Or maybe we actually do not want that flush to happen always but only for testing?
Comment 6•1 month ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #5)
It appears to me that
ExperimentStore.addEnrollment()
does not ensure that_syncToChildren
is called withflush: true
. I ignore the details of the underlyingServices.ppmm.sharedData
but it sounds likely that we need to force a flush even before we want to poll the data from the child.
If I clicked right in searchfox I think everything boils down to this idle dispatch without flushing. I think until the idle event runs, nothing is visible to child processes but everything is buffered in mChangedKeys
.
So I think we should definitely flush in the test. Probably for normal operations, things are just fine.
Assignee | ||
Comment 7•1 month ago
|
||
This test is trying to test the scenarios of (a) syncing a new
enrollment to an existing child process and (b) syncing an existing
enrollment to a new child process, so we need to create new processes in
each case. Otherwise, we can run into a race because
ExperimentAPI.ready()
in a re-used child process will resolve
immediately and we actually want to wait for the initial sync from the
parent to the child.
Additionally, we force the SharedMap to flush to child processes after
we modify our enrollments so that child processes will immedaiately see
the changes instead of waiting for idle dispatch to re-serialize the
contents of the enrollment store.
Comment hidden (Intermittent Failures Robot) |
Comment 10•27 days ago
|
||
bugherder |
Comment 11•26 days ago
|
||
The patch landed in nightly and beta is affected.
:beth, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox138
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 12•26 days ago
|
||
This test is trying to test the scenarios of (a) syncing a new
enrollment to an existing child process and (b) syncing an existing
enrollment to a new child process, so we need to create new processes in
each case. Otherwise, we can run into a race because
ExperimentAPI.ready()
in a re-used child process will resolve
immediately and we actually want to wait for the initial sync from the
parent to the child.
Additionally, we force the SharedMap to flush to child processes after
we modify our enrollments so that child processes will immedaiately see
the changes instead of waiting for idle dispatch to re-serialize the
contents of the enrollment store.
Original Revision: https://phabricator.services.mozilla.com/D243530
Updated•26 days ago
|
Comment 13•26 days ago
|
||
beta Uplift Approval Request
- User impact if declined: None. Test only
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: None
- Explanation of risk level: test only
- String changes made/needed: no
- Is Android affected?: no
Updated•26 days ago
|
Comment 14•26 days ago
|
||
uplift |
Updated•26 days ago
|
Assignee | ||
Updated•26 days ago
|
Comment hidden (Intermittent Failures Robot) |
Description
•