Bug 1714615 Comment 39 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I went a short way down the rabbit-hole to see what's going on here, after running into this on my own Try run.

It looks like there was a recent regression here (around Feb 1st), which made this go from ~never to pretty-frequent:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-01-09&endday=2022-03-16&tree=trunk&bug=1714615

So, something worth looking into while that's in recent memory (before we end up just disabling a bunch of tests, as it seems we've started doing).

Looking at the test logs (with an eye towards "remote settings" which I believe is [one name we have for this server](https://searchfox.org/mozilla-central/rev/efc480f2188fb43c9cccdfd2eef79749a19c20a3/modules/libpref/init/all.js#2198,2201), I noticed that the following log-spam seems to appear shortly before the crash, which seems to indicate that our "RSLoader" update timer seems to have unexpectedly (?) kicked off:
```
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Registered update timer
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Updating recipes
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Got 1 recipes from Remote Settings
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   No targeting for recipe, so it matches automatically
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   [experiment] xman_test_message matched
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   New experiment started: test_xman_cfr_1647379167641, control
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   1 recipes matched. Finalizing ExperimentManager.
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   Recipe unenrolled: test_xman_cfr_1647379167641
```
https://treeherder.mozilla.org/logviewer?job_id=371166379&repo=autoland&lineNumber=4185

It looks like k88hudson added a mechanism that was intended to turn this off for tests in bug 1709360.  Maybe she would be a good person to take a look here, to diagnose why that's (I think?) not working?  (e.g. maybe groups of tests need to opt-in to that mechanism and we have some new tests that need to be opted in?)

Side note, I assume this is TSAN-specific due to TSAN builds just being a bit slower, and the fact that this behavior is apparently triggered off of a timer (which expires in slower TSAN builds but perhaps does not in other builds).
I went a short way down the rabbit-hole to see what's going on here, after running into this on my own Try run.

It looks like there was a recent regression here (around Feb 1st), which made this intermittent failure go from ~never to pretty-frequent:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-01-09&endday=2022-03-16&tree=trunk&bug=1714615

So, something worth looking into while that's in recent memory (before we end up just disabling a bunch of tests, as it seems we've started doing).

Looking at the test logs (with an eye towards "remote settings" which I believe is [one name we have for this server](https://searchfox.org/mozilla-central/rev/efc480f2188fb43c9cccdfd2eef79749a19c20a3/modules/libpref/init/all.js#2198,2201), I noticed that the following log-spam seems to appear shortly before the crash, which seems to indicate that our "RSLoader" update timer seems to have unexpectedly (?) kicked off:
```
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Registered update timer
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Updating recipes
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Got 1 recipes from Remote Settings
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   No targeting for recipe, so it matches automatically
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   [experiment] xman_test_message matched
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   New experiment started: test_xman_cfr_1647379167641, control
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   1 recipes matched. Finalizing ExperimentManager.
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   Recipe unenrolled: test_xman_cfr_1647379167641
```
https://treeherder.mozilla.org/logviewer?job_id=371166379&repo=autoland&lineNumber=4185

It looks like k88hudson added a mechanism that was intended to turn this off for tests in bug 1709360.  Maybe she would be a good person to take a look here, to diagnose why that's (I think?) not working?  (e.g. maybe groups of tests need to opt-in to that mechanism and we have some new tests that need to be opted in?)

Side note, I assume this is TSAN-specific due to TSAN builds just being a bit slower, and the fact that this behavior is apparently triggered off of a timer (which expires in slower TSAN builds but perhaps does not in other builds).
I went a short way down the rabbit-hole to see what's going on here, after running into this on my own Try run.

It looks like there was a recent regression here (around Feb 1st), which made this intermittent failure go from ~never to pretty-frequent:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-01-09&endday=2022-03-16&tree=trunk&bug=1714615

So, something worth looking into while that's in recent memory (before we end up just disabling a bunch of tests, as it seems we've started doing).

Looking at the test logs (with an eye towards "remote settings" which I believe is [one name we have for this server](https://searchfox.org/mozilla-central/rev/efc480f2188fb43c9cccdfd2eef79749a19c20a3/modules/libpref/init/all.js#2198,2201), I noticed that the following log-spam seems to appear shortly before the crash, which seems to indicate that our "RSLoader" update timer seems to have unexpectedly (?) kicked off:
```
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Registered update timer
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Updating recipes
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   Got 1 recipes from Remote Settings
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   No targeting for recipe, so it matches automatically
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   [experiment] xman_test_message matched
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   New experiment started: test_xman_cfr_1647379167641, control
INFO - GECKO(4553) | console.debug: RSLoader:
INFO - GECKO(4553) |   1 recipes matched. Finalizing ExperimentManager.
INFO - GECKO(4553) | console.debug: ExperimentManager:
INFO - GECKO(4553) |   Recipe unenrolled: test_xman_cfr_1647379167641
```
https://treeherder.mozilla.org/logviewer?job_id=371166379&repo=autoland&lineNumber=4185

It looks like k88hudson added a mechanism that was intended to turn this off for tests in bug 1709360.  Maybe she would be a good person to take a look here, to diagnose why that's (I think?) not working?  (e.g. maybe groups of tests need to opt-in to that mechanism and we have some new tests that need to be opted in?)

Side note, I assume this is TSAN-specific due to TSAN builds just being a bit slower, and the fact that this behavior is apparently triggered off of a timer (and the testrun takes long enough for the timer to fire in slower TSAN builds, vs. other builds probably get lucky enough to complete the tests before the timer fires).

Back to Bug 1714615 Comment 39