Open Bug 1921543 Opened 1 month ago Updated 10 days ago

Spurious failure in jsreftests with MOZ_LOG set ("REFTEST ERROR | Got suite_end message before suite_start.")

Categories

(Testing :: General, defect, P2)

defect

Tracking

(Not tracked)

People

(Reporter: sfink, Unassigned)

Details

I enabled some logging in my try push of jsreftests, and the jobs all started failing with REFTEST ERROR | Got suite_end message before suite_start.

I think the problem is:

something (my guess would be mozharness) is logging a line like

INFO - {"action":"suite_start","time":1727375910032,"thread":null,"pid":null,"source":"reftest","tests":{"js/src/tests/non262/Array/jstests.list":["js/src/tests/non262/Array/11.1.4.js","js/src/tests/non262/Array/15.4.4.5-1.js","js/src/tests/non262/Array/15.4.4.5-2.js",...}}

that is enormous (one chunk of jsreftests has 16536 test paths in it). If a MOZ_LOG message is written while it's still coming out, they will get interleaved, and whatever reads the output will then eventually see the suite_end JSON blob. There is now no valid suite_start blob to pair it up with.

I was running mach jstestbrowser to try to reproduce it locally, but that does not produce the structured suite_start lines, it seems?

The severity field is not set for this bug.
:jmaher, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmaher)

so the suite_start and related messages are specific to mozharness, in general we like to match up suite_start/suite_end with test_start/test_end taking place between those. It helps determine which manifest we are testing (if there is a leak on shutdown, etc.)

for jstest, this is sort of a edge case (as are a few other suites/tasks) in that we don't run by manifest, instead run everything at once.

These messages are typically figured out within the harness, not necessarily on log parsing- but with that said, the harness is in python and the tests are in javascript running in node/xpcshell/browser, so we are building the structured logging via log parsing.

for jstest (reftest harness), we have:
https://searchfox.org/mozilla-central/source/layout/tools/reftest/runreftest.py#1114

but the Jittest tasks run:
https://searchfox.org/mozilla-central/source/js/src/jit-test/jit_test.py

which doesn't appear to have as much of a formal logging.

What type of messages were you printing out? which specific tasks on central or try were failing?

Flags: needinfo?(jmaher)

The severity field is not set for this bug.
:jmaher, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmaher)
Severity: -- → S2
Flags: needinfo?(jmaher)
Priority: -- → P2

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #2)

What type of messages were you printing out? which specific tasks on central or try were failing?

Example push.

The tasks were all jsreftest tasks, on try.

The messages were MOZ_LOG messages for garbage collection scheduling events from a Web Worker. I don't know if the test harness actually uses Workers enough to trigger a GC there, or I was displaying messages that showed up when just spinning any Worker up. At any rate, the messages look like they came out while the suite_start message was still being generated.

Here is a much-truncated excerpt from the log file:

[task 2024-09-26T17:40:48.400Z] 17:40:48     INFO - REFTEST INFO | Running chunk 1 out of 3 chunks.  tests 1-16536/52124
[task 2024-09-26T17:40:49.490Z] 17:40:49     INFO - {"action":"suite_start","time":1727372448411,"thread":null,"pid":null,"source":"reftest","tests":{"js/src/tests/non262/Array/jstests.list":["js/src/tests/non262/Array/11.1.4.js","js/src/tests/non262/Array/15.4.4.5-1.js",...<396968 bytes elided>..."js/src/tests/test262/built-ins/Array/prototype/forEach/15.4.4.18-3-13.js","js/src/tests/test262/built-ins/Array/prototype/[Parent 2931: DOM Worker]: D/WorkerGC Worker 753d724a7800 idle GC timer: bumping to future
[task 2024-09-26T17:40:49.491Z] 17:40:49     INFO - [Parent 2931: DOM Worker]: D/WorkerGC Worker 753d724a7800 started idle GC timer: restarting idle
[task 2024-09-26T17:40:51.380Z] 17:40:51     INFO - forEach/15.4.4.18-3-14.js","js/src/tests/test262/built-ins/Array/prototype/forEach/15.4.4.18-3-15.js",<...873702 bytes elided...>,"js/src/tests/test262/built-ins/RegExp/CharacterClassEscapes/character-class-non-whitespace-class-escape-flags-u.js"]},"name":"jstestbrowser","runinfo":{"skipped":1800}}
[task 2024-09-26T17:40:51.385Z] 17:40:51     INFO - REFTEST TEST-START | js/src/tests/non262/Array/11.1.4.js
[task 2024-09-26T17:40:51.385Z] 17:40:51     INFO - REFTEST TEST-LOAD | file:///task_172737000940119/build/tests/jsreftest/tests/js/src/tests/jsreftest.html?test=non262/Array/11.1.4.js | 0 / 16536 (0%)

(notice the ...<396968 bytes elided>... and <...873702 bytes elided...> markers that I added.)

So the suite_start JSON gets interrupted with [Parent 2931: DOM Worker]: D/WorkerGC Worker 753d..., which is one of my added MOZ_LOG messages. It has a newline, and a 2nd D/WorkerGC log message comes right after it, then the JSON continues.

You need to log in before you can comment on or make changes to this bug.