Closed Bug 1798796 Opened 3 years ago Closed 3 years ago

TestSafeThreadLocal is failing on windows11/asan

Categories

(Firefox :: Launcher Process, defect)

defect

Tracking

()

RESOLVED FIXED
109 Branch
Tracking Status
firefox108 --- wontfix
firefox109 --- fixed

People

(Reporter: jmaher, Assigned: gstoll)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

we currently run tests on windows 10 and want to run on windows 11 next year.

There are 2 tests which fail on try:

  • cppunit: TestSafeThreadLocal.exe
  • gtest: TestDllBlocklist.BlockThreadWithLoadLibraryEntryPoint

both look related, but if they are not, please let me know and I will ask others for advice on the gtest failure.

What is interesting is these ONLY fail in ASAN mode, not opt, debug, etc.

:haik- do you have any ideas on how to get more information here, fix the os, the tools, the test, or the product?

Flags: needinfo?(haftandilian)

@Greg, could you take a look at these Windows 11 test failures? Testing if they reproduce locally with an ASAN build is probably the first step.

Flags: needinfo?(haftandilian) → needinfo?(gstoll)

Yes, I'll take a look.

Flags: needinfo?(gstoll)
Assignee: nobody → gstoll
Status: NEW → ASSIGNED

:jmaher - are these tests running on Windows 11 right now, or does this bug cover the failures on Windows 10? Myself and at least one other person in the #build Element room are unable to build/run ASAN builds on Windows 11.

Flags: needinfo?(jmaher)

oh, thanks for asking, I forgot to add the mach command:
./mach try fuzzy --no-artifact --worker-override="win10-64-2004=gecko-t/win11-64-2009-alpha" -q 'test-windows10 asan cppunit'

Flags: needinfo?(jmaher)

these tests pass on win10, and we are not running win11 yet in CI, I have been working to get the tests running and look for odd errors (only a few) like this one in case there is something we need to change with win11 (like install tools, configure the OS, etc.) or adjust firefox to account for things like new APIs, etc.

Got it. I'm having major problems getting ASAN stuff to run on my machine - is there a VM I could use to reproduce the test failures? Thanks!

Flags: needinfo?(jmaher)

FWIW I can't run an ASAN build of Firefox on my Windows 11 machine because of this bug I filed to LLVM.

it sounds like you are making some progress, please ping me on #matrix or #slack if the tests are not running for you.

Flags: needinfo?(jmaher)

Hi :jmaher - is there a trick to getting the gtests to run? I've tried the command line you pasted above as well as

./mach try fuzzy --no-artifact --worker-override="win10-64-2004=gecko-t/win11-64-2009-alpha" -q 'test-windows10 asan cppunit | gunit'

but I get results like this try build and it doesn't seem like they're running...

Flags: needinfo?(jmaher)

ok, you might need to manually select the jobs, or add a few extra and have the 1proc:
./mach try fuzzy --no-artifact --worker-override="win10-64-2004=gecko-t/win11-64-2009-alpha" -q 'test-windows10 asan-qr -1proc !mochitest !fun'

I did "add new job" for you on that try push, it adds a task. If you click at the top of the push in treeherder on the little down arrow, it will give you options. When selecting add new jobs, it presents ALL the jobs. you have to just click on what you want, then at the top again click "Trigger X New Jobs"

Flags: needinfo?(jmaher)

Oh, cool, thanks!

Looking at TestDllBlocklist.BlockThreadWithLoadLibraryEntryPoint, the output shows that we are blocking the DLL, but the test is failing because the thread exit code is 7 (not 0 as expected).

My guess as to what's happening is that patched_BaseThreadInitThunk() is not getting patched correctly, so we don't redirect the thread to start in our NoOpThreadProc, but since the DLL is on our blocklist we do stop it from loading. (and output the message saying we're blocking it, which I don't think would have happened if we had redirected the thread to NoOpThreadProc) So the functionality this test is testing isn't working right, it's just showing up in kind of a weird way.

Anyway, I think it would be OK to turn off the check for the thread exit code in ASAN builds; there's some precedent because some other blocklisting tests behave differently under ASAN.

sounds reasonable. I think when you add ASAN to the mix things behave differently- odd that it is only on win11, but all the other opt/debug 64/32 permutations work as expected.

do you want to write a patch to ignore thread exit code for this test on ASAN only? I can give it a go and r?

Ah, I think TestSafeThreadLocal is failing for the same reason. (weird thread exit code)

Yeah, that sounds great. I'll submit a patch and kicked off a try build on Windows 11 that will hopefully show these tests passing.

Thanks!

Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/55e4e1a5ead0 ignore thread exit codes in ASAN builds for a few tests r=jmaher
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch

The patch landed in nightly and beta is affected.
:gstoll, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox108 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(gstoll)
Flags: needinfo?(gstoll)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: