Closed Bug 1904726 Opened 1 year ago Closed 1 year ago

Frequent win asan gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1 after AddressSanitizer: out of memory:

Categories

(Core :: Audio/Video, defect, P2)

RESOLVED FIXED
136 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- wontfix
firefox134 --- wontfix
firefox135 --- wontfix
firefox136 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: truber)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

Filed by: nfay [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=464000603&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/A7D29CRkTJSZMGk-L4QrKQ/runs/0/artifacts/public/logs/live_backing.log


[task 2024-06-25T22:21:34.833Z] 22:21:34     INFO -  TEST-START | GeckoProfiler.FeatureCombinations
[task 2024-06-25T22:24:09.908Z] 22:24:09     INFO -  =================================================================
[task 2024-06-25T22:24:09.910Z] 22:24:09    ERROR -  ==2512==ERROR: AddressSanitizer: out of memory: allocator is trying to allocate 0x200000 bytes
[task 2024-06-25T22:24:09.912Z] 22:24:09     INFO -  ==2512==FATAL: AddressSanitizer: internal allocator is out of memory trying to allocate 0x70 bytes
[task 2024-06-25T22:24:11.868Z] 22:24:11     INFO -  gtest INFO | gtest | process wait complete, returncode=1
[task 2024-06-25T22:24:11.871Z] 22:24:11     INFO -  mozcrash checking D:\task_171935297474541\build\tests\gtest for minidumps...
[task 2024-06-25T22:24:11.880Z] 22:24:11  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1
[task 2024-06-25T22:24:11.881Z] 22:24:11     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2024-06-25T22:24:11.961Z] 22:24:11     INFO - Return code: 1
[task 2024-06-25T22:24:11.972Z] 22:24:11    ERROR - No tests run or test summary not found
[task 2024-06-25T22:24:11.973Z] 22:24:11     INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2024-06-25T22:24:11.975Z] 22:24:11  WARNING - setting return code to 2
See Also: → 1923579
Summary: Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1 → Intermittent win asan gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1 after AddressSanitizer: out of memory:
Summary: Intermittent win asan gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1 after AddressSanitizer: out of memory: → Frequent win asan gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1 after AddressSanitizer: out of memory:

Is there a larger instance type we can use for these Windows ASAN GTest jobs? Seems like throwing more RAM at this problem may be our best option.

Flags: needinfo?(mcornmesser)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #27)

> Is there a larger instance type we can use for these Windows ASAN GTest jobs? Seems like throwing more RAM at this problem may be our best option.

For sure. We would have to create an additional pool for it, which is simple, but some in-tree items will need to be added.

Do we have an idea of what kind of resources will be needed?

No clue, sorry. I don't even know how much RAM the current instances have. I guess I'd naively say to "just" double whatever the current amount is, but that's not based on any hard data whatsoever.

We have provisioned gecko-t/win11-64-24h2-large, which is based on Standard_E8ads_v5 and has 64 GB of RAM.

Flags: needinfo?(mcornmesser)

:jmaher will work on this when/as we move to 24H2 (currently blocked on bug 1926680)

Flags: needinfo?(jmaher)
Duplicate of this bug: 1839260

Switching to -large workers with 24H2 in bug 1940980.

Flags: needinfo?(jmaher)

The test that most commonly hits this is ImageDecoders.CorruptAVIFSingleChunk. The file contains a chunk that reports its length as 4 GB, and the test expects the file to be reported as corrupt. We try to allocate 4 GB of memory to hold the chunk data; the test expects that allocation to fail and the error to propagate, so that the image is reported as corrupt. What happens instead is that the program is terminated due to OOM:

==6084==ERROR: AddressSanitizer: out of memory: allocator is trying to allocate 0xfffb0012 bytes

ASAN gives us a helpful hint though:
==6084==HINT: if you don't care about these errors you may set allocator_may_return_null=1
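
The failure path the test relies on can be sketched roughly as follows. This is a minimal illustration, not the decoder's actual code: `LoadChunk`, `DecodeStatus`, `AllocFn`, and `CappedAlloc` are hypothetical names, and the capped allocator only stands in for an allocator that returns null on oversized requests.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical result type; the real decoder has its own error codes.
enum class DecodeStatus { Ok, Corrupt };

using AllocFn = void* (*)(std::size_t);

// Hypothetical stand-in for a fallible allocator: it refuses requests over
// 1 MiB by returning null instead of terminating the process.
inline void* CappedAlloc(std::size_t size) {
  constexpr std::size_t kLimit = 1u << 20;
  return size > kLimit ? nullptr : std::malloc(size);
}

// Reserve a buffer for a chunk whose header declares `declaredSize` bytes.
// A failed allocation is reported as a corrupt file rather than a crash.
inline DecodeStatus LoadChunk(std::size_t declaredSize, AllocFn alloc) {
  void* buffer = alloc(declaredSize);
  if (!buffer) {
    return DecodeStatus::Corrupt;  // propagate the allocation failure
  }
  // ... a real decoder would copy and parse the chunk data here ...
  std::free(buffer);
  return DecodeStatus::Ok;
}
```

For example, `LoadChunk(0xfffb0012u, CappedAlloc)` takes the null branch and returns `DecodeStatus::Corrupt`. In the failing jobs the allocator never returned at all, so a check like this was never reached.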

allocator_may_return_null seems like it should make these fallible allocations return null, which is what we want for this test. I see that allocator_may_return_null is indeed set here:

https://searchfox.org/mozilla-central/rev/3076c9156ef84aae253ffdc1d391e0bfab2c406b/mozglue/build/AsanOptions.cpp#78
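
For readers unfamiliar with the mechanism, a stripped-down sketch of such a definition is shown here. It is not a copy of AsanOptions.cpp (the in-tree file sets more options and carries extra attributes), and `DefaultsContain` is a hypothetical helper added only so the string can be checked.

```cpp
#include <cstring>

// The ASan runtime calls this hook, if the program defines it, to obtain
// default options before it reads the ASAN_OPTIONS environment variable.
extern "C" const char* __asan_default_options() {
  return "allocator_may_return_null=1";
}

// Hypothetical helper: report whether a flag appears in the defaults above.
inline bool DefaultsContain(const char* flag) {
  return std::strstr(__asan_default_options(), flag) != nullptr;
}
```

The hook only has an effect if the runtime actually picks up this definition, which is the open question in this comment.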

So we shouldn't need VMs with a large amount of memory for this to work, and I'm not sure what is going wrong. There is another definition of __asan_default_options in the tree

https://searchfox.org/mozilla-central/rev/3076c9156ef84aae253ffdc1d391e0bfab2c406b/third_party/chromium/build/sanitizers/sanitizer_options.cc#74

that doesn't set allocator_may_return_null.

decoder, you originally set this, do you know what might be going wrong?

Flags: needinfo?(choller)

:truber, could you check whether the other __asan_default_options definition is ever used here? I wasn't aware we had another definition that might be interfering.

Flags: needinfo?(choller) → needinfo?(jschwartzentruber)

I don't understand the build system there, but looking at the fuzzing-asan builds, I don't see sanitizer_options.cc used for either Linux or Windows.

However, AsanOptions.cpp isn't used on Windows either. Overriding weak definitions seems to be supported on Windows now, but only in the main executable, not DLLs [1].

[1] https://reviews.llvm.org/D28596 (last paragraph of c0)
See Also: → 1036235
Assignee: nobody → jschwartzentruber
Status: NEW → ASSIGNED
Flags: needinfo?(jschwartzentruber)
Duplicate of this bug: 1923579

The severity field for this bug is set to S4. However, the following bug duplicate has higher severity:

:truber, could you consider increasing the severity of this bug to S3?

For more information, please visit BugBot documentation.

Flags: needinfo?(jschwartzentruber)
Attachment #9461162 - Attachment description: Bug 1904726 - Enable sanitizer default options for Windows r?#firefox-build-system-reviewers! → Bug 1904726 - Use a common library for sanitizer default options r?#firefox-build-system-reviewers!
Severity: S4 → S3
Flags: needinfo?(jschwartzentruber)
Priority: P5 → P2
Pushed by jdschwa@gmail.com: https://hg.mozilla.org/integration/autoland/rev/1111b4b228a2 Use a common library for sanitizer default options r=firefox-build-system-reviewers,decoder,glandium
Blocks: 1943634

Backed out for causing linux cppunit perma failures

Backout link: https://hg.mozilla.org/integration/autoland/rev/8eb4ee83e729cc252dec03c431592e3b70001680

Push with failures

Failure log -> AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete)

TEST-START | TestUniquePtr
[task 2025-01-24T18:01:17.606Z] 18:01:17     INFO -  PID 1274 |
[task 2025-01-24T18:01:17.607Z] 18:01:17     INFO -  =================================================================
[task 2025-01-24T18:01:17.607Z] 18:01:17    ERROR -  ==1274==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x50f000000040
[task 2025-01-24T18:01:17.607Z] 18:01:17     INFO -      #0 0x557b3b35566d in operator delete(void*) /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:143:3
[task 2025-01-24T18:01:17.608Z] 18:01:17     INFO -      #1 0x557b3b357663 in DeleteIntFunction /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:325:3
[task 2025-01-24T18:01:17.608Z] 18:01:17     INFO -      #2 0x557b3b357663 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:302:7
[task 2025-01-24T18:01:17.608Z] 18:01:17     INFO -      #3 0x557b3b357663 in TestFunctionReferenceDeleter /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:354:8
[task 2025-01-24T18:01:17.608Z] 18:01:17     INFO -      #4 0x557b3b357663 in main /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:587:8
[task 2025-01-24T18:01:17.608Z] 18:01:17     INFO -      #5 0x7fe68b1d7b96 in __libc_start_main /tmp/glibc/csu/../csu/libc-start.c:310
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -      #6 0x557b3b277598 in _start (/builds/worker/workspace/build/tests/cppunittest/TestUniquePtr+0xc7598) (BuildId: 00c433fc9c7ed39469f39dd5c79ca32d9f0d4d92)
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -  0x50f000000040 is located 0 bytes inside of 168-byte region [0x50f000000040,0x50f0000000e8)
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -  allocated by thread T0 here:
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -      #0 0x557b3b354f1d in operator new[](unsigned long) /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:89:3
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -      #1 0x557b3b357616 in TestFunctionReferenceDeleter /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:351:39
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -      #2 0x557b3b357616 in main /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:587:8
[task 2025-01-24T18:01:17.609Z] 18:01:17     INFO -      #3 0x7fe68b1d7b96 in __libc_start_main /tmp/glibc/csu/../csu/libc-start.c:310
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  SUMMARY: AddressSanitizer: alloc-dealloc-mismatch /builds/worker/checkouts/gecko/mfbt/tests/TestUniquePtr.cpp:325:3 in DeleteIntFunction
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  ==1274==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  ==1274==ABORTING
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  mozcrash checking /tmp/tmpg8dj1sp9 for minidumps...
[task 2025-01-24T18:01:17.610Z] 18:01:17  WARNING -  TEST-UNEXPECTED-FAIL | TestUniquePtr | test failed with return code 1
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  TEST-INFO took 40ms
[task 2025-01-24T18:01:17.610Z] 18:01:17     INFO -  TEST-START | TestUtf8
Flags: needinfo?(jschwartzentruber)

Thanks. It looks like there's a bunch of tests that pull in mozglue too.

Flags: needinfo?(jschwartzentruber)
Pushed by jdschwa@gmail.com: https://hg.mozilla.org/integration/autoland/rev/b4f173257836 Use a common library for sanitizer default options r=firefox-build-system-reviewers,decoder,glandium
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch

I looked at failures recently classified as this bug, and most of them do not look like this bug. I'd encourage filing a new bug and classifying failures in this job (gtest asan win) against it, where investigation can happen, because it looks like we did indeed fix the issue that was causing almost all the failures here. The new failures were either masked by this issue, or are new because the ASAN options we wanted actually got applied, which surfaced them.

As of late Friday, Jan 24th, these tests started running on -large instances and an updated version of Windows. I suspect the failure rate will drop significantly.

I don't think we should need to increase the instance size. The OOM was the result of a configuration error: large allocations were supposed to return null, but instead they crashed the program. The code change hopefully fixed that.

I am on PTO all week, so if you want to reduce the size, please fix issues and remove https://searchfox.org/mozilla-central/source/taskcluster/kinds/test/compiled.yml#55

FYI, I did try to decrease the size of the machines, but it seems there are other gtests that allocate a lot of memory without intending to OOM, like TestIDNA.BenchUrlPunycodeMixed, which allocates 50000 URIs:

https://searchfox.org/mozilla-central/rev/7b3f3fb5fd2cad8f348131498a35a91bef68b47b/netwerk/test/gtest/TestIDNA.cpp#68

So both the patch to fix the ASAN behaviour and the worker size increase were needed here.
