Closed Bug 1638856 Opened 2 years ago Closed 1 year ago

Very frequent AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f)

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 79+ fixed
firefox-esr78 79+ fixed
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fixed
firefox80 --- fixed

People

(Reporter: malexandru, Assigned: gsvelto)

Details

(Keywords: intermittent-failure, sec-other, Whiteboard: [stockwell fixed][post-critsmash-triage][adv-main79+r])

Attachments

(1 file)

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302756913&repo=autoland&lineNumber=2579

Raw log: https://firefoxci.taskcluster-artifacts.net/FpcEOMZ7TyKWcU8AdrTzLA/0/public/logs/live_backing.log

[task 2020-05-18T14:10:15.595Z] 14:10:15     INFO -  TEST-START | RecursiveMutex.SmokeTest
[task 2020-05-18T14:10:15.595Z] 14:10:15     INFO -  TEST-PASS | RecursiveMutex.SmokeTest | test completed (time: 0ms)
[task 2020-05-18T14:10:15.595Z] 14:10:15     INFO -  TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-05-18T14:10:16.112Z] 14:10:16  WARNING -  TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-05-18T14:10:16.113Z] 14:10:16     INFO -      Result: died but not with expected error.
[task 2020-05-18T14:10:16.114Z] 14:10:16     INFO -    Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-05-18T14:10:16.115Z] 14:10:16     INFO -  Actual msg:
[task 2020-05-18T14:10:16.116Z] 14:10:16     INFO -  [  DEATH   ] AddressSanitizer:DEADLYSIGNAL
[task 2020-05-18T14:10:16.117Z] 14:10:16     INFO -  [  DEATH   ] =================================================================
[task 2020-05-18T14:10:16.118Z] 14:10:16    ERROR -  [  DEATH   ] ==1295==ERROR: AddressSanitizer: stack-overflow on address 0x7f8b49c50f40 (pc 0x7f8bf303875f bp 0x7f8b49c51fb0 sp 0x7f8b49c50f40 T0)
[task 2020-05-18T14:10:16.119Z] 14:10:16     INFO -  [  DEATH   ]     #0 0x7f8bf303875f  (/lib64/ld-linux-x86-64.so.2+0x1775f)
[task 2020-05-18T14:10:16.120Z] 14:10:16     INFO -  [  DEATH   ]     #1 0x7f8bd0798fe5 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-05-18T14:10:16.121Z] 14:10:16     INFO -  [  DEATH   ]     #2 0x7f8bf1de788e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-05-18T14:10:16.121Z] 14:10:16     INFO -  [  DEATH   ]
[task 2020-05-18T14:10:16.122Z] 14:10:16     INFO -  [  DEATH   ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f)
[task 2020-05-18T14:10:16.123Z] 14:10:16     INFO -  [  DEATH   ] ==1295==ABORTING
[task 2020-05-18T14:10:16.124Z] 14:10:16     INFO -  [  DEATH   ]  @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-05-18T14:10:16.124Z] 14:10:16  WARNING -  TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 517ms)
[task 2020-05-18T14:10:16.125Z] 14:10:16     INFO -  TEST-START | TestSlicedInputStream.Simple
[task 2020-05-18T14:10:16.125Z] 14:10:16     INFO -  TEST-PASS | TestSlicedInputStream.Simple | test completed (time: 0ms)
[task 2020-05-18T14:10:16.125Z] 14:10:16     INFO -  TEST-START | TestSlicedInputStream.Sliced
[task 2020-05-18T14:10:16.125Z] 14:10:16     INFO -  TEST-PASS | TestSlicedInputStream.Sliced | test completed (time: 0ms)
<...>
[task 2020-05-18T14:21:23.902Z] 14:21:23     INFO -  TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/1 | test completed (time: 231ms)
[task 2020-05-18T14:21:23.902Z] 14:21:23     INFO -  TEST-START | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2
[task 2020-05-18T14:21:23.917Z] 14:21:23     INFO -  Checking cipher: 1
[task 2020-05-18T14:21:24.134Z] 14:21:24     INFO -  TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2 | test completed (time: 225ms)
[task 2020-05-18T14:21:24.134Z] 14:21:24  WARNING -  TEST-UNEXPECTED-FAIL | GTest unit test: failed
[task 2020-05-18T14:21:24.134Z] 14:21:24     INFO -  Passed: 4727
[task 2020-05-18T14:21:24.134Z] 14:21:24  WARNING -  Failed: 1
[task 2020-05-18T14:21:24.134Z] 14:21:24  WARNING -  One or more unittests failed.
[task 2020-05-18T14:21:24.210Z] 14:21:24     INFO -  Finished running GTest tests.
[task 2020-05-18T14:21:24.613Z] 14:21:24     INFO -  gtest INFO | gtest | process wait complete, returncode=1
[task 2020-05-18T14:21:24.613Z] 14:21:24     INFO -  mozcrash checking /builds/worker/workspace/build/tests/gtest for minidumps...
[task 2020-05-18T14:21:24.614Z] 14:21:24  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1
[task 2020-05-18T14:21:24.614Z] 14:21:24     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2020-05-18T14:21:24.669Z] 14:21:24    ERROR - Return code: 1
[task 2020-05-18T14:21:24.669Z] 14:21:24     INFO - TinderboxPrint: gtest-gtest<br/>4727/<em class="testfail">1</em>
[task 2020-05-18T14:21:24.669Z] 14:21:24    ERROR - # TBPL FAILURE #
[task 2020-05-18T14:21:24.669Z] 14:21:24  WARNING - setting return code to 2
[task 2020-05-18T14:21:24.669Z] 14:21:24    ERROR - The gtest suite: gtest ran with return status: FAILURE
[task 2020-05-18T14:21:24.669Z] 14:21:24     INFO - Running post-action listener: _package_coverage_data
[task 2020-05-18T14:21:24.669Z] 14:21:24     INFO - Running post-action listener: _resource_record_post_action
[task 2020-05-18T14:21:24.669Z] 14:21:24     INFO - Running post-action listener: process_java_coverage_data
[task 2020-05-18T14:21:24.670Z] 14:21:24     INFO - [mozharness: 2020-05-18 14:21:24.669922Z] Finished run-tests step (success)
[task 2020-05-18T14:21:24.670Z] 14:21:24     INFO - Running post-run listener: _resource_record_post_run
[task 2020-05-18T14:21:24.737Z] 14:21:24     INFO - Validating Perfherder data against /builds/worker/workspace/mozharness/external_tools/performance-artifact-schema.json
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"subtests": [{"name": "cpu_percent", "value": 18.061689526184537}, {"name": "io_write_bytes", "value": 5615345664}, {"name": "io.read_bytes", "value": 393216}, {"name": "io_write_time", "value": 2132268}, {"name": "io_read_time", "value": 24}], "extraOptions": ["taskcluster-m5d.xlarge"], "name": "gtest.gtest.overall"}, {"subtests": [{"name": "time", "value": 0.017851829528808594}], "name": "gtest.gtest.start-pulseaudio"}, {"subtests": [{"name": "time", "value": 43.653860092163086}, {"name": "cpu_percent", "value": 25.326785714285716}], "name": "gtest.gtest.install"}, {"subtests": [{"name": "time", "value": 0.9636027812957764}], "name": "gtest.gtest.stage-files"}, {"subtests": [{"name": "time", "value": 758.2599918842316}, {"name": "cpu_percent", "value": 17.626254953764857}], "name": "gtest.gtest.run-tests"}]}
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - Total resource usage - Wall time: 803s; CPU: 18.0%; Read bytes: 393216; Write bytes: 5615345664; Read time: 24; Write time: 2132268
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: CPU usage<br/>17.6%
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: I/O read bytes / time<br/>393,216 / 24
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: I/O write bytes / time<br/>5,615,345,664 / 2,132,268
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: CPU idle<br/>2,602.7 (81.6%)
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: CPU system<br/>164.2 (5.2%)
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: CPU user<br/>403.6 (12.7%)
[task 2020-05-18T14:21:24.741Z] 14:21:24     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2020-05-18T14:21:24.742Z] 14:21:24     INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2020-05-18T14:21:24.743Z] 14:21:24     INFO - install - Wall time: 44s; CPU: 25.0%; Read bytes: 0; Write bytes: 2148077568; Read time: 0; Write time: 858024
[task 2020-05-18T14:21:24.743Z] 14:21:24     INFO - stage-files - Wall time: 1s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2020-05-18T14:21:24.749Z] 14:21:24     INFO - run-tests - Wall time: 758s; CPU: 18.0%; Read bytes: 393216; Write bytes: 3465871360; Read time: 24; Write time: 1274244
[task 2020-05-18T14:21:24.891Z] 14:21:24  WARNING - returning nonzero exit status 2
Summary: Intermittent AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f) → Very frequent AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f)
Component: GTest → XPCOM
Product: Testing → Core

This also hits GTest on mozilla-beta.

Nathan, is this an in-tree regression or could this be from a change outside?

Flags: needinfo?(nfroyd)

I don't see anything obvious in TestSTLWrappers.cpp that would have changed and I don't think we updated our in-tree copy of gtests? (testing/gtest/gtest/include/gtest/gtest.h was last modified last year, so unless we're picking up a different header...)

I see this test is playing games with the crashreporter and I know Gabriele has been doing some refactorings in this area...Gabriele, could the crashreporter changes have triggered this?

Flags: needinfo?(nfroyd) → needinfo?(gsvelto)

Seems to be hitting esr68 as well.

Keywords: sec-other

There are 80 total failures in the last 7 days on linux1804-64-asan opt

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=304349311&repo=autoland&lineNumber=2594

[task 2020-05-30T00:08:39.045Z] 00:08:39 INFO - TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-05-30T00:08:39.548Z] 00:08:39 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-05-30T00:08:39.548Z] 00:08:39 INFO - Result: died but not with expected error.
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - Actual msg:
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] AddressSanitizer:DEADLYSIGNAL
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] =================================================================
[task 2020-05-30T00:08:39.549Z] 00:08:39 ERROR - [ DEATH ] ==1305==ERROR: AddressSanitizer: stack-overflow on address 0x7f8c6b6c8f80 (pc 0x7f8d1499a75f bp 0x7f8c6b6c9fb0 sp 0x7f8c6b6c8f80 T0)
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] #0 0x7f8d1499a75e (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] #1 0x7f8cf20413a9 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-05-30T00:08:39.550Z] 00:08:39 INFO - [ DEATH ] #2 0x7f8d1374988e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-05-30T00:08:39.551Z] 00:08:39 INFO - [ DEATH ]
[task 2020-05-30T00:08:39.553Z] 00:08:39 INFO - [ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-05-30T00:08:39.554Z] 00:08:39 INFO - [ DEATH ] ==1305==ABORTING
[task 2020-05-30T00:08:39.554Z] 00:08:39 INFO - [ DEATH ] @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-05-30T00:08:39.555Z] 00:08:39 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 501ms)

Gabriele are there updates here?

Flags: needinfo?(gsvelto)
Flags: needinfo?(gsvelto)
Whiteboard: [stockwell needswork:owner]

No, sorry, it's on the list of things to do this week but I haven't had time to work on it yet.

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #6)

No, sorry, it's on the list of things to do this week but I haven't had time to work on it yet.

No problem, thank you !

Update: there are 63 total failures in the last 7 days on linux1804-64-asan opt

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305296413&repo=autoland&lineNumber=2662

[task 2020-06-06T00:56:13.048Z] 00:56:13 INFO - TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-06-06T00:56:13.551Z] 00:56:13 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Result: died but not with expected error.
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Actual msg:
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - [ DEATH ] AddressSanitizer:DEADLYSIGNAL
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] =================================================================
[task 2020-06-06T00:56:13.552Z] 00:56:13 ERROR - [ DEATH ] ==1303==ERROR: AddressSanitizer: stack-overflow on address 0x7ff07887cf80 (pc 0x7ff0a434075f bp 0x7ff07887dfb0 sp 0x7ff07887cf80 T0)
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #0 0x7ff0a434075e (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #1 0x7ff0818ca609 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #2 0x7ff0a30ef88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ]
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] ==1303==ABORTING
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-06-06T00:56:13.553Z] 00:56:13 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 501ms)

Can you take another look, please?

Flags: needinfo?(gsvelto)

I'm on it right now, leaving the NI?

I believe that the stack overflow might be caused by the string concatenation for the first parameter here. I have to think of a way to rewrite the test w/o relying on that but it won't be pretty.

I've done a few more tests and if the lines in the error log are correct I was wrong and the test is crashing in execve() which I find odd. I'll inspect glibc in the hope of figuring out what's going on here.

As I suspected execve() is just a shim that's doing no stack allocations at all, which leaves me quite confused. Either the line address in the log is wrong or something else is going on. If there's a spike of failures here it's entirely due to me re-triggering tasks to trigger them on try since I can't reproduce the issue locally.

I went for the crudest possible fix: doubling the stack size passed to the clone() call. It appears to work. I'll post the patch for review later today.

Flags: needinfo?(gsvelto)
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

I'm not sure how we accommodate changes to the gtests, I hope this is the appropriate way.

:ryanvm - You updated gtest, back in bug 1534605. Any advice for modifying gtest source? Should we update mozilla-changes.diff to include this change at this time?

Flags: needinfo?(ryanvm)

Looks like that same change was already made upstream to fix what looks like the exact same issue:
https://github.com/google/googletest/commit/681454dae48f109abf68c424c9d2e6db9a092238

However, since the current-latest release (v.1.10.0) doesn't include that change, I would include it in mozilla-changes.diff for now to avoid any problems if we wanted to update from v1.8.1 to v1.10.0 in the future. We can always drop it later when we update to a newer googletest release.

Flags: needinfo?(ryanvm)

Thanks, I've updated the patch so that the changes are also included in mozilla-changes.diff

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
Flags: qe-verify-
Whiteboard: [stockwell needswork:owner] → [stockwell needswork:owner][post-critsmash-triage]
Whiteboard: [stockwell needswork:owner][post-critsmash-triage] → [stockwell fixed][post-critsmash-triage]
Whiteboard: [stockwell fixed][post-critsmash-triage] → [stockwell fixed][post-critsmash-triage][adv-main79+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.