Very frequent AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f)
Categories
(Core :: XPCOM, defect)
Tracking
()
People
(Reporter: malexandru, Assigned: gsvelto)
Details
(Keywords: intermittent-failure, sec-other, Whiteboard: [stockwell fixed][post-critsmash-triage][adv-main79+r])
Attachments
(1 file)
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302756913&repo=autoland&lineNumber=2579
Raw log: https://firefoxci.taskcluster-artifacts.net/FpcEOMZ7TyKWcU8AdrTzLA/0/public/logs/live_backing.log
[task 2020-05-18T14:10:15.595Z] 14:10:15 INFO - TEST-START | RecursiveMutex.SmokeTest
[task 2020-05-18T14:10:15.595Z] 14:10:15 INFO - TEST-PASS | RecursiveMutex.SmokeTest | test completed (time: 0ms)
[task 2020-05-18T14:10:15.595Z] 14:10:15 INFO - TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-05-18T14:10:16.112Z] 14:10:16 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-05-18T14:10:16.113Z] 14:10:16 INFO - Result: died but not with expected error.
[task 2020-05-18T14:10:16.114Z] 14:10:16 INFO - Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-05-18T14:10:16.115Z] 14:10:16 INFO - Actual msg:
[task 2020-05-18T14:10:16.116Z] 14:10:16 INFO - [ DEATH ] AddressSanitizer:DEADLYSIGNAL
[task 2020-05-18T14:10:16.117Z] 14:10:16 INFO - [ DEATH ] =================================================================
[task 2020-05-18T14:10:16.118Z] 14:10:16 ERROR - [ DEATH ] ==1295==ERROR: AddressSanitizer: stack-overflow on address 0x7f8b49c50f40 (pc 0x7f8bf303875f bp 0x7f8b49c51fb0 sp 0x7f8b49c50f40 T0)
[task 2020-05-18T14:10:16.119Z] 14:10:16 INFO - [ DEATH ] #0 0x7f8bf303875f (/lib64/ld-linux-x86-64.so.2+0x1775f)
[task 2020-05-18T14:10:16.120Z] 14:10:16 INFO - [ DEATH ] #1 0x7f8bd0798fe5 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-05-18T14:10:16.121Z] 14:10:16 INFO - [ DEATH ] #2 0x7f8bf1de788e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-05-18T14:10:16.121Z] 14:10:16 INFO - [ DEATH ]
[task 2020-05-18T14:10:16.122Z] 14:10:16 INFO - [ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775f)
[task 2020-05-18T14:10:16.123Z] 14:10:16 INFO - [ DEATH ] ==1295==ABORTING
[task 2020-05-18T14:10:16.124Z] 14:10:16 INFO - [ DEATH ] @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-05-18T14:10:16.124Z] 14:10:16 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 517ms)
[task 2020-05-18T14:10:16.125Z] 14:10:16 INFO - TEST-START | TestSlicedInputStream.Simple
[task 2020-05-18T14:10:16.125Z] 14:10:16 INFO - TEST-PASS | TestSlicedInputStream.Simple | test completed (time: 0ms)
[task 2020-05-18T14:10:16.125Z] 14:10:16 INFO - TEST-START | TestSlicedInputStream.Sliced
[task 2020-05-18T14:10:16.125Z] 14:10:16 INFO - TEST-PASS | TestSlicedInputStream.Sliced | test completed (time: 0ms)
<...>
[task 2020-05-18T14:21:23.902Z] 14:21:23 INFO - TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/1 | test completed (time: 231ms)
[task 2020-05-18T14:21:23.902Z] 14:21:23 INFO - TEST-START | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2
[task 2020-05-18T14:21:23.917Z] 14:21:23 INFO - Checking cipher: 1
[task 2020-05-18T14:21:24.134Z] 14:21:24 INFO - TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2 | test completed (time: 225ms)
[task 2020-05-18T14:21:24.134Z] 14:21:24 WARNING - TEST-UNEXPECTED-FAIL | GTest unit test: failed
[task 2020-05-18T14:21:24.134Z] 14:21:24 INFO - Passed: 4727
[task 2020-05-18T14:21:24.134Z] 14:21:24 WARNING - Failed: 1
[task 2020-05-18T14:21:24.134Z] 14:21:24 WARNING - One or more unittests failed.
[task 2020-05-18T14:21:24.210Z] 14:21:24 INFO - Finished running GTest tests.
[task 2020-05-18T14:21:24.613Z] 14:21:24 INFO - gtest INFO | gtest | process wait complete, returncode=1
[task 2020-05-18T14:21:24.613Z] 14:21:24 INFO - mozcrash checking /builds/worker/workspace/build/tests/gtest for minidumps...
[task 2020-05-18T14:21:24.614Z] 14:21:24 WARNING - gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1
[task 2020-05-18T14:21:24.614Z] 14:21:24 INFO - gtest INFO | rungtests.py exits with code 1
[task 2020-05-18T14:21:24.669Z] 14:21:24 ERROR - Return code: 1
[task 2020-05-18T14:21:24.669Z] 14:21:24 INFO - TinderboxPrint: gtest-gtest<br/>4727/<em class="testfail">1</em>
[task 2020-05-18T14:21:24.669Z] 14:21:24 ERROR - # TBPL FAILURE #
[task 2020-05-18T14:21:24.669Z] 14:21:24 WARNING - setting return code to 2
[task 2020-05-18T14:21:24.669Z] 14:21:24 ERROR - The gtest suite: gtest ran with return status: FAILURE
[task 2020-05-18T14:21:24.669Z] 14:21:24 INFO - Running post-action listener: _package_coverage_data
[task 2020-05-18T14:21:24.669Z] 14:21:24 INFO - Running post-action listener: _resource_record_post_action
[task 2020-05-18T14:21:24.669Z] 14:21:24 INFO - Running post-action listener: process_java_coverage_data
[task 2020-05-18T14:21:24.670Z] 14:21:24 INFO - [mozharness: 2020-05-18 14:21:24.669922Z] Finished run-tests step (success)
[task 2020-05-18T14:21:24.670Z] 14:21:24 INFO - Running post-run listener: _resource_record_post_run
[task 2020-05-18T14:21:24.737Z] 14:21:24 INFO - Validating Perfherder data against /builds/worker/workspace/mozharness/external_tools/performance-artifact-schema.json
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"subtests": [{"name": "cpu_percent", "value": 18.061689526184537}, {"name": "io_write_bytes", "value": 5615345664}, {"name": "io.read_bytes", "value": 393216}, {"name": "io_write_time", "value": 2132268}, {"name": "io_read_time", "value": 24}], "extraOptions": ["taskcluster-m5d.xlarge"], "name": "gtest.gtest.overall"}, {"subtests": [{"name": "time", "value": 0.017851829528808594}], "name": "gtest.gtest.start-pulseaudio"}, {"subtests": [{"name": "time", "value": 43.653860092163086}, {"name": "cpu_percent", "value": 25.326785714285716}], "name": "gtest.gtest.install"}, {"subtests": [{"name": "time", "value": 0.9636027812957764}], "name": "gtest.gtest.stage-files"}, {"subtests": [{"name": "time", "value": 758.2599918842316}, {"name": "cpu_percent", "value": 17.626254953764857}], "name": "gtest.gtest.run-tests"}]}
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - Total resource usage - Wall time: 803s; CPU: 18.0%; Read bytes: 393216; Write bytes: 5615345664; Read time: 24; Write time: 2132268
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: CPU usage<br/>17.6%
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: I/O read bytes / time<br/>393,216 / 24
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: I/O write bytes / time<br/>5,615,345,664 / 2,132,268
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: CPU idle<br/>2,602.7 (81.6%)
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: CPU system<br/>164.2 (5.2%)
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: CPU user<br/>403.6 (12.7%)
[task 2020-05-18T14:21:24.741Z] 14:21:24 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2020-05-18T14:21:24.742Z] 14:21:24 INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2020-05-18T14:21:24.743Z] 14:21:24 INFO - install - Wall time: 44s; CPU: 25.0%; Read bytes: 0; Write bytes: 2148077568; Read time: 0; Write time: 858024
[task 2020-05-18T14:21:24.743Z] 14:21:24 INFO - stage-files - Wall time: 1s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2020-05-18T14:21:24.749Z] 14:21:24 INFO - run-tests - Wall time: 758s; CPU: 18.0%; Read bytes: 393216; Write bytes: 3465871360; Read time: 24; Write time: 1274244
[task 2020-05-18T14:21:24.891Z] 14:21:24 WARNING - returning nonzero exit status 2
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Retriggered / backfilled down to https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=linux%2C18.04%2Cx64%2Casan%2Copt%2Ctest-linux1804-64-asan%2Fopt-gtest-1proc%2C%28gtest%29&tochange=dffc3c357cee6d6877754d34e15bdcc608329d0d&fromchange=84d70bc1f5c3aa1cc7cc68edb4924b96338f5552&selectedTaskRun=J8G-sjg6TV68gLH4X27v5A-0
Updated•5 years ago
|
Comment 2•5 years ago
|
||
This also hits GTest on mozilla-beta.
Nathan, is this an in-tree regression or could this be from a change outside?
Comment 3•5 years ago
|
||
I don't see anything obvious in TestSTLWrappers.cpp that would have changed and I don't think we updated our in-tree copy of gtests? (testing/gtest/gtest/include/gtest/gtest.h was last modified last year, so unless we're picking up a different header...)
I see this test is playing games with the crashreporter and I know Gabriele has been doing some refactorings in this area...Gabriele, could the crashreporter changes have triggered this?
Comment 4•5 years ago
|
||
Seems to be hitting esr68 as well.
Updated•5 years ago
|
Comment 5•5 years ago
|
||
There are 80 total failures in the last 7 days on linux1804-64-asan opt
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=304349311&repo=autoland&lineNumber=2594
[task 2020-05-30T00:08:39.045Z] 00:08:39 INFO - TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-05-30T00:08:39.548Z] 00:08:39 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-05-30T00:08:39.548Z] 00:08:39 INFO - Result: died but not with expected error.
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - Actual msg:
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] AddressSanitizer:DEADLYSIGNAL
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] =================================================================
[task 2020-05-30T00:08:39.549Z] 00:08:39 ERROR - [ DEATH ] ==1305==ERROR: AddressSanitizer: stack-overflow on address 0x7f8c6b6c8f80 (pc 0x7f8d1499a75f bp 0x7f8c6b6c9fb0 sp 0x7f8c6b6c8f80 T0)
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] #0 0x7f8d1499a75e (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-05-30T00:08:39.549Z] 00:08:39 INFO - [ DEATH ] #1 0x7f8cf20413a9 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-05-30T00:08:39.550Z] 00:08:39 INFO - [ DEATH ] #2 0x7f8d1374988e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-05-30T00:08:39.551Z] 00:08:39 INFO - [ DEATH ]
[task 2020-05-30T00:08:39.553Z] 00:08:39 INFO - [ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-05-30T00:08:39.554Z] 00:08:39 INFO - [ DEATH ] ==1305==ABORTING
[task 2020-05-30T00:08:39.554Z] 00:08:39 INFO - [ DEATH ] @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-05-30T00:08:39.555Z] 00:08:39 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 501ms)
Gabriele are there updates here?
Updated•5 years ago
|
| Assignee | ||
Comment 6•5 years ago
|
||
No, sorry, it's on the list of things to do this week but I haven't had time to work on it yet.
Comment 7•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #6)
No, sorry, it's on the list of things to do this week but I haven't had time to work on it yet.
No problem, thank you !
Comment 8•5 years ago
|
||
Update: there are 63 total failures in the last 7 days on linux1804-64-asan opt
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305296413&repo=autoland&lineNumber=2662
[task 2020-06-06T00:56:13.048Z] 00:56:13 INFO - TEST-START | STLWrapper.ShouldAbortDeathTest
[task 2020-06-06T00:56:13.551Z] 00:56:13 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | Death test: ShouldAbort()
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Result: died but not with expected error.
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Expected: terminate called after throwing an instance of 'std::out_of_range'|vector::_M_range_check
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - Actual msg:
[task 2020-06-06T00:56:13.551Z] 00:56:13 INFO - [ DEATH ] AddressSanitizer:DEADLYSIGNAL
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] =================================================================
[task 2020-06-06T00:56:13.552Z] 00:56:13 ERROR - [ DEATH ] ==1303==ERROR: AddressSanitizer: stack-overflow on address 0x7ff07887cf80 (pc 0x7ff0a434075f bp 0x7ff07887dfb0 sp 0x7ff07887cf80 T0)
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #0 0x7ff0a434075e (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #1 0x7ff0818ca609 in testing::internal::ExecDeathTestChildMain(void*) /builds/worker/checkouts/gecko/testing/gtest/gtest/src/gtest-death-test.cc:1194:3
[task 2020-06-06T00:56:13.552Z] 00:56:13 INFO - [ DEATH ] #2 0x7ff0a30ef88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ]
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1775e)
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] ==1303==ABORTING
[task 2020-06-06T00:56:13.553Z] 00:56:13 INFO - [ DEATH ] @ /builds/worker/checkouts/gecko/xpcom/tests/gtest/TestSTLWrappers.cpp:73
[task 2020-06-06T00:56:13.553Z] 00:56:13 WARNING - TEST-UNEXPECTED-FAIL | STLWrapper.ShouldAbortDeathTest | test completed (time: 501ms)
| Assignee | ||
Comment 10•5 years ago
|
||
I'm on it right now, leaving the NI?
| Assignee | ||
Comment 11•5 years ago
|
||
I believe that the stack overflow might be caused by the string concatenation for the first parameter here. I have to think of a way to rewrite the test w/o relying on that but it won't be pretty.
| Assignee | ||
Comment 12•5 years ago
|
||
I've done a few more tests and if the lines in the error log are correct I was wrong and the test is crashing in execve() which I find odd. I'll inspect glibc in the hope of figuring out what's going on here.
| Assignee | ||
Comment 13•5 years ago
|
||
As I suspected execve() is just a shim that's doing no stack allocations at all, which leaves me quite confused. Either the line address in the log is wrong or something else is going on. If there's a spike of failures here it's entirely due to me re-triggering tasks to trigger them on try since I can't reproduce the issue locally.
| Assignee | ||
Comment 14•5 years ago
|
||
I went for the crudest possible fix: doubling the stack size passed to the clone() call. It appears to work. I'll post the patch for review later today.
| Assignee | ||
Comment 15•5 years ago
|
||
Updated•5 years ago
|
| Assignee | ||
Comment 16•5 years ago
|
||
I'm not sure how we accommodate changes to the gtests, I hope this is the appropriate way.
Comment 17•5 years ago
|
||
:ryanvm - You updated gtest, back in bug 1534605. Any advice for modifying gtest source? Should we update mozilla-changes.diff to include this change at this time?
Comment 18•5 years ago
|
||
Looks like that same change was already made upstream to fix what looks like the exact same issue:
https://github.com/google/googletest/commit/681454dae48f109abf68c424c9d2e6db9a092238
However, since the current-latest release (v.1.10.0) doesn't include that change, I would include it in mozilla-changes.diff for now to avoid any problems if we wanted to update from v1.8.1 to v1.10.0 in the future. We can always drop it later when we update to a newer googletest release.
| Assignee | ||
Comment 19•5 years ago
|
||
Thanks, I've updated the patch so that the changes are also included in mozilla-changes.diff
Updated•5 years ago
|
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
| uplift | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•4 years ago
|
Description
•