Closed Bug 1433015 Opened 7 years ago Closed 7 years ago

Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check,AllocReplacement.realloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free))

Categories

(Core :: XPCOM, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: nika)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:product])

Attachments

(3 files)

Summary: Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free)) → Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check,AllocReplacement.realloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free))
This bug has failed 54 times since it was created about 24 hours ago. It fails mostly on Linux and has a few failures on linux 64 and nightly. Fails on opt and debug build types. Here is a log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=158489641&lineNumber=1726 And here is a part of the log: INFO - Running pre-action listener: _resource_record_pre_action [task 2018-01-25T17:26:27.797Z] 17:26:27 INFO - Running main action method: stage_files [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fakeopenh264 to /builds/worker/workspace/build/application/firefox [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fake to /builds/worker/workspace/build/application/firefox [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-clearkey to /builds/worker/workspace/build/application/firefox [task 2018-01-25T17:26:27.798Z] 17:26:27 ERROR - shutil error: Destination path '/builds/worker/workspace/build/application/firefox/gmp-clearkey' already exists [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/gtest/dependentlibs.list.gtest to /builds/worker/workspace/build/application/firefox [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - copying tree: /builds/worker/workspace/build/tests/gtest/gtest_bin to /builds/worker/workspace/build/application/firefox [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - mkdir: /builds/worker/workspace/build/application/firefox/gtest [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - copying tree: /builds/worker/workspace/build/tests/gtest/gtest_bin/gtest to /builds/worker/workspace/build/application/firefox/gtest [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - rmtree: /builds/worker/workspace/build/application/firefox/gtest [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - retry: Calling rmtree with args: ('/builds/worker/workspace/build/application/firefox/gtest',), kwargs: {}, attempt #1 [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running post-action listener: _resource_record_post_action [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 17:26:27.951175Z] Finished stage-files step (success) [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 17:26:27.951371Z] Running run-tests step. [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running pre-action listener: _resource_record_pre_action [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running main action method: run_tests [task 2018-01-25T17:26:27.955Z] 17:26:27 INFO - Running pre test command disable_screen_saver with 'xset s off s reset' [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Running command: ('xset', 's', 'off', 's', 'reset') in /builds/worker/workspace/build [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Copy/paste: xset s off s reset :froydnj Can you please take a look at this bug?
Flags: needinfo?(nfroyd)
Whiteboard: [stockwell needswork]
We have 5 failure points in `ValidateHookedAllocation` [1], it would be nice to know which one is failing. Also: > [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix Does that mean gcov is running? That's messed with our allocator in other tests.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #3) > We have 5 failure points in `ValidateHookedAllocation` [1], it would be nice > to know which one is failing. It is almost certainly the one that verifies whether the heap stats changed as a result of our allocation: http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#92 or perhaps the one after we free the allocated memory: http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#98 > Also: > > > [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix > > Does that mean gcov is running? That's messed with our allocator in other > tests. Setup code for gcov is running, but it appears that the function doesn't do anything in non-code coverage builds. http://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#81 (In reply to Dorel Luca [:dluca] from comment #2) > This bug has failed 54 times since it was created about 24 hours ago. It > fails mostly on Linux and has a few failures on linux 64 and nightly. Fails > on opt and debug build types. > > Here is a log: > https://treeherder.mozilla.org/logviewer. > html#?repo=autoland&job_id=158489641&lineNumber=1726 > > And here is a part of the log: > INFO - Running pre-action listener: _resource_record_pre_action > [task 2018-01-25T17:26:27.797Z] 17:26:27 INFO - Running main action > method: stage_files > [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving > /builds/worker/workspace/build/tests/bin/plugins/gmp-fakeopenh264 to > /builds/worker/workspace/build/application/firefox > [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving > /builds/worker/workspace/build/tests/bin/plugins/gmp-fake to > /builds/worker/workspace/build/application/firefox > [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving > /builds/worker/workspace/build/tests/bin/plugins/gmp-clearkey to > /builds/worker/workspace/build/application/firefox > [task 2018-01-25T17:26:27.798Z] 17:26:27 ERROR - shutil error: > Destination path > '/builds/worker/workspace/build/application/firefox/gmp-clearkey' already > exists > [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - Moving > /builds/worker/workspace/build/tests/gtest/dependentlibs.list.gtest to > /builds/worker/workspace/build/application/firefox > [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - copying tree: > /builds/worker/workspace/build/tests/gtest/gtest_bin to > /builds/worker/workspace/build/application/firefox > [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - mkdir: > /builds/worker/workspace/build/application/firefox/gtest > [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - copying tree: > /builds/worker/workspace/build/tests/gtest/gtest_bin/gtest to > /builds/worker/workspace/build/application/firefox/gtest > [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - rmtree: > /builds/worker/workspace/build/application/firefox/gtest > [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - retry: Calling rmtree > with args: ('/builds/worker/workspace/build/application/firefox/gtest',), > kwargs: {}, attempt #1 > [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running post-action > listener: _resource_record_post_action > [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 > 17:26:27.951175Z] Finished stage-files step (success) > [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 > 17:26:27.951371Z] Running run-tests step. > [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running pre-action > listener: _resource_record_pre_action > [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action > listener: _set_gcov_prefix > [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running main action > method: run_tests > [task 2018-01-25T17:26:27.955Z] 17:26:27 INFO - Running pre test command > disable_screen_saver with 'xset s off s reset' > [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Running command: > ('xset', 's', 'off', 's', 'reset') in /builds/worker/workspace/build > [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Copy/paste: xset s off s > reset For future reference, this log is not helpful, because it doesn't reference the error messages from the failing test, but rather informative logging about generic setup code which happens before the tests even begin executing. For the particular log linked, you want to click on the first "TEST-UNEXPECTED_FAIL" line in the upper-right window of Treeherder's log viewer to locate the logging for the failing test(s). I skimmed through the pushlog for the push in comment 0 when things started failing: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=32b850fa28ae1c29039cb7ddcdfd71b324762c05 on the assumption that somebody added gtests that started threads or something similar, because we say that might cause problems with the test: http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#30 If that assumption is valid, then this commit from bug 1293362: https://hg.mozilla.org/mozilla-central/rev/129011c68478 is problematic, because the IO service starts the DNS lookup service, which I imagine starts threads, which might be contributing to the intermittency of the test. So, possible fix: grab some other service, like the observer service, and use that to test instead. mystor, do you want to handle that?
Flags: needinfo?(nfroyd) → needinfo?(nika)
Sure thing :-)
oops - looks like I messed up when refactoring >.> MozReview-Commit-ID: 3f1wsWgQfpG
Attachment #8946441 - Flags: review?(nfroyd)
MozReview-Commit-ID: JSnMIvu97aT
Attachment #8946442 - Flags: review?(nfroyd)
Flags: needinfo?(nika)
Comment on attachment 8946441 [details] [diff] [review] Part 1: Fix a typo in xpcom_macros Review of attachment 8946441 [details] [diff] [review]: ----------------------------------------------------------------- Doh.
Attachment #8946441 - Flags: review?(nfroyd) → review+
Comment on attachment 8946442 [details] [diff] [review] Part 2: Change the xpcom test to use the observer service Review of attachment 8946442 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/rust/gtest/xpcom/test.rs @@ +46,5 @@ > + assert!(rv.succeeded()); > + > + let rv = obssvc.NotifyObservers(ptr::null(), topic.as_ptr(), ptr::null()); > + assert!(rv.succeeded()); > + assert!(run, "The observer should have been run!"); Yup, still terrifying.
Attachment #8946442 - Flags: review?(nfroyd) → review+
Pushed by nika@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/316ac856dc89 Part 1: Fix a typo in xpcom_macros, r=froydnj https://hg.mozilla.org/integration/mozilla-inbound/rev/ba613dacbfa6 Part 2: Change the xpcom test to use the observer service, r=froydnj
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Assignee: nobody → nika
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm not sure what would've caused these problems again :-/, any ideas?
Flags: needinfo?(nfroyd)
(In reply to Nika Layzell [:mystor] from comment #17) > I'm not sure what would've caused these problems again :-/, any ideas? I don't know offhand. ttaubert was saying in #developers that some new security/certverifier tests might be causing problems. He was going to poke at them...ttaubert? Maybe we should hack the gtest runner so that the allocation tests always run first and are therefore free and clear of any threads started by other tests...
Flags: needinfo?(nfroyd) → needinfo?(ttaubert)
(In reply to Nathan Froyd [:froydnj] from comment #19) > (In reply to Nika Layzell [:mystor] from comment #17) > > I'm not sure what would've caused these problems again :-/, any ideas? > > I don't know offhand. ttaubert was saying in #developers that some new > security/certverifier tests might be causing problems. He was going to poke > at them...ttaubert? Yeah, that was bug 1434300, which has since been backed out. > Maybe we should hack the gtest runner so that the allocation tests always > run first and are therefore free and clear of any threads started by other > tests... I tried that by setting gtest_filter to AllocReplacement.*, call RUN_ALL_TESTS(), set the filter to -AllocReplacement.* and call RUN_ALL_TESTS() again. That works but I found a thread where the gtest devs say that this basically isn't supported and might break. They want to be able to run tests in any order.
Flags: needinfo?(ttaubert)
(In reply to Nathan Froyd [:froydnj] from comment #21) > One hacky possibility is to write the AllocReplacement tests as death tests: That seems like a great idea to me, we're doing the same for the storage init test [1]. I can still reproduce the intermittent issues with the test in bug 1440029, but I can't when AllocReplacment tests are turned into death tests. [1] https://searchfox.org/mozilla-central/source/storage/test/gtest/test_service_init_background_thread.cpp
Comment on attachment 8952992 [details] [diff] [review] 0001-Bug-1433015-Turn-AllocReplacement-gtests-into-death-.patch Review of attachment 8952992 [details] [diff] [review]: ----------------------------------------------------------------- I leave the wording of the comment up to you, but I think it'd be good to have *something* for this. ::: xpcom/tests/gtest/TestAllocReplacement.cpp @@ +105,5 @@ > > return before == after; > } > > +TEST(AllocReplacementDeathTest, malloc_check) It's not super-obvious to a reader why these things would be named "XDeathTest", I think, even with some familiarity with GTest. Can you write a comment explaining what's going on here? In particular, that these tests get executed prior to all tests, so we don't have to worry about threads spawned from other tests interfering with these? Pointing to AdvancedGuide.md might not be a bad idea.
Attachment #8952992 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #25) > It's not super-obvious to a reader why these things would be named > "XDeathTest", I think, even with some familiarity with GTest. Can you write > a comment explaining what's going on here? In particular, that these tests > get executed prior to all tests, so we don't have to worry about threads > spawned from other tests interfering with these? Pointing to > AdvancedGuide.md might not be a bad idea. Will do. Thanks!
Pushed by ttaubert@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b1420da362d9 Turn AllocReplacement gtests into death tests r=froydnj
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: