Closed
Bug 1433015
Opened 7 years ago
Closed 7 years ago
Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check,AllocReplacement.realloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free))
Categories
(Core :: XPCOM, defect, P5)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla60
Tracking | Status | |
---|---|---|
firefox60 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: nika)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:product])
Attachments
(3 files)
1.06 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
4.20 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
2.29 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Updated•7 years ago
|
Summary: Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free)) → Intermittent AllocReplacement.malloc_check,AllocReplacement.calloc_check,AllocReplacement.realloc_check | Value of: (ValidateHookedAllocation([] { return malloc(kAllocAmount); }, free))
Comment hidden (Intermittent Failures Robot) |
Comment 2•7 years ago
|
||
This bug has failed 54 times since it was created about 24 hours ago. It fails mostly on Linux and has a few failures on linux 64 and nightly. Fails on opt and debug build types.
Here is a log:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=158489641&lineNumber=1726
And here is a part of the log:
INFO - Running pre-action listener: _resource_record_pre_action
[task 2018-01-25T17:26:27.797Z] 17:26:27 INFO - Running main action method: stage_files
[task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fakeopenh264 to /builds/worker/workspace/build/application/firefox
[task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fake to /builds/worker/workspace/build/application/firefox
[task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-clearkey to /builds/worker/workspace/build/application/firefox
[task 2018-01-25T17:26:27.798Z] 17:26:27 ERROR - shutil error: Destination path '/builds/worker/workspace/build/application/firefox/gmp-clearkey' already exists
[task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - Moving /builds/worker/workspace/build/tests/gtest/dependentlibs.list.gtest to /builds/worker/workspace/build/application/firefox
[task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - copying tree: /builds/worker/workspace/build/tests/gtest/gtest_bin to /builds/worker/workspace/build/application/firefox
[task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - mkdir: /builds/worker/workspace/build/application/firefox/gtest
[task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - copying tree: /builds/worker/workspace/build/tests/gtest/gtest_bin/gtest to /builds/worker/workspace/build/application/firefox/gtest
[task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - rmtree: /builds/worker/workspace/build/application/firefox/gtest
[task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - retry: Calling rmtree with args: ('/builds/worker/workspace/build/application/firefox/gtest',), kwargs: {}, attempt #1
[task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running post-action listener: _resource_record_post_action
[task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 17:26:27.951175Z] Finished stage-files step (success)
[task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25 17:26:27.951371Z] Running run-tests step.
[task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running pre-action listener: _resource_record_pre_action
[task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix
[task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running main action method: run_tests
[task 2018-01-25T17:26:27.955Z] 17:26:27 INFO - Running pre test command disable_screen_saver with 'xset s off s reset'
[task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Running command: ('xset', 's', 'off', 's', 'reset') in /builds/worker/workspace/build
[task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Copy/paste: xset s off s reset
:froydnj Can you please take a look at this bug?
Flags: needinfo?(nfroyd)
Whiteboard: [stockwell needswork]
Comment 3•7 years ago
|
||
We have 5 failure points in `ValidateHookedAllocation` [1], it would be nice to know which one is failing.
Also:
> [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix
Does that mean gcov is running? That's messed with our allocator in other tests.
Comment 4•7 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 6•7 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #3)
> We have 5 failure points in `ValidateHookedAllocation` [1], it would be nice
> to know which one is failing.
It is almost certainly the one that verifies whether the heap stats changed as a result of our allocation:
http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#92
or perhaps the one after we free the allocated memory:
http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#98
> Also:
>
> > [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action listener: _set_gcov_prefix
>
> Does that mean gcov is running? That's messed with our allocator in other
> tests.
Setup code for gcov is running, but it appears that the function doesn't do anything in non-code coverage builds.
http://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#81
(In reply to Dorel Luca [:dluca] from comment #2)
> This bug has failed 54 times since it was created about 24 hours ago. It
> fails mostly on Linux and has a few failures on linux 64 and nightly. Fails
> on opt and debug build types.
>
> Here is a log:
> https://treeherder.mozilla.org/logviewer.
> html#?repo=autoland&job_id=158489641&lineNumber=1726
>
> And here is a part of the log:
> INFO - Running pre-action listener: _resource_record_pre_action
> [task 2018-01-25T17:26:27.797Z] 17:26:27 INFO - Running main action
> method: stage_files
> [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving
> /builds/worker/workspace/build/tests/bin/plugins/gmp-fakeopenh264 to
> /builds/worker/workspace/build/application/firefox
> [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving
> /builds/worker/workspace/build/tests/bin/plugins/gmp-fake to
> /builds/worker/workspace/build/application/firefox
> [task 2018-01-25T17:26:27.798Z] 17:26:27 INFO - Moving
> /builds/worker/workspace/build/tests/bin/plugins/gmp-clearkey to
> /builds/worker/workspace/build/application/firefox
> [task 2018-01-25T17:26:27.798Z] 17:26:27 ERROR - shutil error:
> Destination path
> '/builds/worker/workspace/build/application/firefox/gmp-clearkey' already
> exists
> [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - Moving
> /builds/worker/workspace/build/tests/gtest/dependentlibs.list.gtest to
> /builds/worker/workspace/build/application/firefox
> [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - copying tree:
> /builds/worker/workspace/build/tests/gtest/gtest_bin to
> /builds/worker/workspace/build/application/firefox
> [task 2018-01-25T17:26:27.799Z] 17:26:27 INFO - mkdir:
> /builds/worker/workspace/build/application/firefox/gtest
> [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - copying tree:
> /builds/worker/workspace/build/tests/gtest/gtest_bin/gtest to
> /builds/worker/workspace/build/application/firefox/gtest
> [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - rmtree:
> /builds/worker/workspace/build/application/firefox/gtest
> [task 2018-01-25T17:26:27.800Z] 17:26:27 INFO - retry: Calling rmtree
> with args: ('/builds/worker/workspace/build/application/firefox/gtest',),
> kwargs: {}, attempt #1
> [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running post-action
> listener: _resource_record_post_action
> [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25
> 17:26:27.951175Z] Finished stage-files step (success)
> [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - [mozharness: 2018-01-25
> 17:26:27.951371Z] Running run-tests step.
> [task 2018-01-25T17:26:27.951Z] 17:26:27 INFO - Running pre-action
> listener: _resource_record_pre_action
> [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running pre-action
> listener: _set_gcov_prefix
> [task 2018-01-25T17:26:27.952Z] 17:26:27 INFO - Running main action
> method: run_tests
> [task 2018-01-25T17:26:27.955Z] 17:26:27 INFO - Running pre test command
> disable_screen_saver with 'xset s off s reset'
> [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Running command:
> ('xset', 's', 'off', 's', 'reset') in /builds/worker/workspace/build
> [task 2018-01-25T17:26:27.956Z] 17:26:27 INFO - Copy/paste: xset s off s
> reset
For future reference, this log is not helpful, because it doesn't reference the error messages from the failing test, but rather informative logging about generic setup code which happens before the tests even begin executing. For the particular log linked, you want to click on the first "TEST-UNEXPECTED_FAIL" line in the upper-right window of Treeherder's log viewer to locate the logging for the failing test(s).
I skimmed through the pushlog for the push in comment 0 when things started failing:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=32b850fa28ae1c29039cb7ddcdfd71b324762c05
on the assumption that somebody added gtests that started threads or something similar, because we say that might cause problems with the test:
http://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestAllocReplacement.cpp#30
If that assumption is valid, then this commit from bug 1293362:
https://hg.mozilla.org/mozilla-central/rev/129011c68478
is problematic, because the IO service starts the DNS lookup service, which I imagine starts threads, which might be contributing to the intermittency of the test.
So, possible fix: grab some other service, like the observer service, and use that to test instead. mystor, do you want to handle that?
Flags: needinfo?(nfroyd) → needinfo?(nika)
Assignee | ||
Comment 7•7 years ago
|
||
Sure thing :-)
Assignee | ||
Comment 8•7 years ago
|
||
oops - looks like I messed up when refactoring >.>
MozReview-Commit-ID: 3f1wsWgQfpG
Attachment #8946441 -
Flags: review?(nfroyd)
Assignee | ||
Comment 9•7 years ago
|
||
MozReview-Commit-ID: JSnMIvu97aT
Attachment #8946442 -
Flags: review?(nfroyd)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(nika)
Comment 10•7 years ago
|
||
Comment on attachment 8946441 [details] [diff] [review]
Part 1: Fix a typo in xpcom_macros
Review of attachment 8946441 [details] [diff] [review]:
-----------------------------------------------------------------
Doh.
Attachment #8946441 -
Flags: review?(nfroyd) → review+
Comment 11•7 years ago
|
||
Comment on attachment 8946442 [details] [diff] [review]
Part 2: Change the xpcom test to use the observer service
Review of attachment 8946442 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/rust/gtest/xpcom/test.rs
@@ +46,5 @@
> + assert!(rv.succeeded());
> +
> + let rv = obssvc.NotifyObservers(ptr::null(), topic.as_ptr(), ptr::null());
> + assert!(rv.succeeded());
> + assert!(run, "The observer should have been run!");
Yup, still terrifying.
Attachment #8946442 -
Flags: review?(nfroyd) → review+
Comment 12•7 years ago
|
||
Pushed by nika@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/316ac856dc89
Part 1: Fix a typo in xpcom_macros, r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/ba613dacbfa6
Part 2: Change the xpcom test to use the observer service, r=froydnj
Comment 13•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/316ac856dc89
https://hg.mozilla.org/mozilla-central/rev/ba613dacbfa6
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → nika
Comment hidden (Intermittent Failures Robot) |
Comment 16•7 years ago
|
||
Recent failure log:
https://treeherder.mozilla.org/logviewer.html#?job_id=163143828&repo=autoland&lineNumber=2867
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 17•7 years ago
|
||
I'm not sure what would've caused these problems again :-/, any ideas?
Flags: needinfo?(nfroyd)
Comment hidden (Intermittent Failures Robot) |
Comment 19•7 years ago
|
||
(In reply to Nika Layzell [:mystor] from comment #17)
> I'm not sure what would've caused these problems again :-/, any ideas?
I don't know offhand. ttaubert was saying in #developers that some new security/certverifier tests might be causing problems. He was going to poke at them...ttaubert?
Maybe we should hack the gtest runner so that the allocation tests always run first and are therefore free and clear of any threads started by other tests...
Flags: needinfo?(nfroyd) → needinfo?(ttaubert)
Comment 20•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #19)
> (In reply to Nika Layzell [:mystor] from comment #17)
> > I'm not sure what would've caused these problems again :-/, any ideas?
>
> I don't know offhand. ttaubert was saying in #developers that some new
> security/certverifier tests might be causing problems. He was going to poke
> at them...ttaubert?
Yeah, that was bug 1434300, which has since been backed out.
> Maybe we should hack the gtest runner so that the allocation tests always
> run first and are therefore free and clear of any threads started by other
> tests...
I tried that by setting gtest_filter to AllocReplacement.*, call RUN_ALL_TESTS(), set the filter to -AllocReplacement.* and call RUN_ALL_TESTS() again. That works but I found a thread where the gtest devs say that this basically isn't supported and might break. They want to be able to run tests in any order.
Flags: needinfo?(ttaubert)
Comment 21•7 years ago
|
||
One hacky possibility is to write the AllocReplacement tests as death tests:
https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#death-tests
that don't actually crash, but exit successfully, so we could be sure they are run in a single-threaded context:
https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#how-it-works
https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#death-tests-and-threads
Comment 22•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #21)
> One hacky possibility is to write the AllocReplacement tests as death tests:
That seems like a great idea to me, we're doing the same for the storage init test [1].
I can still reproduce the intermittent issues with the test in bug 1440029, but I can't when AllocReplacment tests are turned into death tests.
[1] https://searchfox.org/mozilla-central/source/storage/test/gtest/test_service_init_background_thread.cpp
Comment 23•7 years ago
|
||
Attachment #8952992 -
Flags: review?(nfroyd)
Comment 24•7 years ago
|
||
Comment 25•7 years ago
|
||
Comment on attachment 8952992 [details] [diff] [review]
0001-Bug-1433015-Turn-AllocReplacement-gtests-into-death-.patch
Review of attachment 8952992 [details] [diff] [review]:
-----------------------------------------------------------------
I leave the wording of the comment up to you, but I think it'd be good to have *something* for this.
::: xpcom/tests/gtest/TestAllocReplacement.cpp
@@ +105,5 @@
>
> return before == after;
> }
>
> +TEST(AllocReplacementDeathTest, malloc_check)
It's not super-obvious to a reader why these things would be named "XDeathTest", I think, even with some familiarity with GTest. Can you write a comment explaining what's going on here? In particular, that these tests get executed prior to all tests, so we don't have to worry about threads spawned from other tests interfering with these? Pointing to AdvancedGuide.md might not be a bad idea.
Attachment #8952992 -
Flags: review?(nfroyd) → review+
Comment 26•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #25)
> It's not super-obvious to a reader why these things would be named
> "XDeathTest", I think, even with some familiarity with GTest. Can you write
> a comment explaining what's going on here? In particular, that these tests
> get executed prior to all tests, so we don't have to worry about threads
> spawned from other tests interfering with these? Pointing to
> AdvancedGuide.md might not be a bad idea.
Will do. Thanks!
Comment 27•7 years ago
|
||
Pushed by ttaubert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1420da362d9
Turn AllocReplacement gtests into death tests r=froydnj
Comment 28•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•