Closed
Bug 1455637
Opened 7 years ago
Closed 6 years ago
Intermittent [first test after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Categories
(Core :: XPConnect, defect, P5)
Core
XPConnect
Tracking
()
RESOLVED
FIXED
mozilla64
Tracking | Status | |
---|---|---|
firefox64 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: kmag)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
Filed by: ncsoregi [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=174774955&repo=autoland
https://queue.taskcluster.net/v1/task/eTx0nao2S9iCwzXzbxbDFQ/runs/0/artifacts/public/logs/live_backing.log
07:02:49 INFO - TEST-START | dom/flex/test/chrome/test_flex_items.html
07:02:49 INFO - GECKO(822) | ++DOMWINDOW == 14 (0x134f57400) [pid = 822] [serial = 14] [outer = 0x13418bc00]
07:02:49 INFO - GECKO(822) | ++DOMWINDOW == 15 (0x134f5e800) [pid = 822] [serial = 15] [outer = 0x13418bc00]
07:02:49 INFO - GECKO(822) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
07:02:49 INFO - GECKO(822) | MEMORY STAT | vsize 4133MB | residentFast 306MB | heapAllocated 115MB
07:02:49 INFO - TEST-OK | dom/flex/test/chrome/test_flex_items.html | took 1128ms
07:02:49 INFO - GECKO(822) | ++DOMWINDOW == 16 (0x134f63000) [pid = 822] [serial = 16] [outer = 0x13418bc00]
07:02:49 INFO - TEST-UNEXPECTED-FAIL | dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions
07:02:49 INFO - TEST-START | dom/flex/test/chrome/test_flex_lines.html
07:02:49 INFO - GECKO(822) | ++DOMWINDOW == 17 (0x134f64400) [pid = 822] [serial = 17] [outer = 0x13418bc00]
07:02:49 INFO - GECKO(822) | MEMORY STAT | vsize 4134MB | residentFast 310MB | heapAllocated 118MB
07:02:49 INFO - TEST-OK | dom/flex/test/chrome/test_flex_lines.html | took 297ms
Comment 1•7 years ago
|
||
The assertion here is:
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsCOMPtr.h, line 436
...with this backtrace
07:02:49 INFO - GECKO(822) | #01: mozilla::URLPreloader::BeginBackgroundRead() [xpcom/base/nsCOMPtr.h:424]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #02: mozilla::ScriptPreloader::InitCache(nsTSubstring<char16_t> const&) [js/xpconnect/loader/ScriptPreloader.cpp:438]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #03: mozilla::ScriptPreloader::GetChildSingleton() [js/xpconnect/loader/ScriptPreloader.cpp:149]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #04: mozilla::ScriptPreloader::GetSingleton() [js/xpconnect/loader/ScriptPreloader.cpp:104]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #05: NS_InitXPCOM2 [xpcom/build/XPCOMInit.cpp:669]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #06: ScopedXPCOMStartup::Initialize() [toolkit/xre/nsAppRunner.cpp:1568]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #07: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4975]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #08: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5071]
07:02:49 INFO -
07:02:49 INFO - GECKO(822) | #09: main [browser/app/nsBrowserApp.cpp:0]
The "guilty" line at stack level #01 is ScriptPreloader.cpp:438 which is here:
> MOZ_TRY(OpenCache());
https://hg.mozilla.org/integration/autoland/file/3f4820437c25/js/xpconnect/loader/ScriptPreloader.cpp#l438
...so really something is probably failing inside of OpenCache(), but sadly the backtrace doesn't quite tell us what (aside from the fact that it's a nsCOMPtr operation where we're failing to QueryInterface).
Looks like kmag has worked on this ScriptPreloader.cpp OpenCache() stuff. kmag, maybe you could take a look here and see if it's obvious what's going on here? (I think "QueryInterface needed" assertions are kind of worrisome, though I'm not sure.)
Component: CSS Parsing and Computation → XPConnect
Flags: needinfo?(kmaglione+bmo)
Summary: Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions → Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Comment 2•7 years ago
|
||
Sorry, got my stack levels wrong -- that ScriptPreloader.cpp:438 line is stack level #02.
Stack level #01 is in ~nsCOMPtr:
> NSCAP_LOG_RELEASE(this, mRawPtr);
https://hg.mozilla.org/integration/autoland/file/3f4820437c25/xpcom/base/nsCOMPtr.h#l424
And then, digging in from the other side (the line mentioned in the assertion-failure itself) -- that's in Assert_NoQueryNeeded(), here:
https://hg.mozilla.org/integration/autoland/file/3f4820437c25/xpcom/base/nsCOMPtr.h#l436
...which seems to be called (as macro NSCAP_ASSERT_NO_QUERY_NEEDED()) as a sanity check from most nsCOMPtr internal methods.
Anyway. It seems clear this is an issue with the sole nsCOMPtr that exists in OpenCache(), which is "nsCOMPtr<nsIFile> cacheFile", and we detect the issue when we destruct that nsCOMPtr. Hoping kmag can take it from here. :)
Assignee | ||
Comment 3•7 years ago
|
||
I agree that the QueryInterface needed assertions are worrisome, but the stacks don't make sense...
I think that this is probably in URLPreloader::BeginBackgroundRead, and not OpenCache. But nsCOMPtr.h:424 doesn't do anything that should trigger Assert_NoQueryNeeded, so it's hard to tell where this is actually happening.
The only two things in BeginBackgroundRead that should be able to trigger that assertion are:
nsCOMPtr<nsIRunnable> runnable = NewRunnableMethod(...)
and
NS_NewNamedThread(..., getter_AddRefs(mReaderThread), ...);
Neither of those should ever need a QI.
If I had to guess, though, just based on the weirdness of the stack, I'd say this is probably happening in the NewRunnableMethod assignment, since there's a lot of inlining there, and do_AddRef call creates a temporary nsCOMPtr, which means its destructor will be called at about the same time as the assignment.
Not sure where to go from there, though. I can't see how this could happen without memory corruption, and we've only seen it happen once so far.
Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
|
||
Bug 1456690 has another instance of this.
Assignee | ||
Comment 6•7 years ago
|
||
I just saw another similar crash, with a similarly weird stack, this time with the top frame in ScriptPreloader::InvalidateCache().
That function also has a NS_NewNamedThread(..., getter_AddRefs(mReaderThread), ...); call at the end of it, so I'm guessing that's where the problem is, and the assertion is probably at:
https://searchfox.org/mozilla-central/source/xpcom/base/nsCOMPtr.h#1325
Still no idea how we could end up needing a QI there, though...
Flags: needinfo?(kmaglione+bmo)
Assignee | ||
Comment 7•7 years ago
|
||
Ohhhhh... I think I may see what's going on.
Those threads both release release their thread pointers when they're done running. There are invariants in place to make sure we never touch those members on two threads at the same time, but they don't take nsGetterAddRefs into account. Which means it's possible for it to be released in the middle of the Assert_NoQueryNeeded check, and us to wind up comparing a null mRawPtr against the previous nsIThread value.
That's kind of annoying...
Assignee: nobody → kmaglione+bmo
Comment hidden (Intermittent Failures Robot) |
Comment 12•7 years ago
|
||
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Updated•7 years ago
|
Summary: Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead() → Intermittent [first reftest after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Comment 16•7 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → INCOMPLETE
Comment 18•7 years ago
|
||
This is still happening; just got a new report in bug 1481743. --> Reopening.
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Comment hidden (Intermittent Failures Robot) |
Comment 20•7 years ago
|
||
This started to happen for me a lot when running the webdriver web-platform tests locally on MacOS with an artifact debug build. Maybe some recent commit made this more visible? I haven't seen this happening that much last week.
Comment 21•7 years ago
|
||
Ok, so I used bisecting to figure out when this started, and it is actually Kris' patch from bug 1484496 which causes the assertion to happen multiple times during startup of Firefox.
Kris, should we get a separate bug filed for that?
Flags: needinfo?(kmaglione+bmo)
Assignee | ||
Comment 22•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #21)
> Ok, so I used bisecting to figure out when this started, and it is actually
> Kris' patch from bug 1484496 which causes the assertion to happen multiple
> times during startup of Firefox.
>
> Kris, should we get a separate bug filed for that?
That sounds like bug 1486311, which is unrelated to this.
Flags: needinfo?(kmaglione+bmo)
Comment 26•6 years ago
|
||
Note: initially we were only getting reports of this for reftests, but we just got one instance for a mochitest run -- bug 1455637.
Log: https://treeherder.mozilla.org/logviewer.html#?job_id=200496598&repo=mozilla-inbound
(this is for OS X 10.10 debug Mochitests test-macosx64/debug-mochitest-chrome-2 )
--> Adjusting title to say "first test" rather than "first reftest"
Summary: Intermittent [first reftest after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead() → Intermittent [first test after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Comment 27•6 years ago
|
||
Did you ever figure out anything about this beyond what is in this bug? I can try taking a look.
Assignee | ||
Comment 28•6 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #27)
> Did you ever figure out anything about this beyond what is in this bug? I
> can try taking a look.
It's just a race, where the AssertNoQueryNeeded checks run at a time that violates invariants. I'm probably just going to have to change it to a RefPtr.
Flags: needinfo?(kmaglione+bmo)
Updated•6 years ago
|
QA Contact: overholt
Comment 29•6 years ago
|
||
Are you going to be able to fix that in the next few days or should I downgrade the nsCOMPtr assertion to be non-fatal?
Flags: needinfo?(kmaglione+bmo)
Assignee | ||
Comment 30•6 years ago
|
||
getter_AddRefs for nsCOMPtr does an AssertNoQueryNeeded check when its
temporary is destroyed. For the mReaderThread, this happens at a time when
control of the member variable has been handed over to the reader thread,
which causes a data race in the query needed check when the reader thread
shuts itself down and clears the member.
Switching to RefPtr solves this problem by removing the unnecessary check.
Updated•6 years ago
|
Assignee | ||
Comment 31•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/983a5ee19bb3844c4d7f00ba55bb958b01e7341c
Bug 1455637: Use RefPtr rather than nsCOMPtr to avoid data race. r=mccr8
Comment 32•6 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
status-firefox64:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Updated•6 years ago
|
Flags: needinfo?(kmaglione+bmo)
You need to log in
before you can comment on or make changes to this bug.
Description
•