Closed Bug 1455637 Opened 2 years ago Closed Last year

Intermittent [first test after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()

Categories

(Core :: XPConnect, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kmag)

References

(Blocks 2 open bugs)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=174774955&repo=autoland

https://queue.taskcluster.net/v1/task/eTx0nao2S9iCwzXzbxbDFQ/runs/0/artifacts/public/logs/live_backing.log

07:02:49     INFO - TEST-START | dom/flex/test/chrome/test_flex_items.html
07:02:49     INFO - GECKO(822) | ++DOMWINDOW == 14 (0x134f57400) [pid = 822] [serial = 14] [outer = 0x13418bc00]
07:02:49     INFO - GECKO(822) | ++DOMWINDOW == 15 (0x134f5e800) [pid = 822] [serial = 15] [outer = 0x13418bc00]
07:02:49     INFO - GECKO(822) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
07:02:49     INFO - GECKO(822) | MEMORY STAT | vsize 4133MB | residentFast 306MB | heapAllocated 115MB
07:02:49     INFO - TEST-OK | dom/flex/test/chrome/test_flex_items.html | took 1128ms
07:02:49     INFO - GECKO(822) | ++DOMWINDOW == 16 (0x134f63000) [pid = 822] [serial = 16] [outer = 0x13418bc00]
07:02:49     INFO - TEST-UNEXPECTED-FAIL | dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions
07:02:49     INFO - TEST-START | dom/flex/test/chrome/test_flex_lines.html
07:02:49     INFO - GECKO(822) | ++DOMWINDOW == 17 (0x134f64400) [pid = 822] [serial = 17] [outer = 0x13418bc00]
07:02:49     INFO - GECKO(822) | MEMORY STAT | vsize 4134MB | residentFast 310MB | heapAllocated 118MB
07:02:49     INFO - TEST-OK | dom/flex/test/chrome/test_flex_lines.html | took 297ms
The assertion here is:
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsCOMPtr.h, line 436

...with this backtrace 
07:02:49     INFO - GECKO(822) | #01: mozilla::URLPreloader::BeginBackgroundRead() [xpcom/base/nsCOMPtr.h:424]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #02: mozilla::ScriptPreloader::InitCache(nsTSubstring<char16_t> const&) [js/xpconnect/loader/ScriptPreloader.cpp:438]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #03: mozilla::ScriptPreloader::GetChildSingleton() [js/xpconnect/loader/ScriptPreloader.cpp:149]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #04: mozilla::ScriptPreloader::GetSingleton() [js/xpconnect/loader/ScriptPreloader.cpp:104]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #05: NS_InitXPCOM2 [xpcom/build/XPCOMInit.cpp:669]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #06: ScopedXPCOMStartup::Initialize() [toolkit/xre/nsAppRunner.cpp:1568]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #07: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4975]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #08: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5071]
07:02:49     INFO - 
07:02:49     INFO - GECKO(822) | #09: main [browser/app/nsBrowserApp.cpp:0]


The "guilty" line at stack level #01 is ScriptPreloader.cpp:438 which is here:
>     MOZ_TRY(OpenCache());
https://hg.mozilla.org/integration/autoland/file/3f4820437c25/js/xpconnect/loader/ScriptPreloader.cpp#l438

...so really something is probably failing inside of OpenCache(), but sadly the backtrace doesn't quite tell us what (aside from the fact that it's a nsCOMPtr operation where we're failing to QueryInterface).

Looks like kmag has worked on this ScriptPreloader.cpp OpenCache() stuff.  kmag, maybe you could take a look here and see if it's obvious what's going on here?  (I think "QueryInterface needed" assertions are kind of worrisome, though I'm not sure.)
Component: CSS Parsing and Computation → XPConnect
Flags: needinfo?(kmaglione+bmo)
Summary: Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions → Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Sorry, got my stack levels wrong -- that ScriptPreloader.cpp:438 line is stack level #02.

Stack level #01 is in ~nsCOMPtr:
> NSCAP_LOG_RELEASE(this, mRawPtr);
https://hg.mozilla.org/integration/autoland/file/3f4820437c25/xpcom/base/nsCOMPtr.h#l424

And then, digging in from the other side (the line mentioned in the assertion-failure itself) -- that's in Assert_NoQueryNeeded(), here:
  https://hg.mozilla.org/integration/autoland/file/3f4820437c25/xpcom/base/nsCOMPtr.h#l436
...which seems to be called (as macro NSCAP_ASSERT_NO_QUERY_NEEDED()) as a sanity check from most nsCOMPtr internal methods.


Anyway. It seems clear this is an issue with the sole nsCOMPtr that exists in OpenCache(), which is "nsCOMPtr<nsIFile> cacheFile", and we detect the issue when we destruct that nsCOMPtr.  Hoping kmag can take it from here. :)
I agree that the QueryInterface needed assertions are worrisome, but the stacks don't make sense...

I think that this is probably in URLPreloader::BeginBackgroundRead, and not OpenCache. But nsCOMPtr.h:424 doesn't do anything that should trigger Assert_NoQueryNeeded, so it's hard to tell where this is actually happening.

The only two things in BeginBackgroundRead that should be able to trigger that assertion are:

  nsCOMPtr<nsIRunnable> runnable = NewRunnableMethod(...)

and

  NS_NewNamedThread(..., getter_AddRefs(mReaderThread), ...);

Neither of those should ever need a QI.

If I had to guess, though, just based on the weirdness of the stack, I'd say this is probably happening in the NewRunnableMethod assignment, since there's a lot of inlining there, and do_AddRef call creates a temporary nsCOMPtr, which means its destructor will be called at about the same time as the assignment.

Not sure where to go from there, though. I can't see how this could happen without memory corruption, and we've only seen it happen once so far.
Bug 1456690 has another instance of this.
I just saw another similar crash, with a similarly weird stack, this time with the top frame in ScriptPreloader::InvalidateCache().

That function also has a NS_NewNamedThread(..., getter_AddRefs(mReaderThread), ...); call at the end of it, so I'm guessing that's where the problem is, and the assertion is probably at:

https://searchfox.org/mozilla-central/source/xpcom/base/nsCOMPtr.h#1325

Still no idea how we could end up needing a QI there, though...
Flags: needinfo?(kmaglione+bmo)
Ohhhhh... I think I may see what's going on.

Those threads both release release their thread pointers when they're done running. There are invariants in place to make sure we never touch those members on two threads at the same time, but they don't take nsGetterAddRefs into account. Which means it's possible for it to be released in the middle of the Assert_NoQueryNeeded check, and us to wind up comparing a null mRawPtr against the previous nsIThread value.

That's kind of annoying...
Assignee: nobody → kmaglione+bmo
Duplicate of this bug: 1457037
Duplicate of this bug: 1461235
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Summary: Intermittent dom/flex/test/chrome/test_flex_items.html | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead() → Intermittent [first reftest after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → INCOMPLETE
This is still happening; just got a new report in bug 1481743.  --> Reopening.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
This started to happen for me a lot when running the webdriver web-platform tests locally on MacOS with an artifact debug build. Maybe some recent commit made this more visible? I haven't seen this happening that much last week.
Ok, so I used bisecting to figure out when this started, and it is actually Kris' patch from bug 1484496 which causes the assertion to happen multiple times during startup of Firefox.

Kris, should we get a separate bug filed for that?
Flags: needinfo?(kmaglione+bmo)
(In reply to Henrik Skupin (:whimboo) from comment #21)
> Ok, so I used bisecting to figure out when this started, and it is actually
> Kris' patch from bug 1484496 which causes the assertion to happen multiple
> times during startup of Firefox.
> 
> Kris, should we get a separate bug filed for that?

That sounds like bug 1486311, which is unrelated to this.
Flags: needinfo?(kmaglione+bmo)
Note: initially we were only getting reports of this for reftests, but we just got one instance for a mochitest run -- bug 1455637.

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=200496598&repo=mozilla-inbound
(this is for OS X 10.10 debug Mochitests test-macosx64/debug-mochitest-chrome-2 )

--> Adjusting title to say "first test" rather than "first reftest"
Summary: Intermittent [first reftest after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead() → Intermittent [first test after startup] | assertion count 1 is more than expected 0 assertions, with " ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file nsCOMPtr.h, line 436" in URLPreloader::BeginBackgroundRead()
Did you ever figure out anything about this beyond what is in this bug? I can try taking a look.
Blocks: 1496629
Flags: needinfo?(kmaglione+bmo)
QA Contact: overholt
Depends on: 1223177
(In reply to Andrew McCreight [:mccr8] from comment #27)
> Did you ever figure out anything about this beyond what is in this bug? I
> can try taking a look.

It's just a race, where the AssertNoQueryNeeded checks run at a time that violates invariants. I'm probably just going to have to change it to a RefPtr.
Flags: needinfo?(kmaglione+bmo)
QA Contact: overholt
Are you going to be able to fix that in the next few days or should I downgrade the nsCOMPtr assertion to be non-fatal?
Flags: needinfo?(kmaglione+bmo)
getter_AddRefs for nsCOMPtr does an AssertNoQueryNeeded check when its
temporary is destroyed. For the mReaderThread, this happens at a time when
control of the member variable has been handed over to the reader thread,
which causes a data race in the query needed check when the reader thread
shuts itself down and clears the member.

Switching to RefPtr solves this problem by removing the unnecessary check.
Blocks: 1223177
No longer depends on: 1223177
https://hg.mozilla.org/mozilla-central/rev/983a5ee19bb3
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.