Closed Bug 1370783 Opened 2 years ago Closed 10 months ago

netwerk/test/browser/browser_child_resource.js disabled on Linux64 debug due to | uncaught exception - Error: Assertion failure at assert@chrome://browser/content/tabbrowser.xml:4227:25

Categories

(Firefox :: Tabbed Browser, defect, P3)

defect

Tracking

()

RESOLVED INACTIVE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file, 1 obsolete file)

Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=105070023&repo=try

https://archive.mozilla.org/pub/firefox/try-builds/philringnalda@gmail.com-4ad947a83cfaf37b67206eb782b5cc2c0da819a6/try-win64-debug/try_win8_64-debug_test-mochitest-e10s-browser-chrome-6-bm111-tests1-windows-build102.txt.gz

Be interesting to find out whether this is only intermittent on Gecko 55 after it hits mozilla-beta (I've seen it several times while doing simulations of 55-as-beta on try), or is intermittently happening on the trunk and nobody can be bothered to file it.
Indeed, 11 failures on beta already.
The crash stack.

>20:24:14     INFO - GECKO(1232) | [Parent 1232] ###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\src\obj-firefox\dist\include\nsCOMPtr.h, line 414
>20:24:15     INFO - GECKO(1232) | #01: nsDocShell::InternalLoad(nsIURI *,nsIURI *,bool,nsIURI *,unsigned int,nsIPrincipal *,nsIPrincipal *,unsigned int,nsAString const &,char const *,nsAString const &,nsIInputStream *,nsIInputStream *,unsigned int,nsISHEntry *,bool,nsAString const &,nsIDocShell *,nsIURI *,bool,nsIDocShell * *,nsIRequest * *) [docshell/base/nsDocShell.cpp:10752]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #02: nsDocShell::OnLinkClickSync(nsIContent *,nsIURI *,char16_t const *,nsAString const &,nsIInputStream *,nsIInputStream *,bool,nsIDocShell * *,nsIRequest * *,nsIPrincipal *) [docshell/base/nsDocShell.cpp:14212]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #03: OnLinkClickEvent::Run() [docshell/base/nsDocShell.cpp:13972]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #04: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1369]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #05: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:472]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #06: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:96]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #07: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:238]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #08: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:232]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #09: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:212]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #10: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:158]
>20:24:15     INFO - 
>20:24:15     INFO - GECKO(1232) | #11: nsAppShell::Run() [widget/windows/nsAppShell.cpp:271]
The crash complains about the conversion from nsIRequest to nsCOMPtr<nsIRequest> via getter_AddRefs<>.
https://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/docshell/base/nsDocShell.cpp#10747
https://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/xpcom/base/nsCOMPtr.h#414

Maybe replacing the NS_ADDREF statement with

nsCOMPtr<nsIRequest> req = do_QueryInterface(channel);
req.forget(aRequest);

in nsDocShell::DoURILoad can fix it. However it would be good to know which channel implementation makes the static type casting failed before we cover it up.
https://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/docshell/base/nsDocShell.cpp#11131
Hmm...this test case is about res:// protocol, however there is no recent modification on nsResProtocolHandler.
Do you want to take a look?
Flags: needinfo?(schien)
Patch is created according to my previous analysis on comment #4.
Flags: needinfo?(schien)
Comment on attachment 8882046 [details]
Bug 1370783 - use QI instead of static cast from nsIChannel to nsIRequest.

https://reviewboard.mozilla.org/r/153054/#review158414

This really looks like just hiding some actual bug elsewhere. How do we get wrong channel pointer there? nsIChannel after all extends nsIRequest.
Attachment #8882046 - Flags: review?(bugs) → review-
Whiteboard: [necko-next]
After further debugging, the 'QueryInterface needed' assertion is generated by browser_about_cache.js but not browser_child_resource.js. Bug 1379095 is filed for the corresponding fix. Unfortunately this intermittent failure still exists even if Bug 1379095 is fixed.
This test case failure is complaining about an unexpected assertion in tabbrowser.xml
>09:09:48     INFO -  GECKO(1708) | assert@chrome://browser/content/tabbrowser.xml:4273:46
>09:09:48     INFO -  GECKO(1708) | finish@chrome://browser/content/tabbrowser.xml:4130:15
>09:09:48     INFO -  GECKO(1708) | postActions@chrome://browser/content/tabbrowser.xml:4394:17
>09:09:48     INFO -  GECKO(1708) | onUnloadTimeout@chrome://browser/content/tabbrowser.xml:4432:15
>09:09:48     INFO -  GECKO(1708) | requestTab/this.unloadTimer<@chrome://browser/content/tabbrowser.xml:4626:54

Tried to enable log in tabbrowser.xml.

In success case, the tab is consider as non-blank while unload timer timeout.
https://treeherder.mozilla.org/logviewer.html#?job_id=112953460&repo=try&lineNumber=43608

However in failure case, the tab is consider as blank at that time.
https://treeherder.mozilla.org/logviewer.html#?job_id=112953462&repo=try&lineNumber=43761

This is less likely to be a networking issue but a UI issue.


[full try push]
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e38b6142b94bc332324c7bc3c44494b075f675d9
Assignee: nobody → schien
Just as failtastic on Beta56 in case anybody was wondering. Really hope we can make some progress on this soon.
Based on my investigation on comment #16, need more information from someone familiar with tabbrowser.xml.
Unassigned myself and dispatch this bug to Firefox:Tabbed Browser component.
Assignee: schien → nobody
Component: Networking → Tabbed Browser
Product: Core → Firefox
Whiteboard: [necko-next]
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #25)
> Based on my investigation on comment #16, need more information from someone
> familiar with tabbrowser.xml.
> Unassigned myself and dispatch this bug to Firefox:Tabbed Browser component.

Just moving the bug doesn't do much. Please find an owner if you can't own this, and/or clarify what is actually needed here - why is this a tabbrowser issue? What information do you need? What does comment 16 even mean?

(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #16)
> This test case failure is complaining about an unexpected assertion in
> tabbrowser.xml
> >09:09:48     INFO -  GECKO(1708) | assert@chrome://browser/content/tabbrowser.xml:4273:46
> >09:09:48     INFO -  GECKO(1708) | finish@chrome://browser/content/tabbrowser.xml:4130:15
> >09:09:48     INFO -  GECKO(1708) | postActions@chrome://browser/content/tabbrowser.xml:4394:17
> >09:09:48     INFO -  GECKO(1708) | onUnloadTimeout@chrome://browser/content/tabbrowser.xml:4432:15
> >09:09:48     INFO -  GECKO(1708) | requestTab/this.unloadTimer<@chrome://browser/content/tabbrowser.xml:4626:54
> 
> Tried to enable log in tabbrowser.xml.
> 
> In success case, the tab is consider as non-blank while unload timer timeout.
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=112953460&repo=try&lineNumber=43608
> 
> However in failure case, the tab is consider as blank at that time.
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=112953462&repo=try&lineNumber=43761
> 
> This is less likely to be a networking issue but a UI issue.
> 
> 
> [full try push]
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=e38b6142b94bc332324c7bc3c44494b075f675d9

What is the "unload timer" ? What is the test trying to test and what does that have to do with tabbrowser, if it's a netwerk/ test?
Flags: needinfo?(schien)
See Also: → 1372308
See Also: → 1378628
See Also: → 1381204
browser_child_resource.js uses "await BrowserTestUtils.crashBrowser(browser); browser.reload();" to ensure resource URL loaded on a fresh content process. (@mossop can help clarify the purpose since he is the author of this test case).

Test case is failed to complete due to an assertion in tabbrowser.xml.
>00:03:11     INFO -  finish@chrome://browser/content/tabbrowser.xml:4195:15
>00:03:11     INFO -  postActions@chrome://browser/content/tabbrowser.xml:4470:17
>00:03:11     INFO -  handleEvent@chrome://browser/content/tabbrowser.xml:4735:15
>00:03:11     INFO -  updateBrowserRemoteness@chrome://browser/content/tabbrowser.xml:1958:13
>00:03:11     INFO -  onxbloop-browser-crashed@chrome://browser/content/tabbrowser.xml:5791:13

https://dxr.mozilla.org/mozilla-beta/rev/423e59c17f37225371b4d9fa8c207eb1e0a752b5/browser/base/content/tabbrowser.xml#4195
> this.assert(!this.loadTimer);

I'd like someone familiar with tabbrowser.xml to check why loadTimer is not null when |BrowserTestUtils.crashBrowser| is invoked.

@gijs can you help on this?
Flags: needinfo?(schien) → needinfo?(gijskruitbosch+bugs)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #27)
> browser_child_resource.js uses "await
> BrowserTestUtils.crashBrowser(browser); browser.reload();" to ensure
> resource URL loaded on a fresh content process. (@mossop can help clarify
> the purpose since he is the author of this test case).
> 
> Test case is failed to complete due to an assertion in tabbrowser.xml.
> >00:03:11     INFO -  finish@chrome://browser/content/tabbrowser.xml:4195:15
> >00:03:11     INFO -  postActions@chrome://browser/content/tabbrowser.xml:4470:17
> >00:03:11     INFO -  handleEvent@chrome://browser/content/tabbrowser.xml:4735:15
> >00:03:11     INFO -  updateBrowserRemoteness@chrome://browser/content/tabbrowser.xml:1958:13
> >00:03:11     INFO -  onxbloop-browser-crashed@chrome://browser/content/tabbrowser.xml:5791:13
> 
> https://dxr.mozilla.org/mozilla-beta/rev/
> 423e59c17f37225371b4d9fa8c207eb1e0a752b5/browser/base/content/tabbrowser.
> xml#4195
> > this.assert(!this.loadTimer);
> 
> I'd like someone familiar with tabbrowser.xml to check why loadTimer is not
> null when |BrowserTestUtils.crashBrowser| is invoked.
> 
> @gijs can you help on this?

I'm not sure off-hand. Maybe Dave knows or knows someone who knows.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(dtownsend)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #27)
> browser_child_resource.js uses "await
> BrowserTestUtils.crashBrowser(browser); browser.reload();" to ensure
> resource URL loaded on a fresh content process. (@mossop can help clarify
> the purpose since he is the author of this test case).

The point is to verify that new content processes have the resource mappings.

> Test case is failed to complete due to an assertion in tabbrowser.xml.
> >00:03:11     INFO -  finish@chrome://browser/content/tabbrowser.xml:4195:15
> >00:03:11     INFO -  postActions@chrome://browser/content/tabbrowser.xml:4470:17
> >00:03:11     INFO -  handleEvent@chrome://browser/content/tabbrowser.xml:4735:15
> >00:03:11     INFO -  updateBrowserRemoteness@chrome://browser/content/tabbrowser.xml:1958:13
> >00:03:11     INFO -  onxbloop-browser-crashed@chrome://browser/content/tabbrowser.xml:5791:13
> 
> https://dxr.mozilla.org/mozilla-beta/rev/
> 423e59c17f37225371b4d9fa8c207eb1e0a752b5/browser/base/content/tabbrowser.
> xml#4195
> > this.assert(!this.loadTimer);
> 
> I'd like someone familiar with tabbrowser.xml to check why loadTimer is not
> null when |BrowserTestUtils.crashBrowser| is invoked.
> 
> @gijs can you help on this?

I'm going to defer to mconley, I think he did most of the async tab switching code.
Flags: needinfo?(dtownsend) → needinfo?(mconley)
Looking.
Assignee: nobody → mconley
Flags: needinfo?(mconley)
It _looks_ like this fixes the intermittent, but it also introduces a mostly permanent failure in toolkit/content/tests/browser/browser_sound_indicator_silent_video.js of all places.

Still poking.
Hey Mike, I don't suppose you've had any time to take a fresh look at this one? Still the #1 orange on Beta56 and on the 57-as-Beta simulations.
Flags: needinfo?(mconley)
Attachment #8899921 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8899921 [details]
Bug 1370783 - Account for tabs that have just crashed when updating state in preActions of async tab switcher.

https://reviewboard.mozilla.org/r/171252/#review185054

I mean, this change makes sense, so tentative r+ .

Did you figure out the permafailure in that video test? Your last trypush doesn't seem to have the win8 tests it was failing on.

::: browser/base/content/tabbrowser.xml:4402
(Diff revision 1)
> +                 (this.loadingTab.linkedBrowser.isRemoteBrowser &&
> +                  !this.loadingTab.linkedBrowser.frameLoader.tabParent));

As discussed on IRC, we probably want to do the same in the loop at the top of this method.
Attachment #8899921 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8899921 [details]
Bug 1370783 - Account for tabs that have just crashed when updating state in preActions of async tab switcher.

https://reviewboard.mozilla.org/r/171252/#review185054

Thanks for the heads up - I used the same try syntax, but I guess the defaults changed.

I did a run on Win 8 64 debug bc1, and it ran browser_sound_indicator_silent_video.js and seems to pass now.

¯\_(ツ)_/¯

> As discussed on IRC, we probably want to do the same in the loop at the top of this method.

Thanks, will do and then land.
will this land today? I was going to disable this test and then saw the comment from Friday about a fix!
Comment on attachment 8899921 [details]
Bug 1370783 - Account for tabs that have just crashed when updating state in preActions of async tab switcher.

https://reviewboard.mozilla.org/r/171252/#review185054

> Thanks, will do and then land.

So I did this locally, and when manually testing, noticed that there seemed to be cases where switching to crashed tabs would result in a perma-spinner and we'd never show about:tabcrashed. We're late enough in the cycle, and I'm sufficiently spooked enough, that I don't think I'm going to have this change be part of the patch. I'll file a new bug and we can investigate in the next cycle.
I'm sufficiently paranoid about this change so late in the cycle that I'm going to ni? billm just to make sure my thinking here is sound.
Flags: needinfo?(mconley) → needinfo?(wmccloskey)
Er, to be clear, my needinfo? is essentially a feedback? flag.
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14e8c355a039
Disable netwerk/test/browser/browser_child_resource.js on debug for frequent failures. r=me, a=test-only
disabled this test to reduce the impact on the trees- please remember to enable this when testing a fix on try and landing!
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Could you explain this change more, Mike? From the exception stack, we went through onRemotenessChange (presumably due to the crash). Is the problem that isRemoteBrowser is still true? I don't see why that would be since updateBrowserRemoteness should have already set it to false before it fires the TabRemotenessChange event.

If isRemoteBrowser is false, then I would expect us to take this path:
http://searchfox.org/mozilla-central/rev/1c13d5cf85f904afb8976c02a80daa252b893fca/browser/base/content/tabbrowser.xml#4829
And I would expect loadTimer and loadingTab to be nulled out there.
Flags: needinfo?(wmccloskey) → needinfo?(mconley)
(In reply to Bill McCloskey (:billm) from comment #60)
> I don't see why that would be since
> updateBrowserRemoteness should have already set it to false before it fires
> the TabRemotenessChange event.
> 

You're right. I forgot we'd gotten through the updateBrowserRemoteness path. My change doesn't make much sense now - thanks for the second set of eyes!

For some reason I thought we might have gotten into a case where a remote browser had crashed, but we hadn't flipped its remoteness, so it was in that "just crashed" busted state.

So I think we're back at square one here.
Flags: needinfo?(mconley)
See Also: 1381204
Priority: -- → P3
The leave-open keyword is there and there is no activity for 6 months.
:mconley, maybe it's time to close this bug?
Flags: needinfo?(mconley)
Sure, sounds good.
Assignee: mconley → nobody
Status: NEW → RESOLVED
Closed: 10 months ago
Flags: needinfo?(mconley)
Resolution: --- → INCOMPLETE
Since this is still the bug about a disabled test, let's use the pointless "resolution" to shut up the housekeeper, not a resolution which implies something's missing other than the will to fix it (or even to find out whether or not it still happens).
Resolution: INCOMPLETE → INACTIVE
Summary: Intermittent netwerk/test/browser/browser_child_resource.js | uncaught exception - Error: Assertion failure at assert@chrome://browser/content/tabbrowser.xml:4227:25 → netwerk/test/browser/browser_child_resource.js disabled on Linux64 debug due to | uncaught exception - Error: Assertion failure at assert@chrome://browser/content/tabbrowser.xml:4227:25
You need to log in before you can comment on or make changes to this bug.