Closed Bug 1588256 Opened 5 months ago Closed 4 months ago

Intermittent newmailaccount | application crashed @ nsWindowWatcher::OpenWindowInternal

Categories

(Thunderbird :: General, defect, critical)

defect
Not set
critical

Tracking

(thunderbird71 fixed, thunderbird72 fixed)

RESOLVED FIXED
Thunderbird 72.0
Tracking Status
thunderbird71 --- fixed
thunderbird72 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jorgk-bmo)

Details

(Keywords: crash, Whiteboard: [Thunderbird-testfailure: Z all][Thunderbird-disabled-test])

Crash Data

Attachments

(1 file, 3 obsolete files)

Filed by: geoff [at] darktrojan.net
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=270917061&repo=comm-central
Full log: https://queue.taskcluster.net/v1/task/J9Fs5Q-0Tr-P7DIEULt4Ow/runs/0/artifacts/public/logs/live_backing.log


Assertion failure: !chan (Why is there a document channel?), at /builds/worker/workspace/build/src/toolkit/components/windowwatcher/nsWindowWatcher.cpp:996
Disconnect Error: Application unexpectedly closed
newmailaccount | application crashed [@ nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, bool, bool, bool, nsDocShellLoadState*, mozilla::dom::BrowsingContext**)]
(runtestlist.py) | Exited with code 1 during directory run
Return code: 1

Crash in debug mozmill, all platforms.

Here's the mozilla-central range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=106fddd0491636603c83208fdb5f953b8f&tochange=a7ca5ad33f3df4477ddb16560386b1b5aa

Or it could be our response to bug 1578624 (bug 1588057).

Wow, I was just about to file this one.

Or it could be our response to bug 1578624 (bug 1588057).

It could be that we're doing something that doesn't work any more in the new Fission world, see fixes in bug 1588006 and bug 1582941 for other occasions were opening a new window needed to be tweaked.

mozmake SOLO_TEST=newmailaccount/test-newmailaccount.js mozmill-one

Crashes in test_window_open_link_opening_behaviour. Disabling that test makes the rest run through.

Bug 1588057 (https://hg.mozilla.org/comm-central/rev/d7b26f4d5ea1) mostly changed getting messages/parts/attachments for mailbox, news and imap protocols. I've flipped the false to true in nsMsgMailNewsUrl.cpp and nsMessenger.cpp and the test still crashes.

Matt, could you give us a hint what could be wrong in this test
https://searchfox.org/comm-central/rev/db35bf838043bf2ba0aa78d3906c0ee7d4cddacb/mail/test/mozmill/newmailaccount/test-newmailaccount.js#1088-1105
after bug 1578624.

Here are a few more lines of the crash stack:

Assertion failure: !chan (Why is there a document channel?), at c:/mozilla-source/comm-central/toolkit/components/windowwatcher/nsWindowWatcher.cpp:996
#01: nsGlobalWindowOuter::OpenInternal (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:7246)
#02: nsGlobalWindowOuter::OpenOuter (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:5729)
#03: nsGlobalWindowInner::Open (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowInner.cpp:3794)
#04: mozilla::dom::Window_Binding::open (c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\dom\bindings\WindowBinding.cpp:2636)
#05: mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::MaybeCrossOriginObjectThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> (c:\mozilla-source\comm-central\dom\bindings\BindingUtils.cpp:3252)
#06: CallJSNative (c:\mozilla-source\comm-central\js\src\vm\Interpreter.cpp:457)
#07: js::InternalCallOrConstruct (c:\mozilla-source\comm-central\js\src\vm\Interpreter.cpp:549)
#08: InternalCall (c:\mozilla-source\comm-central\js\src\vm\Interpreter.cpp:618)
#09: Interpret (c:\mozilla-source\comm-central\js\src\vm\Interpreter.cpp:0)
#10: js::RunScript (c:\mozilla-source\comm-central\js\src\vm\Interpreter.cpp:424)
Flags: needinfo?(matt.woodrow)
Keywords: leave-open
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/ed82c4b6b9ed
disable failing subtest test-newmailaccount::test_window_open_link_opening_behaviour. rs=bustage-fix DONTBUILD
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/f183f40dfbce
followup to make prettier happy (remove a space). rs=bustage-fix

It's a regression from https://hg.mozilla.org/mozilla-central/rev/ba47db006dae9ac8697c58354951d4de90b406e3

The nsWindowWatcher::OpenWindowInternal code calls ProvideWindow to get a new Window object (and usually starts loading the URI). It then explicitly cancelled the load, and then started the load again itself. This had weird race conditions when run across multiple processes.

The patch above removes the explicit cancellation of the load, and changes the Firefox frontend code to not have the implicit initial load.

The assert you're seeing is because an implicit load is still happening, so I think we need similar changes to the Thunderbird frontend.

The main difference is that 'createContentWindow' is not supposed to start loading, and 'openURI' is.

The firefox changes add a new bool parameter to specify this behaviour, and then passes that around to know when to skip loading.

Skimming through the code I suspect it's this [1] called from within createContentWindow, which is initiating a load of about:blank.

[1] https://searchfox.org/comm-central/rev/1d167ecf8b55fa5425102d49172b55ae1a23881f/mail/base/content/specialTabs.js#955

Flags: needinfo?(matt.woodrow)
Whiteboard: [Thunderbird-testfailure: Z all][Thunderbird-disabled-test]

Thanks for the comment, Matt. Sorry it took a while to come back to this.

test_window_open_link_opening_behaviour is notorious for failing since it's the only piece of our code which calls this:
https://searchfox.org/comm-central/rev/a58553160c7536c98db48a12eddbef96f9b274b3/mail/base/content/mailWindow.js#727

That calls our own version of getContentWindowOrOpenURI() and I think that needs to be equipped with the aSkipLoad argument like you have here:
https://searchfox.org/mozilla-central/rev/7536d7f480a7f18c941a590a2d4c5119d9f52770/browser/base/content/browser.js#6472

I'll see what I can whip up on a quiet Saturday evening.

And we finally also have to port bug 1544863.

I tried adding the aSkipLoad parameter, but wasn't successful. The patch has some debug and this is what I see:

=== before open_content_tab_with_click
++DOCSHELL 0000017CF751F800 == 21 [pid = 11236] [id = {9ea8cdb1-06e5-44fe-9c9a-733101715ad4}]
++DOMWINDOW == 95 (0000017CFDAE6880) [pid = 11236] [serial = 126] [outer = 0000000000000000]
=== skip load true
=== NOT calling aTab.browser.loadURI
=== After tabmail.openTab()
++DOMWINDOW == 96 (0000017D26C03800) [pid = 11236] [serial = 127] [outer = 0000017CFDAE6880]
=== no URI, not loading
=== before return
Assertion failure: !chan (Why is there a document channel?), at c:/mozilla-source/comm-central/toolkit/components/windowwatcher/nsWindowWatcher.cpp:996
#01: nsGlobalWindowOuter::OpenInternal (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:7264)
#02: nsGlobalWindowOuter::OpenOuter (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:5739)
[snip]
#28: mozilla::EventStateManager::DispatchClickEvents (c:\mozilla-source\comm-central\dom\events\EventStateManager.cpp:5101)

I went in with the debugger and got the JS stack at the time of the crash:

0 onclick(event = "[object MouseEvent]") ["http://localhost:43336/registration.html?product=personalized_email&quote=b28acb3c0a464d33af22":1:7]
    this = [object HTMLParagraphElement]
1 synthesizeMouse(aTarget = [cross-compartment wrapper], aOffsetX = "414", aOffsetY = "9", aEvent = "[object Object]", aWindow = [cross-compartment wrapper]) ["resource://testing-common/mozmill/EventUtils.jsm":83:12]
    this = [object Object]
2 MozMillController.prototype.mouseEvent(aTarget = "[object Object]", aOffsetX = "414", aOffsetY = "9", aEvent = "[object Object]", aExpectedEvent = "undefined") ["resource://testing-common/mozmill/controller.jsm":553:15]
    this = [object Object]
3 MozMillController.prototype.click(elem = "[object Object]", left = "414", top = "9") ["resource://testing-common/mozmill/controller.jsm":575:9]
    this = [object Object]
4 wrapperFunc(aElem = "[object Object]") ["resource://testing-common/mozmill/WindowHelpers.jsm":1339:31]
    this = [object Object]
5 open_content_tab_with_click(aElem = [cross-compartment wrapper], aExpectedURL = [function]) ["resource://testing-common/mozmill/ContentTabHelpers.jsm":215:16]
6 test_window_open_link_opening_behaviour() ["file:///c:/mozilla-source/comm-central/comm/mail/test/mozmill/newmailaccount/test-newmailaccount.js":1098:29]

So we're coming from test-newmailaccount.js here:
https://searchfox.org/comm-central/rev/50ac80fa95642644080f7b13b6073d8900d931dc/mail/test/mozmill/newmailaccount/test-newmailaccount.js#1089
and in ContentTabHelpers.jsm we click on a link:
https://searchfox.org/comm-central/rev/50ac80fa95642644080f7b13b6073d8900d931dc/mail/test/mozmill/shared-modules/ContentTabHelpers.jsm#215

That causes a crash. That seems unrelated to createContentWindow() which we have already left, see debug "=== before return" above.

Looking at the comment at the assert site:
https://searchfox.org/mozilla-central/rev/67ecac7a2427b0c32266bd8f118c2a4ee6339a3e/toolkit/components/windowwatcher/nsWindowWatcher.cpp#993
something is still loading while the click happens?

Matt, do you have further insight?

Assignee: nobody → jorgk
Flags: needinfo?(matt.woodrow)

You might need something like this? https://hg.mozilla.org/mozilla-central/rev/ba47db006dae9ac8697c58354951d4de90b406e3#l2.86

I suspect the new browser object is initiating a load to about:blank (the default src, if no src= attribute was provided).

If you add the nodefaultsrc attribute to the browser element (before appending the clone tree into the panel), then I think I should fix this.

Failing that, I added a breakpoint on a channel being added to the docshell and made it print out a backtrace each time. Looking at backtraces printed between your existing printfs should show the one that's at fault here.

Flags: needinfo?(matt.woodrow)

Thanks for the comment, Matt. I've added the browser.setAttribute("nodefaultsrc", "true"); in a spot that I thought might be appropriate.

FYI, I'm really a backend guy and Thunderbird's overall sheriff and the guy who puts code for releases together. All this is well beyond me so I'm relying on "pattern matching" skills here.

If you add the nodefaultsrc attribute to the browser element (before appending the clone tree into the panel), then I think I should fix this.

Hmm, I have no idea were we "append a clone tree into the panel". In specialTabs.js we seem to be loading a URI into a browser element in our tab.

I added a breakpoint on a channel being added to the docshell and made it print out a backtrace each time.

Hmm, should I be seeing more useful debug in a recent build? My build is from Saturday, so perhaps I'm missing something. The debug I get is just the same:
++DOCSHELL 000001FA4334F800 == 22 [pid = 10440] [id = {a8d811ed-5f4b-454a-a097-2798d07d2e10}]
++DOMWINDOW == 100 (000001FA57985A60) [pid = 10440] [serial = 126] [outer = 0000000000000000]
=== Setting nodefaultsrc
=== skip load true
=== NOT calling aTab.browser.loadURI
=== After tabmail.openTab()
++DOMWINDOW == 101 (000001FA56A35400) [pid = 10440] [serial = 127] [outer = 000001FA57985A60]
=== no URI, not loading
=== before return
Assertion failure: !chan (Why is there a document channel?), at c:/mozilla-source/comm-central/toolkit/components/windowwatcher/nsWindowWatcher.cpp:998
#01: nsGlobalWindowOuter::OpenInternal (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:7252)
#02: nsGlobalWindowOuter::OpenOuter (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowOuter.cpp:5727)
#03: nsGlobalWindowInner::Open (c:\mozilla-source\comm-central\dom\base\nsGlobalWindowInner.cpp:3791)

Attachment #9104439 - Attachment is obsolete: true

Sorry, you'd need to run with a debugger attached and ask it to break on channel creation and print the backtrace.

Appending the cloned tree into the panel happens here: https://searchfox.org/comm-central/rev/1d167ecf8b55fa5425102d49172b55ae1a23881f/mail/base/content/specialTabs.js#858

I think that's the point where the new browser element will be in the Document, and we'd trigger the default load.

So at some point before that, try adding clone.querySelector("browser").setAttribute("nodefaultsrc", "true");

Sorry, you'd need to run with a debugger attached and ask it to break on channel creation and print the backtrace.

Hmm, I'm on Windows, so debugging there is a little "3rd class" (Holzklasse due to the wooden benches in the old German rail system). It would be useful to know where the channel creation is. I'll try the other hint first.

Attached patch 1588256-nodefaultsrc.patch (obsolete) — Splinter Review

OK, that fixes the test. Looks like we don't need to pass around the skipLoad stuff from the other patch since this needs to be done unconditionally. Adding the attribute only when skipLoad was set didn't help.

Should I still add bits from the other patch?

Attachment #9105387 - Flags: review?(matt.woodrow)

Adding the attribute only when skipLoad was set didn't help.

Hmm, I think I was doing too much multi-tasking last night and confused my self. Here's the more complete patch with skipLoad added and passed around and the nodefaultsrc attribute only added when skipLoad is requested. Test passes with this.

Attachment #9105510 - Flags: review?(matt.woodrow)

I have a "green" (with the expected failures) try for the second patch:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=1d75871fd5a93ca4c049657a0fc23077fbb00013

Attachment #9105510 - Flags: review?(matt.woodrow) → review+
Comment on attachment 9105387 [details] [diff] [review]
1588256-nodefaultsrc.patch

Thanks, Matt. So we won't need this version.
Attachment #9105387 - Attachment is obsolete: true
Attachment #9105387 - Flags: review?(matt.woodrow)
Attachment #9105364 - Attachment is obsolete: true
Target Milestone: --- → Thunderbird 72.0
Attachment #9105510 - Flags: approval-comm-beta+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e17835f833d8
Port bug 1578624, part 11: add aSkipLoad paremeter to getContentWindowOrOpenURI(). r=mattwoodrow

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.