Closed Bug 858014 Opened 11 years ago Closed 11 years ago

MailNews 3-pane window broken [msgWindow.rootDocShell is null]

Categories

(Core :: DOM: Navigation, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla23

People

(Reporter: tonymec, Assigned: smaug)

References

Details

(Keywords: dogfood, regression)

Attachments

(4 files)

Attached image screenshot
Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a1 ID:20130403003001 c-c:4ad03358510a m-c:97cfc16ba5dc
is GOOD.

Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a1 ID:20130404003001 c-c:fbc1cf677a5c m-c:c232bec6974d
is BAD.

3-pane window is broken, see screenshot.
Keywords: regression
Error console message:

Timestamp: 04/04/13 14:39:32
Error: TypeError: msgWindow.rootDocShell is null
Source File: chrome://messenger/content/mailWindow.js
Line: 131
Changes in comm-central between those changesets:
bug 851899, bug 856478, bug 837643, bug 853135, bug 794303, bug 838805

Mozilla-central list follows
bug 784739 sounds like a possible culprit
changeset:   127006:5b2c0b85aea3
user:        matekm <matekm@gmail.com>
date:        Tue Apr 02 20:24:42 2013 -0400
files:       docshell/build/nsDocShellModule.cpp
description:
Bug 784739 - Switch from NULL to nullptr in docshell/; r=ehsan
Wayne: Do you see this in Thunderbird too (new in today's Daily)?
Flags: needinfo?(vseerror)
Usul reports the same problem with Thunderbird (see irc logs for #maildev)
Flags: needinfo?(vseerror)
(In reply to Philip Chee from comment #7)
> Usul reports the same problem with Thunderbird (see irc logs for #maildev)

OK, well, let's throw the hot potato at the Core guys.
Component: MailNews: Message Display → Document Navigation
Product: SeaMonkey → Core
Does comment 5 means that that changeset is confirmed to be the problem?
Flags: needinfo?(antoine.mechelynck)
(In reply to Boris Zbarsky (:bz) from comment #9)
> Does comment 5 means that that changeset is confirmed to be the problem?

If yes, I can back it out immediately and then investigate offline.
BTW: mailWindow.js is in forked source:
mail/base/content/mailWindow.js
suite/mailnews/mailWindow.js
Flags: needinfo?(antoine.mechelynck)
(In reply to Boris Zbarsky (:bz) from comment #9)
> Does comment 5 means that that changeset is confirmed to be the problem?

It means that Ratty and I think it could be a possible culprit. You might compare a Thunderbird try-build without it with today's Daily (which has the bug) and see if it makes a difference.
Well, so...

In my opinion that changeset has an incredibly low chance of being the culprit.

Is someone who has a Tbird build environment in a position to bisect on the m-c changes to cut down the list somewhat?
Flags: needinfo?
(In reply to Boris Zbarsky (:bz) from comment #13)
> Well, so...
> 
> In my opinion that changeset has an incredibly low chance of being the
> culprit.
> 
> Is someone who has a Tbird build environment in a position to bisect on the
> m-c changes to cut down the list somewhat?

Not me. Wayne? Ludovic? (or for SeaMonkey) Philip? Robert?
Flags: needinfo?
Flags: needinfo?
P.S. What I can do is test the already existing hourly builds.
Flags: needinfo?
Flags: needinfo?
Mconley could you do that ?
Flags: needinfo?
If there are hourly builds across m-c or better yet inbound changesets that would be testable, that would be very welcome!
Bisecting. I'll report back when I know more.
The above are existing hourly builds of SeaMonkey. mconley might refine it still further if he can build, or compare with Thunderbird hourlies.
OK, that seems to be this push: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=475dc5f51bdb&tochange=445d8eecdd80

Nothing really jumps out at me except _maybe_ Bobby's changes...
(In reply to Boris Zbarsky (:bz) from comment #21)
> Nothing really jumps out at me except _maybe_ Bobby's changes...

That would only have an effect if something here were implemented as in-content XBL. Is that the case?
http://hg.mozilla.org/comm-central/file/fbc1cf677a5c/mailnews/base/src/nsMsgWindow.cpp

This seems to be the code for msgWindow.rootDocShell.
nsMsgWindow::GetRootDocShell(nsIDocShell * *aDocShell)
{
  if (mRootDocShellWeak)
    CallQueryReferent(mRootDocShellWeak.get(), aDocShell);
  else
    *aDocShell = nullptr;
  return NS_OK;
}

And this is the caller: http://hg.mozilla.org/comm-central/file/fbc1cf677a5c/mail/base/content/mailWindow.js#l107
For SeaMonkey, the same function is called in the same way at http://hg.mozilla.org/comm-central/file/fbc1cf677a5c/suite/mailnews/mailWindow.js#l131
Can somebody try backing out Bobby's changesets in that range and see if that fixes the problem?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #25)
> Can somebody try backing out Bobby's changesets in that range and see if
> that fixes the problem?

Sure, on it.
I have just built TB from trunk and can not see the breakage, without reverting anything.
(In reply to :aceman from comment #27)
> I have just built TB from trunk and can not see the breakage, without
> reverting anything.

ah, that's strange, Usul and I could. What OS are you on? Can you see the bug with a Mozilla build? (e.g. from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/ )? If you can, there must be something different, maybe in your mozconfig.
I'm on linux 32bit. Yes, I can see the problem with a TB23 build from that link.
Yes, I have many things disabled in mozconfig.
(In reply to :aceman from comment #29)
> I'm on linux 32bit. Yes, I can see the problem with a TB23 build from that
> link.
> Yes, I have many things disabled in mozconfig.

Aha! Maybe you could attach your mozconfig to this bug so developers could see the differences. I guess the bug is just in one of the things you disabled or changed.
Keywords: dogfood
Setting platform to "All" given that today's SeaMonkey 2.20a1 Windows trunk nightly build has an equally broken 3-pane Mail & News window.
OS: Linux → All
Hardware: x86_64 → All
I've been locally backing out bholley's patches, compiling, and testing. Still busted. About half-way through those patches - I should hopefully know by tonight if they're the cause.
Bisecting bholley's patches, I should say
OS: All → Linux
Hardware: All → x86_64
Duplicate bug 858221 is for the same problem on Mac OS X.
OS: Linux → All
Hardware: x86_64 → All
Still no clues? Has anybody tried my mozconfig?
(In reply to :aceman from comment #37)
> Still no clues? Has anybody tried my mozconfig?

There's too much in the mozconfig to try evaluating anything.

I'm checking I can see the regression range on my machine, then I'll get an auto-bisect running, so expect something in a few hours.
These are the culprits, according to my bisection:

changeset:   127556:9d72648fb769
user:        Eitan Isaacson <eitan@monotonous.org>
date:        Wed Apr 03 15:13:16 2013 -0700
summary:     Bug 525444 - (Part 1/3) Basic SpeechSynthesis setup and voice registration. r=smaug

changeset:   127557:408e6df83e2c
user:        Eitan Isaacson <eitan@monotonous.org>
date:        Wed Apr 03 15:13:17 2013 -0700
summary:     Bug 525444 - (Part 2/3) Added speech service API. r=smaug

changeset:   127558:ddaf0ebcd927
user:        Eitan Isaacson <eitan@monotonous.org>
date:        Wed Apr 03 15:13:17 2013 -0700
summary:     Bug 525444 - (Part 3/3) Support OOP speech synth. r=smaug
(In reply to Mike Conley (:mconley) from comment #39)
> These are the culprits, according to my bisection:

I believe Eitan is out for the next week or two :(

btw this is a kindof  crazy regression but believable I guess.
Really really odd regression range.
So, what's the planned course of action? Backing out the changesets in question on mozilla-central to see if it fixes things, or is someone trying to work out a solution on the comm-central side?

It would be nice to get the TB/SM trunk builds working again...
FYI, I don't expect to have time to look at this before the next week.

Do we get any assertions in debug builds?
(In reply to Olli Pettay [:smaug] from comment #43)
> FYI, I don't expect to have time to look at this before the next week.
> 
> Do we get any assertions in debug builds?

I don't see any on the mozmill tests that are triggered before we discover that the docshell is null and proceed to screw everything up.
Oh interesting. I just noticed that this only failing in opt builds, not in debug builds.
Correction:

It fails on Linux opt, Linux 64 opt, OS X opt and debug but not Linux or Linux 64 debug (even when built from source recently). I can't tell about Windows since we have unrelated errors there.
I am seeing this breakage on Linux debug (32bit, Ubuntu).

Can anyone else confirm that 525444 is causing this? Should we back that out until we can solve this?
I'm attempting two things on try server:

1) Add the .xpt that was added in bug 525444 to the packaging manifest:

https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e1e985cbebe

2) Build Thunderbird with the mozilla-central revision before bug 525444 landed:

https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=cffecd68365e
3) fix https://bugzilla.mozilla.org/show_bug.cgi?id=525444#c69 which should be trivial and 
compile SM and TB with --disable-webspeech
(In reply to Mark Banner (:standard8) from comment #50)
> I'm attempting two things on try server:
> 
> 1) Add the .xpt that was added in bug 525444 to the packaging manifest:
> 
> https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e1e985cbebe

This one missed dom_webspeechrecognition as well, which is wrapped on this same configure arg.
Setting Hardware/OS to All/All in accordance with comment #47 and 48.
oh? Wrong display again. Sorry.
Summary: 3-pane window broken → MailNews 3-pane window broken [msgWindow.rootDocShell is null]
I reported this same issue as Bug ID: 858951
           Summary: Seamonkey Email component does not display any content
                    at all

In my case, the platform is Linux i686; was using Seamonkey Nightly.  Confirmed builds work fine up to and including 04-03-2013; the 04-04-2013 Seamonkey Nightly introduces the issues that I see (no viewable Email or folders), and none of the Seamonkey builds after that point that I've tried so far work; the 04-03-2013 build DOES work for me.
Try server confirms that Bug 525444 is the cause of this bug.

Adding the missing xpt files to the manifest does not fix it.

I'll see if I can do a --disable-webspeech, but that is not a long-term solution to this issue.
Adding --disable-webspeech did not build for me, but maybe you have more success.
Ditto, I got:

No IDL file found for interface nsIDOMSpeechSynthesisGetter in include path ['../../../dist/idl']
make[6]: *** [dom_quickstubs.cpp] Error 1
make[6]: *** Waiting for unfinished jobs....
/media/Projects/mozilla/thunderbird/mozilla/js/xpconnect/src/XPCComponents.cpp: In member function ‘virtual nsresult nsXPCComponents::GetProperty(nsIXPConnectWrappedNative*, JSContext*, JSObject*, jsid, jsval*, bool*)’:
/media/Projects/mozilla/thunderbird/mozilla/js/xpconnect/src/XPCComponents.cpp:4801:14: warning: variable ‘res’ set but not used [-Wunused-but-set-variable]
Creating makedepend file .deps/dictionary_helper_gen.pp
make[5]: *** [libs] Error 2
make[4]: *** [libs_tier_platform] Error 2
make[3]: *** [tier_platform] Error 2
make[2]: *** [default] Error 2
make[1]: *** [default] Error 2
make: *** [build] Error 2
I've reported that in bug 525444 but maybe we need a new bug for that.
Either that or reopen bug 525444. It is my understanding from the comments over there that other issues haven't been addressed and may need follow-up work.
(In reply to :aceman from comment #60)
> I've reported that in bug 525444 but maybe we need a new bug for that.
I'm fixing that build problem while investigating this one. I'll file a new bug.

I went through the patches for bug 525444 and didn't see anything obvious which could have
caused this one.
(In reply to Olli Pettay [:smaug] from comment #62)
> (In reply to :aceman from comment #60)
> > I've reported that in bug 525444 but maybe we need a new bug for that.
> I'm fixing that build problem while investigating this one. I'll file a new
> bug.
Bug 858973
	


--disable-webspeech does indeed fix the problem. Trying to figure out why.
(In reply to Olli Pettay [:smaug] from comment #62)
> (In reply to :aceman from comment #60)
> > I've reported that in bug 525444 but maybe we need a new bug for that.
> I'm fixing that build problem while investigating this one. I'll file a new
> bug.
> 
> I went through the patches for bug 525444 and didn't see anything obvious
> which could have
> caused this one.

Dug into this a little bit - nsMsgWindow::SetRootDocShell is never being called with the web speech patches, which is why rootDocShell is null - it's never set in the first place.
In case it helps, for this call in mailWindow.js:

> msgWindow.domWindow = window;

nsMsgWindow::SetDomWindow is being passed the inner window, not the outer one.

So of course it never finds the root doc shell.
Sorry, I confused myself here, because while of course SetDocShell is only called on outer windows, but GetDocShell should work on the inner window too...
It suddenly dawned on me what was wrong when my debugger gave a different display for win.mRawPtr as compared to (nsPIDOMWindow*)(uintptr_t)win.mRawPtr!

To display win.mRawPtr it used the mail symbols for win.mRawPtr. As mail hasn't heard of MOZ_WEBSPEECH, it doesn't know about the mSpeechSynthesis member.

But the cast caused it to look up the libxul symbol for nsPIDOMWindow, which was compiled with MOZ_WEBSPEECH, so it did have the mSpeechSynthesis member.

Because GetDocShell is inline on nsPIDOMWindow, it needs to have all the correct conditional defines whenever it is included.

So there are a number of alternatives:
1. Move the conditional member to the end of nsPIDOMWindow
2. Change comm-central to detect whether webspeech is enabled
3. Change comm-central not to use nsPIDOMWindow
4. Change comm-central to build as a submodule of mozilla-central
(In reply to neil@parkwaycc.co.uk from comment #67)
> So there are a number of alternatives:
> 1. Move the conditional member to the end of nsPIDOMWindow
> 2. Change comm-central to detect whether webspeech is enabled
> 3. Change comm-central not to use nsPIDOMWindow
> 4. Change comm-central to build as a submodule of mozilla-central

5. Check bug 846540 in as fast as possible?
6. Move mSpeechSynthesis; to nsGlobalWindow
In reply to comment #67: IIUC, option 4 is bug 648979 and I suppose it will happen someday, but it'll need more than just a snap of the fingers, so we'll need some other interim solution to get the mailer working again.
I do 6. then.
We've got review on bug 846540 now so we should be able to fix with 5 either today or tomorrow so no need to do 6 unless core desires it.
Attached patch approach 6Splinter Review
Assignee: nobody → bugs
Attachment #734365 - Flags: review?(neil)
I think we want approach 6 anyway.
Comment on attachment 734365 [details] [diff] [review]
approach 6

Bah, I had just finished writing and testing a patch too!

Except I forgot to move the forward declaration or revise the IID.

And my declarations were in slightly different places.

But at least my nsGlobalWindow.cpp changes were the same ;-)
Attachment #734365 - Flags: review?(neil) → review+
(In reply to Joshua Cranmer from comment #68)
> 5. Check bug 846540 in as fast as possible?

Part 2 looks particularly useful!

(In reply to Olli Pettay from comment #69)
> 6. Move mSpeechSynthesis; to nsGlobalWindow

Indeed, it seems strange to see something with an IID having an #ifdef ;-)
(In reply to Olli Pettay [:smaug] from comment #77)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d510b884c45f

hm, I just noticed that there even exist hourly builds from mozilla-inbound :-)
… but only of Firefox, so not very useful to test a bug which appears in mailnews :-(
https://hg.mozilla.org/mozilla-central/rev/d510b884c45f
https://hg.mozilla.org/mozilla-central/rev/8739c3bd5734
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Verified fixed for SeaMonkey 2.20a1 Win32 tinderbox build #1365368630, opening Mail & News shows a functional 3-pane window again, sending+receiving works.

I'll leave it up to others to mark this verified after testing on Linux and Mac.
Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a1 ID:20130408003001 c-c:6fc0ac415187 m-c:b0d842380959

The bug has disappeared from this nightly, made from a changeset one later than the newer one of comment #80 (thanks Philor for pushing them :-) ). For some reason I don't see this build on the SeaMonkey tbpl page, http://tbpl-dev.callek.net/

However, for such a severe bug as this one, I'd rather see confirmations from the Thunderbird side, and from non-Linux platforms, before I set VERIFIED.
Keywords: verifyme
See Also: → 859246
Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Thunderbird/23.0a1 ID:20130408030555 c-c:6fc0ac415187 m-c:b0d842380959

This Thunderbird build does not suffer from the bug. A test with a Mac would still be desirable.
Status: RESOLVED → VERIFIED
Keywords: verifyme
(In reply to Tony Mechelynck [:tonymec] from comment #83) 
> A test with a Mac would still be desirable.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Thunderbird/23.0a1 works. :-)
Great cooperative work, blazing speed - thanks! :)

I'm trying to imagine how great Thunderbird could be if we tackled some more bad bugs (or even papercuts) in the same manner, a concerted effort of cooperative brain power focussed on one bad issue at a time, and fixing it with the speed, precision and blazing power of a cutting torch. This only took 4 days to narrow down and fix, let's say if at half the speed we could fix something like 1 bug per week, that would still be an amazing 50 power bugs per year. Imagine... |)
Thanks Sören; I suppose you used the latest Daily? (May the fingerprint paranoids rot in hell, who removed all build identification from the UA string; and long live the Nightly Tester Tools extension.)
OT: (In reply to Thomas D. from comment #85)
> Great cooperative work, blazing speed - thanks! :)
> I'm trying to imagine how great Thunderbird could be if we tackled some more
> bad bugs (or even papercuts) in the same manner

Of course we would no longer expect Mozilla (FF) employees to fix things for TB as such, but it would be equally great to see this kind of energy go into fixing some relevant and/or long-standing bugs which actually fall into the domain of FF (e.g. Core:XUL or Core:Layout bugs like bug 80713), and incidentally, TB would benefit greatly from having them fixed... ;)
In reply to comment #85 and #87: This is a strange case of a bug which was created and fixed by changes in Core source files (and in sppech synthesis, in fact!) but whose effect was to make both SeaMonkey and Thunderbird mailers totally unusable; so of course developers and testers from all three of Core, Thunderbird and SeaMonkey (including some whose areas of expertise overlapped several of these) cooperated. This is Mozilla bug-fixing at its best.
(In reply to Tony Mechelynck [:tonymec] from comment #86)
> Thanks Sören; I suppose you used the latest Daily? (May the fingerprint
> paranoids rot in hell, who removed all build identification from the UA
> string; and long live the Nightly Tester Tools extension.)

Oh, yes. The latest Daily. ;)

with Nightly Tester Tools:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Thunderbird/23.0a1 ID:20130408030555 CSet: 6fc0ac415187
Tinderbox build:

User agent: Mozilla/5.0 (X11; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a1
Build identifier: 20130408003001

Tinderbox build works for me now; has the fix made it to a working nightly build yet?  That's when I consider it "fixed" from my perspective.
Now confirmed in the latest nightly build as well:


User agent: Mozilla/5.0 (X11; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a1
Build identifier: 20130408003001
You need to log in before you can comment on or make changes to this bug.