Closed Bug 1595747 Opened 5 months ago Closed 4 months ago

TEST-UNEXPECTED-FAIL | [snip]/mozmill/composition/test-attachment.js | test-attachment.js::test_open_attachment - Can't open attachment for attachment bucket in compose window

Categories

(Thunderbird :: General, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 72.0

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

References

(Regression)

Details

(Whiteboard: [Thunderbird-testfailure: Z all][Thunderbird-disabled-test])

Attachments

(1 file, 2 obsolete files)

First seen:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=a9f7c4aa047f78a9f6f21e9206db567de8196f75&selectedJob=275788024

I ran it manually and got an unexpected prompt to for a draft to "Save this message - Save/Discard/Cancel".

Clicking on Discard makes the test run through and it fails in the end:

SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\comm\mail\test\mozmill\composition\test-attachment.js | test-attachment.js::test_open_attachment
  EXCEPTION: Timeout waiting for modal dialog to open.
    at: utils.jsm line 390
       TimeoutError utils.jsm:390 13
       waitFor utils.jsm:446 11
       WindowWatcher_waitForModalDialog WindowHelpers.jsm:419 11
       wait_for_modal_dialog WindowHelpers.jsm:680 17
       test_open_attachment test-attachment.js:316 24

I also see:
JavaScript error: resource://testing-common/mozmill/utils.jsm, line 429: uncaught exception: 2153578497
JavaScript error: resource://testing-common/mozmill/WindowHelpers.jsm, line 402: Error: Timeout while waiting for modal dialog.

That error is: 0x805D0001 - NS_ERROR_WONT_HANDLE_CONTENT.

I don't see anything we could have caused ourselves here, the M-C range is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=49320c7fe8b3a547856dc67e7d57a64db6&tochange=5f0b392beadb7300abdaa3e5e1cc1c0d5a

The code that fails is this:

  plan_for_modal_dialog("unknownContentType", subtest_open_attachment);
  cwc.doubleClick(new elib.Elem(node));
  wait_for_modal_dialog("unknownContentType");

so I guess we just don't get the unknownContentType dialog any more due to some M-C changes.

Aceman, could you look into this one for us?

Flags: needinfo?(acelists)
Summary: TEST-UNEXPECTED-FAIL | /builds/worker/workspace/build/tests/mozmill/composition/test-attachment.js | test-attachment.js::test_open_attachment → TEST-UNEXPECTED-FAIL | [snip]/mozmill/composition/test-attachment.js | test-attachment.js::test_open_attachment
Keywords: leave-open
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/70edaf9ceae6
Temporarily disable failing test-attachment.js::test_open_attachment. rs=bustage-fix

I debugged this and it is caused by bug 1589270.
Test passes with m-c revision 501539 and fails with 501541 (all 3 parts of bug 1589270 applied).
The bug does touch nsExternalHelperAppService which could explain the TB problem.
The test fails inside of https://searchfox.org/comm-central/source/mozilla/toolkit/mozapps/downloads/HelperAppDlg.jsm#146 with an error like this:
Missing window information when showing nsIHelperAppLauncherDialog: TypeError: aContext is null HelperAppDlg.jsm:156

So are we not passing the aContext? But the patches in bug 1589270 are cpp only so what do we need to update in TB?

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(acelists)
Regressed by: 1589270

Thanks, Aceman. Let let me just clarify the "changeset numbers".

Test passes with m-c revision 501539 and fails with 501541 (all 3 parts of bug 1589270 applied).

Our test passes with part one applied, but not with all three parts applied. Two and three are CPP only.

It also works with m-c revision 501540, so it is part 3 of bug 1589270 (https://hg.mozilla.org/mozilla-central/rev/baffa8a7b9618e9a02e4bd808adb71dead99cce5) that is breaking us.

So are we not passing the aContext? But the patches in bug 1589270 are cpp only so what do we need to update in TB?

That is a good question. Presumably the relevant call to Show() is at https://searchfox.org/mozilla-central/rev/e7c61f4a68b974d5fecd216dc7407b631a24eb8f/uriloader/exthandler/nsExternalHelperAppService.cpp#1689? It would be good to verify whether it is.

If it is, then it sounds like GetDialogParent returned null. And if that's the case, it's worth trying to figure out why. Most importantly, what are mWindowContext and mBrowsingContext in this case in https://searchfox.org/mozilla-central/rev/e7c61f4a68b974d5fecd216dc7407b631a24eb8f/uriloader/exthandler/nsExternalHelperAppService.cpp#1495-1513?

Flags: needinfo?(bzbarsky) → needinfo?(acelists)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)

That is a good question. Presumably the relevant call to Show() is at https://searchfox.org/mozilla-central/rev/e7c61f4a68b974d5fecd216dc7407b631a24eb8f/uriloader/exthandler/nsExternalHelperAppService.cpp#1689? It would be good to verify whether it is.

Yes, wrapping some debug around that show() printing the 2nd and 3rd argument call gives:
JavaScript error: resource://testing-common/mozmill/utils.jsm, line 429: uncaught exception: 2153578497
=== GetDialogParent 0000000000000000 0000000000000000
=== before show 0000000000000000 0
=== after show

If it is, then it sounds like GetDialogParent returned null. And if that's the case, it's worth trying to figure out why. Most importantly, what are mWindowContext and mBrowsingContext in this case in https://searchfox.org/mozilla-central/rev/e7c61f4a68b974d5fecd216dc7407b631a24eb8f/uriloader/exthandler/nsExternalHelperAppService.cpp#1495-1513?

They are both null. Where from here? Note that the exception 2153578497 = 0x805D0001 - NS_ERROR_WONT_HANDLE_CONTENT seems to be printed before we get to the show call.

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(acelists)

They are both null. Where from here?

Figure out who created this nsExternalAppHandler and why they did not give it any sort of window context or browsing context?

seems to be printed before we get to the show call.

It's hard to reason about code I don't get to see (whatever code adds the logging you are looking at); even harder than reasoning about this helper app code to start with. And that's bad enough....

Note that there are presumably two nsExternalAppHandler objects involved here: one in the content process and one in the parent. At least in e10s there are. Thunderbird doesn't run e10s, so I really don't know quite what the setup there is...

One thing worth checking is what the aBrowsingContext passed to the nsExternalAppHandler constructor is, and then what happens after the mBrowsingContext = mMaybeCloseWindowHelper->MaybeCloseWindow() call in nsExternalAppHandler::OnStartRequest (which runs in Thunderbird, since there is no content process).

Flags: needinfo?(bzbarsky)

Sorry, here are the snippets:

nsExternalAppHandler::GetDialogParent() {
  printf("=== GetDialogParent %p %p\n", (void*)mWindowContext, (void*)mBrowsingContext);
    // this will create a reference cycle (the dialog holds a reference to us as
    // nsIHelperAppLauncher), which will be broken in Cancel or CreateTransfer.
    nsCOMPtr<nsIInterfaceRequestor> dialogParent = GetDialogParent();
    printf("=== before show %p %d\n", (void*)dialogParent, mReason);
    rv = mDialog->Show(this, dialogParent, mReason);
    printf("=== after show\n");

Well look at it further another day. Thanks for the hints.

printf("=== GetDialogParent %p %p\n", (void*)mWindowContext, (void*)mBrowsingContext);

Should probably be:

printf("=== GetDialogParent %p %p\n", mWindowContext.get(), mBrowsingContext.get());

to be sure it's getting the pointer values right... But I suspect in practice it may not matter.

Attached patch debug-open-attach.patch (obsolete) — Splinter Review

Patch to just run the one subtest.

Note that the test fails before we get the "save draft" prompt. I'm going to check what's happening here:
JavaScript error: resource://testing-common/mozmill/utils.jsm, line 429: uncaught exception: 2153578497

Assignee: nobody → jorgk

I've just noticed that opening an attachment doesn't work any more: Start a composition, add a text document, double-click it: Nothing happens.

Summary: TEST-UNEXPECTED-FAIL | [snip]/mozmill/composition/test-attachment.js | test-attachment.js::test_open_attachment → TEST-UNEXPECTED-FAIL | [snip]/mozmill/composition/test-attachment.js | test-attachment.js::test_open_attachment - Can't open attachment for attachment bucket in compose window
Attached patch debug-open-attach.patch (obsolete) — Splinter Review

Updated debug patch. I see:
== JS stack> resource://gre/modules/MailContentHandler.jsm (56)
== JS stack> resource://testing-common/mozmill/utils.jsm (430)
== JS stack> resource://testing-common/mozmill/WindowHelpers.jsm (419)
== JS stack> resource://testing-common/mozmill/WindowHelpers.jsm (680)
== JS stack> file:///c:/mozilla-source/comm-central/comm/mail/test/mozmill/composition/test-attachment.js (317)
== JS stack> resource://testing-common/mozmill/frame.jsm (607)
== JS stack> resource://testing-common/mozmill/frame.jsm (686)
== JS stack> resource://testing-common/mozmill/frame.jsm (728)
== JS stack> resource://testing-common/mozmill/frame.jsm (576)
== JS stack> resource://testing-common/mozmill/frame.jsm (739)
== JS stack> chrome://jsbridge/content/modules/server.js (215)
== JS stack> chrome://jsbridge/content/modules/server.js (220)
== JS stack> chrome://jsbridge/content/modules/server.js (359)
== JS stack> chrome://jsbridge/content/modules/server.js (359)
== JS stack> chrome://jsbridge/content/modules/server.js (103)
== MailContentHandler.jsm throwing NS_ERROR_WONT_HANDLE_CONTENT (3)
JavaScript error: resource://testing-common/mozmill/utils.jsm, line 430: uncaught exception: 2153578497

Another version: We get called in handleContent() for URI
file:///c:/mozilla-source/comm-central/comm/mail/test/mozmill/composition/attachment.txt
and we return NS_ERROR_WONT_HANDLE_CONTENT so the Mozilla platform code can handle it.

But somehow it doesn't.

Attachment #9109236 - Attachment is obsolete: true
Attachment #9109238 - Attachment is obsolete: true

So if I try to open an attachment from the compose window's attachment bucket, I get this in the console:
uncaught exception: 2153578497
Missing window information when showing nsIHelperAppLauncherDialog: TypeError: aContext is null HelperAppDlg.jsm:156
show resource://gre/modules/HelperAppDlg.jsm:156

In the debug console I see:
== MailContentHandler.jsm throwing NS_ERROR_WONT_HANDLE_CONTENT (3) file:///D:/Desktop/Eres%20t%C3%BA%20-%20letra.txt
JavaScript error: , line 0: uncaught exception: 2153578497
=== GetDialogParent 0000000000000000 0000000000000000
=== before show 0000000000000000 0

With the totally weird "JavaScript error: , line 0: uncaught exception: 2153578497". Something has totally gone wrong here.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #7)

Figure out who created this nsExternalAppHandler and why they did not give it any sort of window context or browsing context?

OK, I did this debugging session. I created a mail messages, attached a text file and double-clicked it. This is what happens.

First we come to here:
https://searchfox.org/mozilla-central/rev/492214c05cde6e6db5feff9465ece4920400acc3/uriloader/base/nsURILoader.cpp#477
calling HandleContent().

That calls into the TB handler and we return NS_ERROR_WONT_HANDLE_CONTENT (and also causing the weird console line "JavaScript error: , line 0: uncaught exception: 2153578497", but let's ignore that).

Next we get here:
https://searchfox.org/mozilla-central/rev/492214c05cde6e6db5feff9465ece4920400acc3/uriloader/base/nsURILoader.cpp#598
calling DoContent() with a nullptr for aWindowContext.

Then we get here:
https://searchfox.org/mozilla-central/rev/492214c05cde6e6db5feff9465ece4920400acc3/uriloader/base/nsURILoader.cpp#304
calling OnStartRequest().

That takes us to nsExternalAppHandler::OnStartRequest() and to
https://searchfox.org/mozilla-central/rev/492214c05cde6e6db5feff9465ece4920400acc3/uriloader/exthandler/nsExternalHelperAppService.cpp#1688
where we call GetDialogParent() and we already know that that finds mWindowContext and mBrowsingContext to be null. I guess the windows context was passed in null from above, not sure where the browsing context comes from.

To me this looks like a bug in the Mozilla platform.

Where from here?

EDIT: As we know, https://hg.mozilla.org/mozilla-central/rev/baffa8a7b961 broke our test (verified again via backout), so there must be something in there that isn't quite right.

Flags: needinfo?(bzbarsky)

That calls into the TB handler and we return NS_ERROR_WONT_HANDLE_CONTENT

OK, that makes sense. I assume all this is happening in the single process Thunderbird runs, right?

calling DoContent() with a nullptr for aWindowContext.

nullptr for aWindowContext (coming from m_originalContext) seems a bit odd here. What happened here before part 3 of bug 1589270? Was m_originalContext null then too? Because nothing in https://hg.mozilla.org/mozilla-central/rev/baffa8a7b961 touches m_originalContext directly that I can see... If m_originalContext used to be null back then as well, then what did GetDialogParent() return at https://searchfox.org/mozilla-central/rev/aa0bf8a23317ce1d4e34479fbc79a86fd0579ced/uriloader/exthandler/nsExternalHelperAppService.cpp#1636

Flags: needinfo?(bzbarsky) → needinfo?(jorgk)

Yes, I assume it all happens in the single process.

You really want me to back that changeset out again and wait 10 minutes for compilation? And then another 10 minutes to reverse the backout?

GetDialogParent() used to look like: https://hg.mozilla.org/mozilla-central/rev/baffa8a7b961#l14.60

Since mWindowContext was null (passed in) it must have returned mContentContext. That's been removed now and replaced by mBrowsingContext which appears to be null more often than the original mContentContext.

Flags: needinfo?(jorgk)

You really want me to back that changeset out again and wait 10 minutes for compilation?

Compile with backout in progress.

You really want me to back that changeset out again and wait 10 minutes for compilation? And then another 10 minutes to reverse the backout?

Well, are you expecting me to do that work, plus taking time to sort out how to build/run Thunderbird, etc, etc?

Since mWindowContext was null (passed in)

This is a thing that might be worth verifying. If we figure out what's going on without that, fine.

That's been removed now and replaced by mBrowsingContext which appears to be null more often than the original mContentContext.

OK, that's something you can check pretty easily without rebuilding, I suspect. When nsExternalHelperAppService::DoContent happens, what is aContentContext? Is it null? If it is non-null, what sort of object is it? What is the window we end up getting from it? What is the bc returned from that window if it's non-null? It looks like the old code just used aContentContext directly and now we get a BrowsingContext from that, so maybe the issue is what exactly aContentContext is here.

Flags: needinfo?(jorgk)

Ah, I guess I misread the code in comment 16, my apologies. Null aWindowContext is very much expected based on the callsite at https://searchfox.org/mozilla-central/rev/524bed6dfbc5ae21c62632d83b7573448b29e0ac/uriloader/base/nsURILoader.cpp#598-600 and was expected before the changes in question too. For m_originalContext, the thing being passed in is aContentContext and should generally not be null.

Now in Firefox m_originalContext is always a docshell, afaict, so getting a window from it and then a BrowsingContext from the window works. The question is what it is in Thunderbird in this case.

OK, rebuild took ten minutes, I had already done it yesterday. I'll now dump out what happens in GetDialogParent().

How do I answer "what sort of object is it"?

Flags: needinfo?(jorgk)

OK, as I expected, in the old version of GetDialogParent(), mWindowContext is null and mContentContext isn't.

It can be set in SetContentContext(), the CTOR of nsExternalAppHandler and here mContentContext = mMaybeCloseWindowHelper->MaybeCloseWindow();.

Quickly poking around in Thunderbird code, there are definitely several places where the thing passed in is some random interface requestor, not a docshell. Specifically:

  1. nsURLFetcher::FireURLRequest passes this. Its getInterface returns nothing much useful other than calling QI, but it does QI to various things.

  2. The call at https://searchfox.org/comm-central/rev/496170162015a8c4a4087c18479279b4ce3b2cbf/mail/components/MessengerContentHandler.jsm#146 passes listener, which has a getInterface that can return loadgroup or nsIURIContentListener.

  3. https://searchfox.org/comm-central/rev/496170162015a8c4a4087c18479279b4ce3b2cbf/mail/components/compose/content/MsgComposeCommands.js#6072 passes an AttachmentOpener, which can in fact return an nsIDOMWindow or nsIDocShell or itself as nsIURIContentListener.

I suspect that we are in case 3 here, but since the code in DoContent gets nsPIDOMWindowOuter, not nsIDOMWindow, we get back null and can't find the relevant BrowsingContext even though there really is one here.

OK, so what does the helper app dialog need this context for? In show() it tries to getInterface() an nsIDOMWindow from it, which would have failed in cases 1 and 2 anyway, but worked in case 3. In reallyShow() it tries to getInterface an nsIDocShell from it, which likewise would have failed in cases 1 and 2 but succeeded in case 3.

It looks like HelperAppDlg.jsm already treats null aContext just like an aContext that does not provide the relevant interfaces, so cases 1 and 2 were already not so great there.

So in terms of the narrow needs of Thunderbird, I think this would be no worse than before if we change nsExternalHelperAppService::DoContent to getInterface to nsIDOMWindow and then do a QI to get the nsPIDOMWindowOuter, along with a comment explaining why we are doing that: scripted interface requestors cannot return an instance of the (non-scriptable) nsPIDOMWindowOuter interface.

Ideally, we would move away from nsIInterfaceRequestor here in general, and to something that actually provides the things we care about. But for now, that should do the trick, I would think.

How do I answer "what sort of object is it"?

It depends on the debugger you are using. For gdb, you would "set print object on" and then look at the object, but for other debuggers, I don't know offhand.

Re. comment #24: Windows, MSVC++ with clang-cl compiled code. It couldn't be worse :-( - I'll look at comment #23 now.

Re. comment #23: You mean this? Not working. mBrowsingContext is still null in GetDialogParent().

-  nsCOMPtr<nsPIDOMWindowOuter> window = do_GetInterface(aContentContext);
+  // Scripted interface requestors cannot return an instance of the
+  // (non-scriptable) nsPIDOMWindowOuter interface, so we go via nsIDOMWindow.
+  nsCOMPtr<nsIDOMWindow> domWindow = do_GetInterface(aContentContext);
+  nsCOMPtr<nsPIDOMWindowOuter> window = do_QueryInterface(domWindow);

printf("=== DoContent %p %p %p %p\n", (void*)aContentContext, (void*)domWindow, (void*)window, (void*)bc);
=== DoContent 0000014E32ED9280 0000014E2D88C4F0 0000000000000000 0000000000000000

So I can't get the nsPIDOMWindowOuter.

Hmm. I guess by the time this wends its way down through xpconnect we may have an inner window coming out of the do_GetInterface call. Or we may have an outer one.

How about something like this:

  // Scripted interface requestors cannot return an instance of the
  // (non-scriptable) nsPIDOMWindowOuter interface, so get to the
  // window via `nsIDOMWindow`.  Unfortunately, at that point we 
  // don't know whether the thing we got is an inner or outer window,
  // so have to work with either one.
  RefPtr<BrowsingContext> bc;
  nsCOMPtr<nsIDOMWindow> domWindow = do_GetInterface(aContentContext);  
  if (nsCOMPtr<nsPIDOMWindowOuter> outerWindow = do_QueryInterface(domWindow)) {
    bc = outerWindow->GetBrowsingContext();
  } else if (nsCOMPtr<nsPIDOMWindowInner> innerWindow = do_QueryInterface(domWindow)) {
    bc = innerWindow->GetBrowsingContext();
  }

Yes, that fixes GetDialogParent() and makes the test pass and let's me open attachments in the compose window again.

I can submit this as a patch, but I don't feel that would be right since I didn't write any of it really. Maybe you can submit it. If you do, please fix the first badly indented lines in nsExternalHelperAppService::DoContent() and also make the lines in your patch fix fit the 80 char limit (or run clang-format).

Yes, that fixes GetDialogParent() and makes the test pass and let's me open attachments in the compose window again.

Great.

I filed bug 1597175 to track the Gecko-side changes, so this bug can track backing out the Thunderbird workarounds or whatnot.

And yes, I know how to format a patch, thank you. ;)

Depends on: 1597175

Thanks a lot for your help, Boris!

BTW, I wasn't implying that you needed help with formatting a patch ;-) - It's just that - at least on our team - some undesirable formatting slips in occasionally since no one noticed an over-long line or some such.

Keywords: leave-open
Target Milestone: --- → Thunderbird 72.0

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/00734bc78bf3
Backed out changeset 70edaf9ceae6 to re-enable test. a=backout

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.