Closed Bug 1292128 Opened 3 years ago Closed 3 years ago

TypeError: menuPopup is undefined in browser/base/content/test/general/browser_page_style_menu.js

Categories

(L20n :: General, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stas, Assigned: wchen)

References

Details

(Whiteboard: [gecko-l20n])

Attachments

(1 file, 2 obsolete files)

The try runs in bug 1288406 consistently failed at bc7 in Linux 64 opt and debug builds.

    https://treeherder.mozilla.org/#/jobs?repo=try&revision=37963736ff85&selectedJob=25061825

Here's the error output:

447 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_page_style_menu.js | Uncaught exception - at chrome://browser/content/browser.js:5709 - TypeError: menuPopup is undefined
450 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_page_style_menu.js | Found an unexpected tab at the end of test run: http://example.com/browser/browser/base/content/test/general/page_style_sample.html -
461 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_page_style_menu_update.js | Uncaught exception - at chrome://browser/content/browser.js:5709 - TypeError: menuPopup is undefined
464 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_page_style_menu_update.js | Found an unexpected tab at the end of test run: http://example.com/browser/browser/base/content/test/general/page_style_sample.html -
See Also: → 1288406
I can't repro locally, but I found this in the log:

06:58:17     INFO -  JavaScript error: chrome://global/content/l20n-perf-monitor.js, line 12: NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIDOMWindowUtils.lastTransactionId]

So I wrapped this line in a try and launched:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd36d5bf64a8ba3dfb35c1de527b8880c9ef1451
Assignee: nobody → gandalf
Didn't help :(
Maybe this is related to timing of XBL bindings?

Saying that because https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/browser_page_style_menu.js#38 puts XBL on the code path.

Or that loading order changed and the popup is there, but its children aren't.

https://dxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/menu.xml#150 is the xbl property.
I was able to run the bc7 chunk and the browser_page_style_menu.js test in an interactive taskcluster loaner (after reporting bug 1311723 and bug 1312739).

When run as a chunk, the test fails (as expected).  When run individually, it passes.
Dao, you seem to be the author of this test (you landed it in bug 733339).

We're stuck with it failing on our branch and kind of at a dead end with ideas on how to fix it.

What we know is that:

1) The test passes when launched on its own
2) The test fails on our branch when launched in a bc7 chunk
3) The test passes on mozilla-central when launched in a bc7 chunk
4) Looking into the failure, it seems to me that the line that fails is:

http://hg.mozilla.org/mozilla-central/file/tip/browser/base/content/test/general/browser_page_style_menu.js#l39

where the argument passed to fillPopup is empty.

You can see the latest build result with bc7 failing here: https://treeherder.mozilla.org/#/jobs?repo=larch&revision=d7d142a7614746f609a0997bb70bc7cfae04915d

Is there any chance you can help us find a lead?
I'm mostly stuck because I can't find a way to reproduce this bug locally because of (1).
Flags: needinfo?(dao+bmo)
I've instrumented the menu item binding in xml with https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebc1c8ea27ce&selectedJob=30252918, and in the logs, the message from the xbl binding doesn't show up.

When I run the test locally, it does.

To me, this is an even stronger indication that the xbl isn't bound to the XUL element when we run the test like this.

Dao, feel free to punt this if you don't have a good idea. We're in dire need for good ideas, though.
(In reply to Axel Hecht [:Pike] from comment #3)
> Maybe this is related to timing of XBL bindings?
> 
> Saying that because
> https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/
> general/browser_page_style_menu.js#38 puts XBL on the code path.

Should be easy enough to test this hypothesis (and possibly fix the failure) by replacing:

  let menupopup = document.getElementById("pageStyleMenu").menupopup;

with:

  let menupopup = document.querySelector("#pageStyleMenu > menupopup");
Flags: needinfo?(dao+bmo)
That seems to fix it, the same error shows up in browser_page_style_menu_update.js, applied same change and pushed to try again, https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f682e95626f
So, with those two try runs, and with the other bc7 test failures out of the way:

The experiments seem to support that the XBL binding is in fact not bound when the browser_page_style_menu.js and browser_page_style_menu_update.js test are run in automation.

But why?
Neil, any chance you can help us find a path forward with this? We're stuck with this for 3 months and nowhere near the end ofthe tunnel.
Flags: needinfo?(enndeakin)
Here's the latest build with the same error - https://treeherder.mozilla.org/#/jobs?repo=larch&revision=7ccf894aacb0840b8e06e028abe1d494c5c001c5 (we fixed all other bugs we found over the last three months, but this one, we're stuck with)
:smaug suggested trying to inject a Promise in the middle to try to see if it's gonna solve it (which would indicate that it's racy)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=de5a7dda9600668a00e53017497c6c0ec3d2377e
Doesn't seem like it changed anything. :(
Gijs, RyanVM suggested you may be able to unblock us here. Any idea what may be going on?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #5)
> Dao, you seem to be the author of this test (you landed it in bug 733339).
> 
> We're stuck with it failing on our branch and kind of at a dead end with
> ideas on how to fix it.
> 
> What we know is that:
> 
> 1) The test passes when launched on its own
> 2) The test fails on our branch when launched in a bc7 chunk

<snip>

> I'm mostly stuck because I can't find a way to reproduce this bug locally
> because of (1).

./mach mochitest browser/base/content/test/general/

will run the entire dir, which is exactly what infra is doing. Unfortunately, it's quite a large dir (We've been slowly trying to get it smaller, but it's still pretty big). Does that also not reproduce? Are you using debug/opt as infra is, and e10s (default) or non-e10s (pass --disable-e10s) as infra is?

I'll look at it in more detail tomorrow, I guess, it's 11.25pm here.
Other suggestions: you can try running against a package build after doing ./mach package, by then using the "--appname dist" switch to run against the dist version (you can also pass in an arbitrary path instead to run against builds produced by infra).

Finally, on linux you can try rr with chaos mode ( ./mach mochitest --debugger=rr --debugger-args "record -h" path/to/dir ). It's kind of rubbish at debugging JS, but if you can reproduce you can printf debug.

Oh, and presumably you've tried --run-until-failure ?
I tried to run with "--run-until-failure" and I don't understand the results. I still have to run the whole "browser/base/content/test/general" which takes ~20min per run and I once got this failure, once didn't and once couldn't find it in the log.

I tried to reduce the amount of log "--quiet" - and then I couldn't find the failure. :/
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #18)
> I tried to run with "--run-until-failure" and I don't understand the
> results. I still have to run the whole "browser/base/content/test/general"
> which takes ~20min per run and I once got this failure, once didn't and once
> couldn't find it in the log.
> 
> I tried to reduce the amount of log "--quiet" - and then I couldn't find the
> failure. :/

If you're able to reproduce if you run the entire dir, use --start-at to narrow down which previous test is causing failures, which will then help reduce the time it takes to reproduce the issue again, and will also give clues as to why a previous test is causing this test to fail.
Flags: needinfo?(gijskruitbosch+bugs)
(also, --end-at with the failing test - clearly it's not necessary to run all the stuff that comes after it)
Also, those trypushes show lots of failing tests in browser/components/customizableui/, which on my machine runs in about 1 minute on opt. Have you looked at those? It's possible the underlying issue is the same.
It looks like your code is trying to fetch brand info before it's been loaded (or you've removed the .properties file in favour of l20n but didn't clean up consumers):

16:35:06     INFO -  Couldn't convert chrome URL: chrome://branding/locale/brand.properties

Also, there are weird XML errors (maybe from about:home, maybe about:certerror or about:neterror):

16:35:56     INFO - XML Parsing Error: mismatched tag. Expected: </b>.
16:35:56     INFO - Location:
16:35:56     INFO - Line Number 1, Column 221:
16:35:56     INFO - <html xmlns="http://www.w3.org/1999/xhtml"><body xmlns="http://www.w3.org/1999/xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><b></p></b></

which trigger assertion failures. (I say weird because (a) this is unexpected and (b) the number of nested, un-id'd <div>s with namespaces on each of them is... surprising, to say the least))

(no idea how much of this is relevant)

Finally, maybe check if the XBL binding is alive when the browser starts initially? Is the menu node we're fetching with getElementById still in the document at the point where the XBL .menupopup call fails? Could try to add logs (Components.utils.reportError) to the destructor of the binding, see if that appears in the log? (destructors aren't guaranteed to run because XBL, ugh, and so kind of a long shot, but worth trying if we're attempting to understand when the popup is being destroyed)
(In reply to :Gijs Kruitbosch from comment #22)
> Finally, maybe check if the XBL binding is alive when the browser starts
> initially? Is the menu node we're fetching with getElementById still in the
> document at the point where the XBL .menupopup call fails?

Obviously this should be the case if you use getElementById, but it and/or querySelector have weird behaviour with XUL and XBL, see https://bugzilla.mozilla.org/show_bug.cgi?id=1282787 .
The tests that dao suggested to use

let menupopup = document.querySelector("#pageStyleMenu > menupopup");

show that the pageStyleMenu and its menupopup are there.

I did a try run with heavily instrumented XBL, for constructor, destructor, and .menupopup, https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebc1c8ea27ceb2e460845fb863278faa56f6b6e3.

In the failing tests, you don't see the accessor being called, but it's reporting in other tests.

Constructors and destructors are called often and never respectively, and also in locally passing test runs, the constructors aren't close to the test case, so I didn't try to map the constructor call to anything about the particular menuitem.
(In reply to Axel Hecht [:Pike] from comment #24)
> The tests that dao suggested to use
> 
> let menupopup = document.querySelector("#pageStyleMenu > menupopup");
> 
> show that the pageStyleMenu and its menupopup are there.
> 
> I did a try run with heavily instrumented XBL, for constructor, destructor,
> and .menupopup,
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=ebc1c8ea27ceb2e460845fb863278faa56f6b6e3.
> 
> In the failing tests, you don't see the accessor being called, but it's
> reporting in other tests.

The logging doesn't distinguish the element that is being constructed/destructed, or on which the getter is being called, though, so it doesn't tell us whether this particular element was ever initialized (and/or its getter ever successfully invoked) or not.
Is there a diff from larch to m-c that's available somewhere? The trypushes seem to just use larch with merges from m-c and so all that happens is that my browser hangs and crashes if I open one of the merges. :-\
Yes, actually stas is working on refactoring our landings on larch into landable patches and pushes that to his user-repo, https://hg.mozilla.org/users/smalolepszy_mozilla.com/larch/.

Re the logging, the test does pass in isolation on various platforms, and then the logging shows one access in the middle of the test output. That other parts show the logging info from the binding to me ensures that the logging does work in automation. That the logging doesn't report inside the test output as it does in isolation to me indicates that the xbl binding is missing in the scenario in automation.
(In reply to Axel Hecht [:Pike] from comment #27)
> Yes, actually stas is working on refactoring our landings on larch into
> landable patches and pushes that to his user-repo,
> https://hg.mozilla.org/users/smalolepszy_mozilla.com/larch/.
> 
> Re the logging, the test does pass in isolation on various platforms, and
> then the logging shows one access in the middle of the test output. That
> other parts show the logging info from the binding to me ensures that the
> logging does work in automation. That the logging doesn't report inside the
> test output as it does in isolation to me indicates that the xbl binding is
> missing in the scenario in automation.

I'm confused. There's plenty of "constructed menuitem" in the log, but only in tests that have either requestCompleteLog or failures. So if a previous test or just the browser starting up constructed the binding, the log would be in there, and then it would get omitted because (almost?) none of the tests preceding the pagestyle menu one have logging turned on or have failures, so you get none of the (logging) test output at all. Once the XBL binding has been constructed, we wouldn't need to construct it again unless it was destroyed. So I don't think your conclusion follows (ie I think it's still possible it's been created and then destroyed). 

In any case, you can't distinguish the menu in question being constructed from any of the other ones, so it's hard to draw any conclusions.
(to be clear, yes, obviously the binding is not there or not functioning at the point where we reach the test - but the question I'm trying to answer is why/where did the binding get destroyed, or was it simply never applied to the element in question. For that, you'd need more specific logging (and probably turn on requestCompleteLog in head.js, which might push you over the log limits on infra - not sure, but you'd probably be wise to make the logging specific to the element we're interested in))
(In reply to :Gijs Kruitbosch from comment #21)
> Also, those trypushes show lots of failing tests in
> browser/components/customizableui/, which on my machine runs in about 1
> minute on opt. Have you looked at those? It's possible the underlying issue
> is the same.

Just did run browser/components/customizableui/test/browser_909779_overflow_toolbars_new_window.js on larch and c44c01dfd264, and both show the same failure and logs, including the brand error messages.

Maybe that's a new failure we picked up on our last merge from central?
Flags: needinfo?(dbolter)
Andrew I think you were going to try to find xbl debugging help here, please bounce the NI info back otherwise.
Flags: needinfo?(dbolter) → needinfo?(overholt)
William said he'd help out.
Flags: needinfo?(overholt) → needinfo?(wchen)
I narrowed this failure down to the test browser_bug422590.js
I can now reliably reproduce the failure if I run browser tests with only the test above and browser_page_style_menu.js

It looks like browser_bug422590.js disables the menu and doing so might cause the binding to be removed (guessing from the code and comments in the test [1]). I can only reproduce this in larch and it's not obvious why this doesn't happen in m-c, it'll need more investigation but at least we have a starting point.

[1] https://dxr.mozilla.org/mozilla-central/rev/47e0584afe0ab0b867412189c610b302b6ba0ea7/browser/base/content/test/general/browser_bug422590.js#36
Flags: needinfo?(wchen)
I investigated this further and here is the cause:

l20n code touches elements in script and causes us to wrap the element to create the reflector for JS. When we do this, we install the XBL binding.

browser_bug422590.js calls code that removes the menu from the document [1]. This causes us to remove the XBL binding. (The test adds the menu back to the document later, but we don't reinstall the XBL binding at that point).

When we get to browser_page_style_menu.js, the methods from XBL are no longer on the object and thus the test fails.

The problem is that once we remove the XBL binding prototype from the prototype chain, we only add it back when we create frames for the element [2] and it doesn't look like we get to that point (I'm guessing that is because the menu isn't visible during the tests).

This bug doesn't happen on m-c because we only create the wrapper for the menu elements right before we touch them in the tests and install the XBL bindings at that point. In larch, l20n code touches the elements very early.

A couple paths forward:

- Change XBL behavior to reinstall the binding when elements are inserted back into the document instead of when we create the frames.
- Update the test browser_bug422590.js to make the menus visible before the test ends (thus forcing the XBL binding to be reinstalled).

The first option is the more difficult one because this has been the behavior of XBL for a very long time and changing it will probably cause regressions. The second option is easier, and it looks like the test already suffers from the same issue (binding on menu is removed) but works around it.

[1] https://dxr.mozilla.org/mozilla-central/rev/f8ba9c9b401f57b0047ddd6932cb830190865b38/browser/components/customizableui/CustomizeMode.jsm#887
[2] https://dxr.mozilla.org/mozilla-central/rev/f8ba9c9b401f57b0047ddd6932cb830190865b38/layout/base/nsCSSFrameConstructor.cpp#2448
thanks so much :wchen! This is awesome.

So basically, the simple way to fix it is to open the file menu in the test to retrigger XBL binding.

I started looking for a function in our test code to do that, but can't find it. Anyone has any ideas on what code to use to trigger the file menu appearance?
something like this:

let menubarToolbar = document.getElementById("toolbar-menubar");
if (AppConstants.platform != "macosx" && menubarToolbar.getAttribute("autohide") == "true") {
  registerCleanupFunction(() => setToolbarVisibility(menubarToolbar, false));
  setToolbarVisibility(menubarToolbar, true);
}

let viewMenu = document.getElementById("view-menu");
let popupShownPromise = BrowserTestUtils.waitForEvent(viewMenu.menupopup, "popupshown");
viewMenu.open = true;
yield popupShownPromise;
Comment on attachment 8811523 [details]
Bug 1292128 - Regenerate XBL binding for page style menu at the end of the browser_bug422590 mochitest.

Thanks so much for help :gijs and :wchen!

This patch does indeed fix the bug. :)
Attachment #8811523 - Flags: review?(l10n)
Attachment #8811523 - Flags: review?(l10n) → review?(gijskruitbosch+bugs)
Comment on attachment 8811523 [details]
Bug 1292128 - Regenerate XBL binding for page style menu at the end of the browser_bug422590 mochitest.

I don't feel confident to review this, Gijs, can you take this one?
Comment on attachment 8811523 [details]
Bug 1292128 - Regenerate XBL binding for page style menu at the end of the browser_bug422590 mochitest.

https://reviewboard.mozilla.org/r/93602/#review93756

Sorry, I expected that 'file menu' was a typo, as the failing test was trying to access the view menu (to change charsets). So the code I gave you opens the view menu, not the file menu. I expected it'd be put in the test that's failing, which is effectively checking things about the charset menu without opening it, which I think is dodgy anyway. I do agree that you can equally argue that this tests messes things up and so it'd best take responsibility for putting them to rights again.

However, unfortunately this isn't quite good enough yet because now you run the test and then end it while the menu is still open. I would be... surprised... if that wouldn't come back to bite us at a later time in similar fashion to how this is currently biting us. In other words, you'd want a similar promise that checks the menu has been hidden again, and you'd want to reset `open` to false after the menu has been shown. Which probably means you want to use a task and 2 yields for the popupshown/popuphidden events?

Though... re-reading the code I suggested, menu.open is in itself implemented in XBL - https://dxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/menu.xml#50-55 . So I'm wondering if just showing/hiding the menubar is sufficient? That would be simpler. I assume re-init'ing the XBL of the menubar's <menu> items is sufficient to init the XBL of all their descendants. Not sure, though. :-)


Actually, looking at this code I'm confused a last time: the code runs in a new window. The test opens one on line 26 (post-patch, line 9 on DXR). It closes that window before ending the test. What happens in the new window won't (... shouldn't...) affect the frames or XBL bindings of the main window, and the window has been closed, so how is all this messing with the test that ultimately failed?

Lastly, I'm on PTO today and tomorrow, so if you want the next review before Monday, please ask :Enn or :dao or :jaws. Sorry!
Attachment #8811523 - Flags: review?(gijskruitbosch+bugs)
(In reply to :Gijs Kruitbosch (PTO Nov. 17-20 inclusive) from comment #41)
> Comment on attachment 8811523 [details]
> Actually, looking at this code I'm confused a last time: the code runs in a
> new window. The test opens one on line 26 (post-patch, line 9 on DXR). It
> closes that window before ending the test. What happens in the new window
> won't (... shouldn't...) affect the frames or XBL bindings of the main
> window, and the window has been closed, so how is all this messing with the
> test that ultimately failed?
That's a good point, I took a look at the test again and it's actually running some tests on the main window (which is messing everything up) and then it creates a new chromeless window and runs some more tests on that window.

This patch uses a new window instead of the main window for the first set of tests and it looks like it's enough to fix this bug.
Comment on attachment 8811849 [details] [diff] [review]
Don't use main window for running test in browser_bug422590.js, use a new window instead.

summary: browser_bug322590.js is running tests on the main window and the tests cause XBL bindings to be removed from the menu. This is affecting a later test that accesses a property on the binding because it is no longer there. This patch changes the test to use a new window instead of the main window.
Attachment #8811849 - Flags: review?(jaws)
Comment on attachment 8811849 [details] [diff] [review]
Don't use main window for running test in browser_bug422590.js, use a new window instead.

Review of attachment 8811849 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/general/browser_bug422590.js
@@ +49,5 @@
>    function onClose() {
>      is(fileMenu.getAttribute("disabled"), "false",
>         "file menu is enabled after toolbar customization");
>  
> +    aWindow.close();

It doesn't look right that onClose is calling aWindow.close(), because that would mean that there are two calls to window.close() for the chromeless window.

If you look at where testCustomize is called for the chromeless window, the callback that is passed to testCustomize already includes newWin.close().

Therefore, maybe you should change the code above from:
testCustomize(newWin, testChromeless)
to:
testCustomize(newWin, function() {
  newWin.close();
  testChromeless();
});

and then remove this line that you're adding right here.
Attachment #8811849 - Flags: review?(jaws) → review-
Attachment #8811523 - Attachment is obsolete: true
Attachment #8811849 - Attachment is obsolete: true
Attachment #8812306 - Flags: review?(jaws)
Comment on attachment 8812306 [details] [diff] [review]
Don't use main window for running test in browser_bug422590.js, use a new window instead.

Review of attachment 8812306 [details] [diff] [review]:
-----------------------------------------------------------------

Stealing, r=me
Attachment #8812306 - Flags: review?(jaws) → review+
Pushed by wchen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/853024720c07
Don't use main window for running test in browser_bug422590.js, use a new window instead. r=Gijs
Assignee: gandalf → wchen
https://hg.mozilla.org/mozilla-central/rev/853024720c07
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.