Closed Bug 1336887 Opened 7 years ago Closed 7 years ago

Permaorange in browser_940307_panel_click_closure_handling.js on ASan when Gecko 54 merges to aurora on 2017-03-06

Categories

(Firefox :: Toolbars and Customization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox54 + fixed

People

(Reporter: philor, Assigned: dholbert)

References

Details

(Whiteboard: Fixed by bug 1338053)

Yeah, I can't quite believe I typed that, either.

https://treeherder.mozilla.org/logviewer.html#?job_id=74658429&repo=try for non-e10s, which starts with an extra "uncaught exception - TypeError: browser.contentWindow.gotoPref is not a function at openPreferences@chrome://browser/content/utilityOverlay.js:735:7" and https://treeherder.mozilla.org/logviewer.html#?job_id=74658478&repo=try for e10s,

[task 2017-02-06T02:32:20.267641Z] 02:32:20     INFO - TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_940307_panel_click_closure_handling.js | Uncaught exception - Panel did not show within 20 seconds.
[task 2017-02-06T02:32:20.634846Z] 02:32:20     INFO - TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_940307_panel_click_closure_handling.js | Popup stays open - Got closed, expected open
[task 2017-02-06T02:32:20.635180Z] 02:32:20     INFO - Stack trace:
...
[task 2017-02-06T02:32:40.628180Z] 02:32:40     INFO - TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_940307_panel_click_closure_handling.js | Uncaught exception - Panel did not hide within 20 seconds.

[Tracking Requested - why for this release]: permaorange on the merge, closed tree, tears and tearing of hair.
https://hg.mozilla.org/try/rev/5b90e6b1f8d7d27745201ffcc14d30e03ac25671 and https://hg.mozilla.org/try/rev/821acad7325303d2da0e0ac7aa11535483e3a081 should serve as a good clean central-as-aurora-on-try base for testing.
Flags: needinfo?(gijskruitbosch+bugs)
Do we have a regression range (ie was there an earlier aurora push that wasn't perma-orange)? I'm not aware of what anything that has changed here that would trigger this breakage. :-\

Jared, do you know of any preferences changes that might be related?
Flags: needinfo?(philringnalda)
Flags: needinfo?(jaws)
Flags: needinfo?(gijskruitbosch+bugs)
Nope, that's the first central-as-aurora I've done in a couple of years, so the range I've got is "not 53", the last couple of weeks.
Flags: needinfo?(philringnalda)
Tracking 54+ to keep this on the radar.
Well, that wasn't something I would have picked as the cause. https://treeherder.mozilla.org/#/jobs?repo=try&author=philringnalda@gmail.com&fromchange=bd197307165abea4860946ea061147d820a38e0d&group_state=expanded&tochange=df4f32dfa2ea98a521386f2e45d2442248d603a3 is, from the bottom, the first cset of bug 1209697 being fine, the second one busted, and the tip of m-c (as aurora) with the second one backed out, again fine.
Blocks: 1209697
Flags: needinfo?(emilio+bugs)
I don't really have a better explanation than what I wrote in bug 1209697 comment 35. Not sure also what happened between that  and bug 1209697 comment 38, where those failures went away in central. I just assumed those failures were completely unrelated and had been fixed from there.

The panel being tested uses flex layout in a way that was pathological before my patch, so I'm tempted to blame racy tests for this, since there's absolutely no way my patch would cause a TypeError in browser code otherwise.

I can try to dig more if required, though I have never touched any browser/ code ever :)
Flags: needinfo?(emilio+bugs)
I have a suspicion about what's happening (checking a one-click loaner right now to see if I can see it... we try to add the search bar and a disabled button to the menu panel and then bring up its context menu. But we've been adding buttons to that panel, and there's a different set of buttons on aurora. So if the position of the search bar and/or new button is such that it isn't visible in the panel (which shouldn't be the case, but bear with me) then we end up clicking the 'sign into sync' status bar that is below the customizable items in the panel. This then leads to the sync pane opening, which closes the panel, and causes all the test breakage.

The reason this is all related to flex layout patches is that the panel UI contents are supposed to grow if you add more items (ie that area of the panel will get bigger). So items being scrolled out of view and us clicking the wrong thing isn't supposed to happen. Emilio, any chance you have an idea about why that would happen after your changes?
Flags: needinfo?(emilio+bugs)
Well, not off hand, we _could_ be assuming that the cached value is right when we shouldn't, which can happen with that patch in some weird cases with nested flex containers (in which case the patch over bug 1336708 would fix this). But that would happen in all the tests and not only on ASAN I guess?

Thanks for digging into it btw, the previous comment was really helpful to understand what could be going on.

Where I can see the XML/XUL code that represents the panel items? I guess that would also help me understand what could go wrong.
Flags: needinfo?(emilio+bugs)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #8)
> Well, not off hand, we _could_ be assuming that the cached value is right
> when we shouldn't, which can happen with that patch in some weird cases with
> nested flex containers (in which case the patch over bug 1336708 would fix
> this). But that would happen in all the tests and not only on ASAN I guess?
> 
> Thanks for digging into it btw, the previous comment was really helpful to
> understand what could be going on.
> 
> Where I can see the XML/XUL code that represents the panel items? I guess
> that would also help me understand what could go wrong.

It's mostly constructed dynamically. The container bits are in https://dxr.mozilla.org/mozilla-central/source/browser/components/customizableui/content/panelUI.inc.xul . For seeing the contents, you're probably best off looking at it with the browser toolbox ( https://developer.mozilla.org/en-US/docs/Tools/Browser_Toolbox ). You can use it (or the browser console - PanelUI.panel is a reference to the element) to set the "noautohide" attribute to "true", which will let you easily inspect the contents (ie to prevent the popup from disappearing all the time).
See bug 1336887, I bet that fixes this.
Depends on: 1338053
(In reply to Emilio Cobos Álvarez [:emilio] from comment #10)
> See bug 1336887, I bet that fixes this.

Err, I mean bug 1338053.
Yeah, extremely likely that that'll fix this.  (I don't know how to test this nightly-as-devedition configuration locally, but I'm not too worried about jumping through hoops to get it set up, because I'm confident that bug 1338053 explains & addresses this.)
Flags: needinfo?(jaws)
I'd bet that the hardest part of testing locally would be managing to hit the bustage rather than figuring out how to build as aurora. 99% of the build is just from changing 54.0a1 to 54.0a2 in the three version/milestone files, and using ac_add_options --with-branding=browser/branding/aurora in your mozconfig, but even with what should be every single change that will actually happen, and running on try using the exact same hardware/VMs, we're still only hitting it on the  ASan tests, which could possibly mean that you have to change version numbers and branding and --disable-crashreporter like ASan, or just as likely could mean that you have to run just as slowly as tests on ASan run.

But, why test locally? hg import https://hg.mozilla.org/try/rev/5b90e6b1f8d7d27745201ffcc14d30e03ac25671 && hg import https://hg.mozilla.org/try/rev/821acad7325303d2da0e0ac7aa11535483e3a081 && hg import https://hg.mozilla.org/try/rev/291e8b6c17bc94a5bb557e9dbf75da7d28efb961 && hg push-to-try -m "try: -b o -p linux64-asan -u mochitest-bc -t none" gives you a 100% chance of hitting the bustage if that doesn't fix it.
Fair enough - thanks!

That command didn't *quite* work out-of-the-box, because my local CLOBBER file is different from what one of those imported csets expects. (CLOBBER bitrot, sigh) :)

But I fixed the bitrot to hg's satisfaction and triggered a try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5cc2cf52dce730cb79a6f34df9615b073e2454c0

If that's green, then I think we can call this FIXED by bug 1338053!
(In reply to Daniel Holbert [:dholbert] from comment #14)
> Fair enough - thanks!
> 
> That command didn't *quite* work out-of-the-box, because my local CLOBBER
> file is different from what one of those imported csets expects. (CLOBBER
> bitrot, sigh) :)
> 
> But I fixed the bitrot to hg's satisfaction and triggered a try run:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=5cc2cf52dce730cb79a6f34df9615b073e2454c0
> 
> If that's green, then I think we can call this FIXED by bug 1338053!

Victory! Thanks very much for tracking down the rest of this to you and Emilio. :-)
Assignee: nobody → dholbert
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: Fixed by bug 1338053
You need to log in before you can comment on or make changes to this bug.