Closed Bug 1443364 Opened 6 years ago Closed 6 years ago

Intermittent browser/components/customizableui/test/browser_901207_searchbar_in_panel.js | Uncaught exception - Panel did not show within 20 seconds.

Categories

(Firefox :: Toolbars and Customization, defect)

Type: defect; Priority: Not set; Severity: normal

Tracking

RESOLVED FIXED
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: dao)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:logic])

Attachments

(1 file)

Off-hand this failure just looks like "we're too slow".

17 failures over 24 hours still means you'd need something on the order of 30-50 retriggers per push to narrow this down, and I likely won't be able to reproduce locally. I don't have cycles to chase this by manually running hundreds of retriggers to try to find the regressor in the pile of autoland checkins. Do we have tools for this at this point?
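The retrigger estimate above can be sketched with a little arithmetic. This is only an illustration: it assumes each run fails independently with a fixed per-run failure rate, which the comment does not state, and the function name `retriggers_needed` and the 95% confidence default are hypothetical choices, not an existing tool.

```python
import math


def retriggers_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Return the smallest number of runs n such that at least one failure
    shows up with probability >= confidence on a push that has the bug.

    Assumption (not from the bug): runs are independent and each fails
    with probability `failure_rate`, so P(no failure in n runs) is
    (1 - failure_rate) ** n.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under these assumptions, per-run failure rates of somewhere under ten percent already push the count into the tens of retriggers per push, which is the scale the comment describes.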

Basically, I'd really like this to be as simple as feeding an OrangeFactor bug reference (which will have the first occurrence, failure message, most-frequently-failing platform, and frequency information to extrapolate a safe number of retriggers from) and a backward timeframe (default: 1 day) into a webpage, and having it come back to me a few hours later with an answer.
Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)
Actually, making a |./mach try intermittent <bugid>| command would be cool. It could determine the failure rate and platforms/configs and push to try with --rebuild X (although we are limited to a maximum of 20 for --rebuild, as it seems to get abused too often). The failures are primarily on debug, with a few opt failures mixed in. I have filed bug 1443773 to track this idea.

Retriggering recent failures on the trees does take a lot of work, and we don't have a specific process or tools in place for it. Our plan is to have something in place for the code sheriffs to see high-frequency failures that started in the last 3 days, retrigger them, and find the root cause. The best bet here is to build a view on OrangeFactor or another dashboard that shows recent failures that occur at least 15 times and are less than 3 days old; that would give us a place to focus efforts.
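The dashboard filter described above (at least 15 failures, first seen less than 3 days ago) could look roughly like the following. This is a minimal sketch, not an OrangeFactor API: the `IntermittentBug` record and the `recent_high_frequency` function are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class IntermittentBug:
    """Hypothetical record for one intermittent-failure bug."""
    bug_id: int
    first_seen: datetime
    failure_count: int


def recent_high_frequency(
    bugs: Iterable[IntermittentBug],
    now: datetime,
    min_failures: int = 15,
    max_age: timedelta = timedelta(days=3),
) -> List[IntermittentBug]:
    """Keep bugs that failed at least `min_failures` times and whose
    first occurrence is less than `max_age` before `now`."""
    return [
        bug
        for bug in bugs
        if bug.failure_count >= min_failures and now - bug.first_seen < max_age
    ]
```

The thresholds are parameters so that sheriffs could tune them without changing the filter itself.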

We have tried a few different times to solve this for performance alerts and it has failed in so many ways each time due to all the edge cases.

I have done some retriggers to see if this happened when the test moved from chunk 2 -> chunk 8:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux%20debug%20browser-chrome-e10s%20%20x64%20stylo&tochange=83df6cf24cf8fa59eb338ce0bd7bedac289c5d33&fromchange=916d91f23f895a441ffe375ff245d4c62a4bb41a
Flags: needinfo?(jmaher)
I think intelligent backfilling and retriggering is a great idea... but it does have a lot of moving parts. I have some fear that such a project might end up creating a feature that works like magic one day of the year but otherwise has us chasing a million bugs.

I have aspirations of combining test-verify with backfilling to provide a similar feature: In treeherder, request regression tracking for a named test and changeset; then run test-verify on that test and work backward in time until the test runs reliably. However, this is just an idea -- I haven't even thought about coding this yet, there's no bug on file or anything.
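The "work backward in time until the test runs reliably" idea can be sketched as a simple backward walk. This is an assumption-laden illustration only: `find_first_unreliable` and the `is_reliable` callback are hypothetical, and in practice the callback would have to schedule test-verify on a changeset and interpret the result, which is not shown here.

```python
from typing import Callable, Optional, Sequence


def find_first_unreliable(
    changesets_newest_first: Sequence[str],
    is_reliable: Callable[[str], bool],
) -> Optional[str]:
    """Walk from the newest changeset toward older ones and return the
    oldest changeset in the unbroken run of unreliable ones, i.e. the
    suspected regressor.

    Returns None if the newest changeset is already reliable. If every
    changeset in the window is unreliable, the oldest one is returned and
    the real regressor may lie further back.
    """
    suspect: Optional[str] = None
    for changeset in changesets_newest_first:
        if is_reliable(changeset):
            break  # Reached a changeset where the test runs reliably.
        suspect = changeset
    return suspect
```

A linear walk is used rather than bisection because an intermittent test gives noisy answers; checking each changeset in order keeps the logic easy to reason about, at the cost of more runs.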
Flags: needinfo?(gbrown)
Looks to be related to bug 1442651:
https://hg.mozilla.org/integration/autoland/rev/562966f195b0f071cf1d973f7b4fd51ec8594343

::dao, can you take a look at this?
Flags: needinfo?(dao+bmo)
Interestingly, browser_901207_searchbar_in_panel.js doesn't seem to interact with tabbrowser in any meaningful way, and I don't see any errors in the log.

(In reply to :Gijs (under the weather; responses will be slow) from comment #2)
> Off-hand this failure just looks like "we're too slow".

20 seconds to show the overflow panel...?

I just ran the test locally and can actually reproduce the failure. The toolbar overflows as it should but the panel doesn't open. I can open it manually and focus the search bar, and then the test continues and passes.
Flags: needinfo?(dao+bmo)
(In reply to Dão Gottwald [::dao] from comment #7)
> I just ran the test locally and can actually reproduce the failure. The
> toolbar overflows as it should but the panel doesn't open. I can open it
> manually and focus the search bar, and then the test continues and passes.

Apparently it's sendWebSearchKeyCommand that fails. I can hit accel+k manually and that lets the test pass as well.
(In reply to Dão Gottwald [::dao] from comment #7)
> Interestingly, browser_901207_searchbar_in_panel.js doesn't seem to interact
> with tabbrowser in any meaningful way, and I don't see any errors in the log.
> 
> (In reply to :Gijs (under the weather; responses will be slow) from comment
> #2)
> > Off-hand this failure just looks like "we're too slow".
> 
> 20 seconds to show the overflow panel...?

Well, some of the screenshots in failing tests do show the overflow panel, with the search bar in it, and it's mostly failing on debug and asan runs, so yes that was my starting assumption. The regression window makes it a bit moot, of course...
I'm going to attach a patch that fixes the test over here, but I really don't understand why. The only rational explanation I can come up with right now for how bug 1442651 broke this is that document.activeElement is the <browser> when sendWebSearchKeyCommand gets called first, whereas before bug 1442651 it probably was <tabbrowser> because the browser elements were anonymous. But the focused element should be the browser element regardless. I don't know why this would affect EventUtils.synthesizeKey, let alone why this would be intermittent. Like I said, manually hitting accel+k worked fine, so I don't think we have a real regression here.
Assignee: nobody → dao+bmo
Blocks: 1442651
Status: NEW → ASSIGNED
(In reply to Geoff Brown [:gbrown] from comment #4)
> However, this is just an idea -- I haven't
> even thought about coding this yet, there's no bug on file or anything.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1443867 . :-)
Comment on attachment 8956896 [details]
Bug 1443364 - Fix browser_901207_searchbar_in_panel.js intermittent failure.

https://reviewboard.mozilla.org/r/225850/#review231734

rs=me, though I wonder if we should file a followup about the default focus change having happened and just check in with Enn about why that would be / if there are going to be more serious repercussions than random tests failing (which of course is unfortunate, but doesn't ultimately hurt users directly).

Thanks for tracking this down and fixing it so quickly, Dão, Joel and Geoff!
Attachment #8956896 - Flags: review?(gijskruitbosch+bugs) → review+
(In reply to :Gijs (under the weather; responses will be slow) from comment #13)
> Comment on attachment 8956896 [details]
> Bug 1443364 - Fix browser_901207_searchbar_in_panel.js intermittent failure.
> 
> https://reviewboard.mozilla.org/r/225850/#review231734
> 
> rs=me, though I wonder if we should file a followup about the default focus
> change having happened and just check in with Enn about why that would be /
> if there are going to be more serious repercussions than random tests
> failing (which of course is unfortunate, but doesn't ultimately hurt users
> directly).
> 
> Thanks for tracking this down and fixing it so quickly, Dão, Joel and Geoff!

I'll file a followup once we know that this actually fixes the intermittent failure rather than just my local one. I tried to run test-verify on try but failed to figure out how to do it, so I'll just land this...
Pushed by dgottwald@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/39f31c0be9f4
Fix browser_901207_searchbar_in_panel.js intermittent failure. r=Gijs
https://hg.mozilla.org/mozilla-central/rev/39f31c0be9f4
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
See Also: → 1443360
See Also: → 1444042
See Also: → 1443404
Whiteboard: [stockwell needswork:owner] → [stockwell fixed:logic]