Closed Bug 1491484 Opened 11 months ago Closed 2 months ago

Intermittent toolkit/content/tests/browser/browser_findbar.js | Uncaught exception - undefined - timed out after 50 tries.

Categories

(Toolkit :: Find Toolbar, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bgrins)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(3 files, 1 obsolete file)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=199407494&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/fQmNtXP-SNyhq7nJpV2d_w/runs/0/artifacts/public/logs/live_backing.log

[task 2018-09-14T22:36:48.770Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status should be empty - 
[task 2018-09-14T22:36:48.771Z] 22:36:48     INFO - Leaving test bound test_found
[task 2018-09-14T22:36:48.772Z] 22:36:48     INFO - Entering test bound test_tabwise_case_sensitive
[task 2018-09-14T22:36:48.774Z] 22:36:48     INFO - Buffered messages logged at 22:36:40
[task 2018-09-14T22:36:48.775Z] 22:36:48     INFO - Result listener not called, timeout reached.
[task 2018-09-14T22:36:48.777Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status text should be 'Phrase not found' - 
[task 2018-09-14T22:36:48.777Z] 22:36:48     INFO - Buffered messages logged at 22:36:42
[task 2018-09-14T22:36:48.780Z] 22:36:48     INFO - Result listener not called, timeout reached.
[task 2018-09-14T22:36:48.781Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status should be empty - 
[task 2018-09-14T22:36:48.783Z] 22:36:48     INFO - Leaving test bound test_tabwise_case_sensitive
[task 2018-09-14T22:36:48.784Z] 22:36:48     INFO - Entering test bound test_reinitialization_at_remoteness_change
[task 2018-09-14T22:36:48.786Z] 22:36:48     INFO - Ensure findbar re-initialization at remoteness change.
[task 2018-09-14T22:36:48.786Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status text should be 'Phrase not found' - 
[task 2018-09-14T22:36:48.787Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status should be empty - 
[task 2018-09-14T22:36:48.787Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Browser should be remote now. - 
[task 2018-09-14T22:36:48.788Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Browser should not be remote any more. - 
[task 2018-09-14T22:36:48.788Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status text should be 'Phrase not found' - 
[task 2018-09-14T22:36:48.789Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | Findbar status should be empty - 
[task 2018-09-14T22:36:48.790Z] 22:36:48     INFO - Leaving test bound test_reinitialization_at_remoteness_change
[task 2018-09-14T22:36:48.792Z] 22:36:48     INFO - Entering test bound e10sLostKeys
[task 2018-09-14T22:36:48.792Z] 22:36:48     INFO - Buffered messages logged at 22:36:43
[task 2018-09-14T22:36:48.796Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | findbar isn't initialized yet - 
[task 2018-09-14T22:36:48.797Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | findbar is not yet focused - 
[task 2018-09-14T22:36:48.798Z] 22:36:48     INFO - TEST-PASS | toolkit/content/tests/browser/browser_findbar.js | still has initial find query - 
[task 2018-09-14T22:36:48.799Z] 22:36:48     INFO - Buffered messages finished
[task 2018-09-14T22:36:48.800Z] 22:36:48     INFO - TEST-UNEXPECTED-FAIL | toolkit/content/tests/browser/browser_findbar.js | Uncaught exception - undefined - timed out after 50 tries.
[task 2018-09-14T22:36:48.801Z] 22:36:48     INFO - Leaving test bound e10sLostKeys
[task 2018-09-14T22:36:48.802Z] 22:36:48     INFO - GECKO(10630) | MEMORY STAT | vsize 802MB | residentFast 306MB | heapAllocated 94MB
[task 2018-09-14T22:36:48.804Z] 22:36:48     INFO - TEST-OK | toolkit/content/tests/browser/browser_findbar.js | took 14537ms
[task 2018-09-14T22:36:48.805Z] 22:36:48     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-09-14T22:36:48.806Z] 22:36:48     INFO - TEST-UNEXPECTED-FAIL | toolkit/content/tests/browser/browser_findbar.js | Found an unexpected tab at the end of test run: data:text/html;charset=utf-8,The letter s. - 
[task 2018-09-14T22:36:48.807Z] 22:36:48     INFO - checking window state
I'll take a look
Assignee: nobody → bgrinstead
Blocks: 1411707
Status: NEW → ASSIGNED
Flags: needinfo?(bgrinstead)
I'm seeing some different timing on the good and bad case:

Good:
1) Call `EventUtils.synthesizeAndWaitKey("f")` https://searchfox.org/mozilla-central/rev/bdc89dfd7869e418d788b28eb60ab8d94e708a15/toolkit/content/tests/browser/browser_findbar.js#179
2) The keyReceivedPromise gets resolved https://searchfox.org/mozilla-central/rev/bdc89dfd7869e418d788b28eb60ab8d94e708a15/testing/mochitest/tests/SimpleTest/EventUtils.js#990
3) The `checkAfterSynthesize` callback from the test which does the `EventUtils.synthesizeKey("a");` etc gets called https://searchfox.org/mozilla-central/rev/bdc89dfd7869e418d788b28eb60ab8d94e708a15/toolkit/content/tests/browser/browser_findbar.js#183

Bad: (2) and (3) order is swapped. Somehow this is leading to a case where receiveMessage doesn't get called, even though the addMessageListener is being called before any of this happens in both the good and bad case (https://searchfox.org/mozilla-central/rev/6c82481caa506a240a626bb44a2b8cbe0eedb3a0/toolkit/content/widgets/findbar.js#343).
Did some more runs. It seems like somehow the focus event on the input is now firing earlier: https://searchfox.org/mozilla-central/rev/6c82481caa506a240a626bb44a2b8cbe0eedb3a0/toolkit/content/widgets/findbar.js#255.

This causes a call to _updateBrowserWithState which sends "Findbar:UpdateState" to messageManager which then kicks us out of inPassThrough and prevents calls to passKeyToParent (which then sends the "Findbar:Keypress" message back): https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/modules/FindBarContent.jsm#50.

At least, if I wrap the focus listener in a timeout then test-verify passes on Linux. This seems plausible, since the XBL event handlers on the textbox would have fired later than the normal event handlers (as per https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XBL/XBL_1.0_Reference/Event_Handlers):

> XBL event handlers always fire last, after all other event handlers at the same position in the event flow. Since XBL handlers usually constitute the default actions for a widget, this allows authors in the bound document to write events that potentially suppress the default actions taken by the XBL handlers.

> Within an XBL inheritance chain, event handlers always fire first on the derived binding and then on the base binding in the chain. A derived handler then has a way of preventing the event from flowing to its base binding handlers.

It's not clear to me if this is just a race that's more common to surface now that the event handler fires earlier, or if there's a new bug here that the test is uncovering.
There's the mozSystemGroup option that I think should make your event listener behave similar to the XBL one.
(In reply to Dão Gottwald [::dao] from comment #7)
> There's the mozSystemGroup option that I think should make your event
> listener behave similar to the XBL one.

I went ahead and tried every configuration of mozSystemGroup and capture, and still fail test-verify for all 4 of those. I'm not sure if that test was passing test-verify before the CE migration, though.
(In reply to Brian Grinstead [:bgrins] from comment #9)
> (In reply to Dão Gottwald [::dao] from comment #7)
> > There's the mozSystemGroup option that I think should make your event
> > listener behave similar to the XBL one.
> 
> I went ahead and tried every configuration of mozSystemGroup and capture,
> and still fail test-verify for all 4 of those.

The problem probably isn't the event propagation order then...

> I'm not sure if that test was
> passing test-verify before the CE migration, though.

Well, the test wasn't known to be intermittent, was it?
(In reply to Dão Gottwald [::dao] from comment #10)
> (In reply to Brian Grinstead [:bgrins] from comment #9)
> > (In reply to Dão Gottwald [::dao] from comment #7)
> > > There's the mozSystemGroup option that I think should make your event
> > > listener behave similar to the XBL one.
> > 
> > I went ahead and tried every configuration of mozSystemGroup and capture,
> > and still fail test-verify for all 4 of those.
> 
> The problem probably isn't the event propagation order then...

Does mozSystemGroup run at the same time as a XBL <handler>? The confusing thing is you can also set `group="system"` on a <handler> which makes me think they are two separate things.

> > I'm not sure if that test was
> > passing test-verify before the CE migration, though.
> 
> Well, the test wasn't known to be intermittent, was it?

No, I don't see anything that looks relevant at https://bugzilla.mozilla.org/buglist.cgi?quicksearch=browser_findbar.js&list_id=14341486.
(In reply to Brian Grinstead [:bgrins] from comment #11)
> (In reply to Dão Gottwald [::dao] from comment #10)
> > (In reply to Brian Grinstead [:bgrins] from comment #9)
> > > (In reply to Dão Gottwald [::dao] from comment #7)
> > > > There's the mozSystemGroup option that I think should make your event
> > > > listener behave similar to the XBL one.
> > > 
> > > I went ahead and tried every configuration of mozSystemGroup and capture,
> > > and still fail test-verify for all 4 of those.
> > 
> > The problem probably isn't the event propagation order then...
> 
> Does mozSystemGroup run at the same time as a XBL <handler>? The confusing
> thing is you can also set `group="system"` on a <handler> which makes me
> think they are two separate things.

Not sure, but in any case the listeners (do we even have multiple?) are called serially, whereas setTimeout implies a whole different kind of "later".
OK, given the logging at https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e06e8bddc3b5df91cc8b1189849e6215da6a23f&selectedJob=200215210 / https://taskcluster-artifacts.net/Sfoq7_gAS2apTGtiSsRpYw/0/public/logs/live_backing.log, I think what's happening is:

Good case:

1) FindbarContent gets the 3 'keypress' events and calls passKeyToParent for each because we are `inPassThrough=true` https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/modules/FindBarContent.jsm#79

2) Some time after one or more keys are pressed <findbar> gets focused in the parent which passes "Findbar:UpdateState" down to content with isOpenAndFocused = true. Note that how many keys are already pressed before focus is random (search for "onKeypress received from child" in the log to see the various configurations)

3) "Findbar:UpdateState" is received in content. Even with the randomness on when (2) happens, this message is received _after_ all 3 keypress events happen. This turns off inPassThrough: :https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/modules/FindBarContent.jsm#54

Bad case:

FindbarContent gets only 2 'keypress' events, although it does get 3 'keydown' events. Step (3) above happens _before_ the third keypress happens. I've just now found this line: https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/actors/FindBarChild.jsm#68. I need to do a try to confirm, but what I think is happening is that we flip inPassThrough to false after the keydown happened in the content process but before the keypress. Do you know of a way to track whether a keypress came from a keydown that had already happened before so we can allow the pass through to happen in that case? Or have an alternative idea on how to prevent this race in the first place?

FWIW, try is totally happy if I just remove or delay the _updateBrowserWithState call in the focus listener on the input field https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/content/widgets/findbar.js#259, although I'm a bit worried that could cause some other side effect where the browser thinks it's in passthrough when it shouldn't be - I'm not super familiar with this code.
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dao+bmo)
Forwarding to Mike since he knows this code better, though I'm happy to take a closer look myself if needed.
Flags: needinfo?(dao+bmo) → needinfo?(mdeboer)
(In reply to Brian Grinstead [:bgrins] from comment #13)
> FindbarContent gets only 2 'keypress' events, although it does get 3
> 'keydown' events. Step (3) above happens _before_ the third keypress
> happens. I've just now found this line:
> https://searchfox.org/mozilla-central/rev/
> a0333927deabfe980094a14d0549b589f34cbe49/toolkit/actors/FindBarChild.jsm#68.
> I need to do a try to confirm, but what I think is happening is that we flip
> inPassThrough to false after the keydown happened in the content process but
> before the keypress. Do you know of a way to track whether a keypress came
> from a keydown that had already happened before so we can allow the pass
> through to happen in that case? Or have an alternative idea on how to
> prevent this race in the first place?

Just to follow up here, it looks like this idea was correct:

If you search for `"FindBarChild onKeypress receieved" 99 false` in https://taskcluster-artifacts.net/RZ2K1ktySgOiaRccMY_6EA/0/public/logs/live_backing.log you can see that happens after `"is open and focused?" true` message is received (which means isPassThrough is false and the FindBarChild doesn't pass the key along to FindBarContent). Link to try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7196b4f92f9abae5b4b2afa6a4a0797790d3a124&selectedJob=200246712.
Attachment #9010679 - Flags: review?(jmaher) → review+
The "whole point" of this particular test is to ensure that, if the user has 'press any key to find in page' turned on, and the page has focus, and the user types, no keypresses get lost (ie all of them need to make it to the findbar), even if the user types very quickly.

(In reply to Brian Grinstead [:bgrins] from comment #13)
> FWIW, try is totally happy if I just remove or delay the
> _updateBrowserWithState call in the focus listener on the input field
> https://searchfox.org/mozilla-central/rev/
> a0333927deabfe980094a14d0549b589f34cbe49/toolkit/content/widgets/findbar.
> js#259, although I'm a bit worried that could cause some other side effect
> where the browser thinks it's in passthrough when it shouldn't be - I'm not
> super familiar with this code.

So this doesn't seem like the right solution.

The test is necessarily a bit artificial, because we forcibly send keys to the kid independent of when focus moves into the parent's findbar element. The last time I touched it, I initially had similar problems, and they went away. See bug 1371523 comment 19 and later comments. There are some debugging suggestions in there, in terms of figuring out *why* the messaging order is wrong, assuming the issue is the same (which is what it sounds like to me, though it's possible I'm missing something).
Flags: needinfo?(gijskruitbosch+bugs)
Attachment #9010689 - Flags: review?(jmaher) → review+
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Pushed by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/97f2e131ba8b
disable browser_findbar.js on linux opt and pgo for frequent failures. r=jmaher
Keywords: checkin-needed
Switching around mozSystemGroup and capture on the in-content keypress event and parent process focus doesn't seem to help: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2753f6d3b095c17d2e460e31d7d08b004249d13c&selectedJob=201253258 / https://hg.mozilla.org/try/rev/9b2c96502b8d.
Pushed by bgrinstead@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d1c5f2c89b17
Skip less of browser_findbar.js on linux opt;r=Gijs
I'm sorry I wasn't of much help here, Brian :/ Could you needinfo me again if this needs more work after all, or can this bug be closed?
Flags: needinfo?(mdeboer)
Attachment #9010799 - Attachment description: Bug 1491484 - WIP Re-enable browser_findbar.js on linux → Bug 1491484 - Re-enable all of browser_findbar.js on linux
Pushed by bgrinstead@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6c57a1daa70c
Re-enable all of browser_findbar.js on linux r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.