Closed Bug 1308343 Opened 3 years ago Closed 2 years ago

Intermittent browser/modules/test/browser/browser_ContentSearch.js | Test timed out -

Categories

(Firefox :: Search, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 56
Tracking Status
firefox-esr52 --- disabled
firefox53 --- wontfix
firefox54 --- disabled
firefox55 --- disabled
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: standard8)

Details

(Keywords: intermittent-failure, Whiteboard: [fxsearch][stockwell disabled])

Attachments

(2 files)

Component: General → Search
Priority: -- → P3
Whiteboard: [fxsearch]
this has been trending towards high frequency the last 4 weeks, no need to take action here- but possibly looking into this for obvious flaws would be nice.

this is timing out primarily on linux64-debug, typically this runs in 15seconds, so the timeout seems like a real failure case.

here is a clip from a linux64 debug log[0]:
[task 2017-03-02T22:36:18.347406Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Array buffers have the same length. - 339 == 339 - 
[task 2017-03-02T22:36:18.349615Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Buffers are equal. - 
[task 2017-03-02T22:36:18.351771Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Actual value is ArrayBuffer. - 
[task 2017-03-02T22:36:18.353906Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Expected value is ArrayBuffer. - 
[task 2017-03-02T22:36:18.356185Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Array buffers have the same length. - 339 == 339 - 
[task 2017-03-02T22:36:18.359303Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Buffers are equal. - 
[task 2017-03-02T22:36:18.361163Z] 22:36:18     INFO - Leaving test bound modifyEngine
[task 2017-03-02T22:36:18.363147Z] 22:36:18     INFO - Entering test bound search
[task 2017-03-02T22:36:18.366992Z] 22:36:18     INFO - Waiting for ContentSearch message AddToWhitelistAck...
[task 2017-03-02T22:36:18.368743Z] 22:36:18     INFO - Received ContentSearch message AddToWhitelistAck
[task 2017-03-02T22:36:18.370340Z] 22:36:18     INFO - 
[task 2017-03-02T22:36:18.372160Z] 22:36:18     INFO - Waiting for ContentSearchTest message loadStopped...
[task 2017-03-02T22:36:18.375212Z] 22:36:18     INFO - Received ContentSearchTest message loadStopped
[task 2017-03-02T22:36:18.376756Z] 22:36:18     INFO - 
[task 2017-03-02T22:36:18.380128Z] 22:36:18     INFO - TEST-PASS | browser/modules/test/browser_ContentSearch.js | Correct search page loaded - "http://mochi.test:8888/browser/browser/components/search/test/?searchtest=ContentSearchTest" == "http://mochi.test:8888/browser/browser/components/search/test/?searchtest=ContentSearchTest" - 
[task 2017-03-02T22:36:18.382144Z] 22:36:18     INFO - Leaving test bound search
[task 2017-03-02T22:36:18.384245Z] 22:36:18     INFO - Entering test bound searchInBackgroundTab
[task 2017-03-02T22:36:18.388444Z] 22:36:18     INFO - Console message: [JavaScript Error: "Unknown source for search: ContentSearchTest" {file: "resource://app/modules/BrowserUsageTelemetry.jsm" line: 383}]
[task 2017-03-02T22:36:18.390227Z] 22:36:18     INFO - recordSearch@resource://app/modules/BrowserUsageTelemetry.jsm:383:15
[task 2017-03-02T22:36:18.391948Z] 22:36:18     INFO - recordSearchInTelemetry@chrome://browser/content/browser.js:3937:7
[task 2017-03-02T22:36:18.393620Z] 22:36:18     INFO - performSearch@resource://app/modules/ContentSearch.jsm:257:5
[task 2017-03-02T22:36:18.395304Z] 22:36:18     INFO - _onMessageSearch@resource://app/modules/ContentSearch.jsm:414:5
[task 2017-03-02T22:36:18.398524Z] 22:36:18     INFO - this.ContentSearch._onMessage<@resource://app/modules/ContentSearch.jsm:396:13
[task 2017-03-02T22:36:18.400323Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:319:42
[task 2017-03-02T22:36:18.402117Z] 22:36:18     INFO - process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:922:23
[task 2017-03-02T22:36:18.405184Z] 22:36:18     INFO - walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:806:7
[task 2017-03-02T22:36:18.407000Z] 22:36:18     INFO - Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:739:11
[task 2017-03-02T22:36:18.408749Z] 22:36:18     INFO - schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:770:7
[task 2017-03-02T22:36:18.410534Z] 22:36:18     INFO - Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:455:5
[task 2017-03-02T22:36:18.416656Z] 22:36:18     INFO - TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:396:7
[task 2017-03-02T22:36:18.418420Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:327:15
[task 2017-03-02T22:36:18.420197Z] 22:36:18     INFO - TaskImpl@resource://gre/modules/Task.jsm:277:3
[task 2017-03-02T22:36:18.421897Z] 22:36:18     INFO - asyncFunction@resource://gre/modules/Task.jsm:252:14
[task 2017-03-02T22:36:18.423724Z] 22:36:18     INFO - _processEventQueue/this._currentEventPromise<@resource://app/modules/ContentSearch.jsm:361:15
[task 2017-03-02T22:36:18.425537Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:319:42
[task 2017-03-02T22:36:18.427191Z] 22:36:18     INFO - TaskImpl@resource://gre/modules/Task.jsm:277:3
[task 2017-03-02T22:36:18.428869Z] 22:36:18     INFO - asyncFunction@resource://gre/modules/Task.jsm:252:14
[task 2017-03-02T22:36:18.430495Z] 22:36:18     INFO - Task_spawn@resource://gre/modules/Task.jsm:166:12
[task 2017-03-02T22:36:18.432166Z] 22:36:18     INFO - _processEventQueue@resource://app/modules/ContentSearch.jsm:359:33
[task 2017-03-02T22:36:18.433834Z] 22:36:18     INFO - receiveMessage@resource://app/modules/ContentSearch.jsm:190:5
[task 2017-03-02T22:36:18.435601Z] 22:36:18     INFO - 
[task 2017-03-02T22:36:18.437623Z] 22:36:18     INFO - Buffered messages logged at 22:34:52
[task 2017-03-02T22:36:18.439711Z] 22:36:18     INFO - Waiting for ContentSearch message AddToWhitelistAck...
[task 2017-03-02T22:36:18.442298Z] 22:36:18     INFO - Received ContentSearch message AddToWhitelistAck
[task 2017-03-02T22:36:18.444684Z] 22:36:18     INFO - 
[task 2017-03-02T22:36:18.447205Z] 22:36:18     INFO - Waiting for ContentSearchTest message loadStopped...
[task 2017-03-02T22:36:18.450286Z] 22:36:18     INFO - Buffered messages logged at 22:34:56
[task 2017-03-02T22:36:18.452343Z] 22:36:18     INFO - Console message: [JavaScript Error: "Unknown source for search: ContentSearchTest" {file: "resource://app/modules/BrowserUsageTelemetry.jsm" line: 383}]
[task 2017-03-02T22:36:18.454075Z] 22:36:18     INFO - recordSearch@resource://app/modules/BrowserUsageTelemetry.jsm:383:15
[task 2017-03-02T22:36:18.455752Z] 22:36:18     INFO - recordSearchInTelemetry@chrome://browser/content/browser.js:3937:7
[task 2017-03-02T22:36:18.457628Z] 22:36:18     INFO - performSearch@resource://app/modules/ContentSearch.jsm:257:5
[task 2017-03-02T22:36:18.459516Z] 22:36:18     INFO - _onMessageSearch@resource://app/modules/ContentSearch.jsm:414:5
[task 2017-03-02T22:36:18.462471Z] 22:36:18     INFO - this.ContentSearch._onMessage<@resource://app/modules/ContentSearch.jsm:396:13
[task 2017-03-02T22:36:18.464308Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:319:42
[task 2017-03-02T22:36:18.467695Z] 22:36:18     INFO - process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:922:23
[task 2017-03-02T22:36:18.469436Z] 22:36:18     INFO - walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:806:7
[task 2017-03-02T22:36:18.471450Z] 22:36:18     INFO - Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:739:11
[task 2017-03-02T22:36:18.476358Z] 22:36:18     INFO - schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:770:7
[task 2017-03-02T22:36:18.478194Z] 22:36:18     INFO - Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:455:5
[task 2017-03-02T22:36:18.479945Z] 22:36:18     INFO - TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:396:7
[task 2017-03-02T22:36:18.481698Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:327:15
[task 2017-03-02T22:36:18.483389Z] 22:36:18     INFO - TaskImpl@resource://gre/modules/Task.jsm:277:3
[task 2017-03-02T22:36:18.485277Z] 22:36:18     INFO - asyncFunction@resource://gre/modules/Task.jsm:252:14
[task 2017-03-02T22:36:18.487033Z] 22:36:18     INFO - _processEventQueue/this._currentEventPromise<@resource://app/modules/ContentSearch.jsm:361:15
[task 2017-03-02T22:36:18.488712Z] 22:36:18     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:319:42
[task 2017-03-02T22:36:18.490357Z] 22:36:18     INFO - TaskImpl@resource://gre/modules/Task.jsm:277:3
[task 2017-03-02T22:36:18.492185Z] 22:36:18     INFO - asyncFunction@resource://gre/modules/Task.jsm:252:14
[task 2017-03-02T22:36:18.494106Z] 22:36:18     INFO - Task_spawn@resource://gre/modules/Task.jsm:166:12
[task 2017-03-02T22:36:18.496073Z] 22:36:18     INFO - _processEventQueue@resource://app/modules/ContentSearch.jsm:359:33
[task 2017-03-02T22:36:18.497990Z] 22:36:18     INFO - receiveMessage@resource://app/modules/ContentSearch.jsm:190:5
[task 2017-03-02T22:36:18.501628Z] 22:36:18     INFO - 
[task 2017-03-02T22:36:18.503675Z] 22:36:18     INFO - Buffered messages finished
[task 2017-03-02T22:36:18.505698Z] 22:36:18     INFO - TEST-UNEXPECTED-FAIL | browser/modules/test/browser_ContentSearch.js | Test timed out - 
[task 2017-03-02T22:36:18.507476Z] 22:36:18     INFO - Search engine removed: testEngine.xml
[task 2017-03-02T22:36:19.234038Z] 22:36:19     INFO - MEMORY STAT | vsize 2059MB | residentFast 238MB | heapAllocated 96MB
[task 2017-03-02T22:36:19.237940Z] 22:36:19     INFO - TEST-OK | browser/modules/test/browser_ContentSearch.js | took 91259ms


[0] https://treeherder.mozilla.org/logviewer.html#?repo=try&job_id=81269889&lineNumber=8212
it looks like we are failing in the searchisbackgroundtab test:
https://dxr.mozilla.org/mozilla-central/source/browser/modules/test/browser/browser_ContentSearch.js?q=browser_ContentSearch.js&redirect_type=direct#125

:standard8, I see that you are the author of this test (a few weeks ago!)- well upon further investigation this was just moving tests between directories- is this a test which you could help figure out or get the right people on this bug?
Flags: needinfo?(standard8)
Whiteboard: [fxsearch] → [fxsearch][stockwell needswork]
Drew, can you take a look? This seems to be the waitForLoadAndStopIt function that isn't sending a message back to the content space. Do you know where the alternative implementations of that function are? (xref bug 1188692 comment 58)
Flags: needinfo?(standard8) → needinfo?(adw)
Given the frequency of this happening, bumping to a P2, though maybe we should make it a P1.
Priority: P3 → P2
thanks for picking this up Mark, this seems to be:
osx opt browser-chrome e10s chunk 1
linux64 debug browser-chrome e10s chunk 1

As we have had 2 weeks of high failures, I agree we need to look into this more seriously- is there any further information about the failure pattern or logging info that would be helpful?
:standard8, do you have any updates here?  We are at 3 weeks now and I would prefer to temporarily disable this unless there is some work in progress
Flags: needinfo?(standard8)
(In reply to Joel Maher ( :jmaher) from comment #27)
> :standard8, do you have any updates here?  We are at 3 weeks now and I would
> prefer to temporarily disable this unless there is some work in progress

I was hoping for a response from Drew. Unfortunately I'm focussed on other high priority things for at least the first part of this week. I'll see if Panos can help get some movement.
Flags: needinfo?(standard8) → needinfo?(past)
Sorry Mark, I just don't have time to look into this right now.  The other implementations of waitForDocLoadAndStopIt are doing the same thing that this one is.  It would take some investigation to figure out what's going on.  The thing that stands out in the log is that the URL that's being waited for is actually loading.

I think it's unwise to disable this, not that my opinion matters at all.  The latest bot comment says it has a failure rate of 4.6%.  Not great, but not terrible.  Disable it and we go from 95.4% successful coverage to 0%.
Flags: needinfo?(adw)
The other waitForDocLoadAndStopIt are here by the way: https://dxr.mozilla.org/mozilla-central/search?q=flags%20%26%20docStart

Although now that I look at the test code again, it looks like what's happening is that waitForDocLoadAndStopIt is working, and it's something after that's going wrong.  waitForDocLoadAndStop just happens to be the last thing that logs before the hang.
Unfortunately both Mark and Drew are busy with higher priority projects right now, but I'd like us to fix this ASAP, so I'm raising the priority to P1 and I'll shop around for volunteers. Joel, if you feel that we absolutely must disable the test in the meantime, could we just disable it on macOS (and perhaps also linux64), so that the number of failures goes below your threshold, but we still maintain some test coverage?
Flags: needinfo?(past)
Priority: P2 → P1
:past, let me check in on Thursday to see if we have someone to look at this- I am happy to be patient assuming we have someone who is working on the problem.
Summary: Intermittent browser/modules/test/browser_ContentSearch.js | Test timed out - → Intermittent browser/modules/test/browser/browser_ContentSearch.js | Test timed out -
checking back in here- the failure rate is still steady.  Any luck finding someone to look into this in the next week or so?
Flags: needinfo?(past)
I'll look at it.
Assignee: nobody → adw
Status: NEW → ASSIGNED
Flags: needinfo?(past)
hi :adw, this bug is assigned to you, have you made any progress in fixing this?  Are you getting stuck with things that maybe I can help with?
I haven't time yet, but it's still on my radar.
going on 4 weeks with no pending fix, lets reduce the pain and disable this until there is more time to work on fixing it.
Attachment #8861063 - Flags: review?(gbrown)
Attachment #8861063 - Flags: review?(gbrown) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/52a9eb86ebf9
Intermittent browser/modules/test/browser/browser_ContentSearch.js. temporarily disable. r=gbrown
https://hg.mozilla.org/mozilla-central/rev/52a9eb86ebf9
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Joel, you annotated the manifest entry with this bug number but didn't set leave-open. Was that on purpose? Typically we'd want to point to the bug filed for fixing & re-enabling the test rather than the bug that disabled it.
Flags: needinfo?(jmaher)
https://hg.mozilla.org/releases/mozilla-beta/rev/9831b9a4dcdc
Whiteboard: [fxsearch][stockwell needswork] → [fxsearch][stockwell fixed]
ok, I didn't clean this bug up- was in a rush yesterday- so it needs to be leave-open!
Status: RESOLVED → REOPENED
Flags: needinfo?(jmaher)
Keywords: leave-open
Resolution: FIXED → ---
Whiteboard: [fxsearch][stockwell fixed] → [fxsearch][stockwell disabled]
Assignee: adw → nobody
Assignee: nobody → standard8
I've done a few try runs with the test re-enabled. What I've found so far is:

- Mac seems to be mainly failing when starting one of the individual tests, i.e. during the addTab(). There's no output, so my best guess is that the tab load is failing to be caught properly.

This appears to be using a custom load handler, I think we can do better by using BrowserTestUtils - so I'll try that next.

- The other failures mainly seen on Linux (but also occasionally on Linux/Windows are issues where we're waiting for notification that the process has stopped the load of the search but we never get it. The screenshots seem to imply that it doesn't stop the load as well.

Still thinking about this. I suspect it is something in contentSearch.js that we could do better, not sure what yet.
(In reply to Mark Banner (:standard8) from comment #56)
> - Mac seems to be mainly failing when starting one of the individual tests,
> i.e. during the addTab(). There's no output, so my best guess is that the
> tab load is failing to be caught properly.
> 
> This appears to be using a custom load handler, I think we can do better by
> using BrowserTestUtils - so I'll try that next.

I was correct here, switching to BrowserTestUtils to wait for the tab to open & load makes the test stable on Mac.

> - The other failures mainly seen on Linux (but also occasionally on
> Linux/Windows are issues where we're waiting for notification that the
> process has stopped the load of the search but we never get it. The
> screenshots seem to imply that it doesn't stop the load as well.
> 
> Still thinking about this. I suspect it is something in contentSearch.js
> that we could do better, not sure what yet.

It appears that the listener that contentSearch.js adds to webProgress is begin garbage collected as the webProgress interface is handling a weak reference. As it is gc, this would explain the intermittent nature.

I'm cleaning up my patches for this and will get something posted in a bit.
Status: REOPENED → ASSIGNED
I did various re-runs on the try build associated with MozReview & none of the failures that turned up where the ones in this test. So hopefully we're good to go assuming it is ok in review.
Comment on attachment 8885661 [details]
Bug 1308343 - Fix intermittent failures in browser_ContentSearch.js by ensuring we keep listeners alive and listening for loads correctly.

https://reviewboard.mozilla.org/r/156516/#review161908

Thanks Mark.  The change from callbacks to promises is nice too.
Attachment #8885661 - Flags: review?(adw) → review+
Pushed by mbanner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/868c580d8d9d
Fix intermittent failures in browser_ContentSearch.js by ensuring we keep listeners alive and listening for loads correctly. r=adw
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/868c580d8d9d
Status: ASSIGNED → RESOLVED
Closed: 3 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 55 → Firefox 56
You need to log in before you can comment on or make changes to this bug.