Closed Bug 907768 Opened 11 years ago Closed 9 years ago

Intermittent testSearchSuggestions | Results for query 'f' matched expected suggestions - got false, expected true

Categories

(Firefox for Android Graveyard :: Awesomescreen, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox35 wontfix, firefox36 fixed, firefox37 fixed, firefox-esr31 unaffected)

RESOLVED FIXED
Firefox 37
Tracking Status
firefox35 --- wontfix
firefox36 --- fixed
firefox37 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: emorley, Assigned: rnewman)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 2 obsolete files)

Android Armv6 Tegra 250 mozilla-central opt test robocop-1 on 2013-08-21 06:29:13 PDT for push b2486721572e

slave: tegra-196

https://tbpl.mozilla.org/php/getParsedLog.php?id=26817957&tree=Mozilla-Central

{
1 INFO TEST-START | testSearchSuggestions
EventExpecter: no longer listening for Gecko:Ready
2 INFO TEST-PASS | testSearchSuggestions | waiting for urlbar text to gain focus - urlbar text gained focus
EventExpecter: no longer listening for SearchEngines:Data
waitForTest timeout after 5000 ms
3 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | Results for query 'f' matched expected suggestions - got false, expected true
Exception caught during test!
junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | Results for query 'f' matched expected suggestions - got false, expected true
	at junit.framework.Assert.fail(Assert.java:47)
	at org.mozilla.fennec.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:107)
	at org.mozilla.fennec.FennecMochitestAssert.ok(FennecMochitestAssert.java:136)
	at org.mozilla.fennec.FennecMochitestAssert.is(FennecMochitestAssert.java:142)
	at org.mozilla.fennec.tests.testSearchSuggestions.testSearchSuggestions(testSearchSuggestions.java:88)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:521)
	at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
	at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
	at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
	at org.mozilla.fennec.tests.BaseTest.runTest(BaseTest.java:160)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
	at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
4 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | Exception caught - junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | Results for query 'f' matched expected suggestions - got false, expected true
}
The logs in Comment 55 and 56 have screenshots that might be useful.

bnicholson -- can you have a look?
Flags: needinfo?(bnicholson)
Attached image screenshot
Posting screenshot here so it doesn't get lost.
Assignee: nobody → bnicholson
Status: NEW → ASSIGNED
Flags: needinfo?(bnicholson)
Whiteboard: [test disabled][leave open]
I believe this is a race condition between the real search suggestion handler being attached and our fake test one. Our mocking here is pretty dodgy.
Indeed, adding logging:

        suggestMap.put("f",        new ArrayList<String>() {{ add("f"); add("facebook"); add("fandango"); add("frys"); }});


01-01 17:06:23.499 I/GeckoTestSS(11730): Suggestion 'free online games' not equal to expected 'fandango'.

because Yahoo's sugggestions for 'f' are:

  facebook     free online games    fandango
We're setting the search suggestion handler from two different threads:

 XXX setSuggestClient on thread 1
 java.lang.RuntimeException: stack
 	at org.mozilla.gecko.home.BrowserSearch.setSuggestClient(BrowserSearch.java:589)
 	at org.mozilla.gecko.home.BrowserSearch.access$500(BrowserSearch.java:67)
 	at org.mozilla.gecko.home.BrowserSearch$4.run(BrowserSearch.java:355)
 	at android.os.Handler.handleCallback(Handler.java:739)

 XXX setSuggestClient on thread 642
 java.lang.RuntimeException: stack
 	at org.mozilla.gecko.home.BrowserSearch.setSuggestClient(BrowserSearch.java:589)
 	at org.mozilla.gecko.tests.testSearchSuggestions.connectSuggestClient(testSearchSuggestions.java:119)
 	at org.mozilla.gecko.tests.testSearchSuggestions.testSearchSuggestions(testSearchSuggestions.java:53)


Not only is this not thread safe, but it'll race with the real suggestion client.
Assignee: bnicholson → rnewman
The existing test did some really funky stuff to try to mock out BrowserSearch, swizzling one of its private members and racing with an event listener.

Rather than doing that, this patch simply exposes the SuggestClient creation as a factory, which the test mocks at its leisure.

That also allowed for some simplification of the rest of BrowserSearch.

Passes locally (when it reliably failed); try build in progress.
Attachment #8543204 - Flags: review?(mark.finkle)
Attachment #8543204 - Flags: review?(bnicholson)
Comment on attachment 8543204 [details] [diff] [review]
Rewrite search suggestion test hooks. v1

Works locally, but apparently still not enough for infra. Looking further.
Attachment #8543204 - Flags: review?(mark.finkle)
Attachment #8543204 - Flags: review?(bnicholson)
I have this working reliably on 4.0 ARM and x86. 2.3 seems to regularly fail like this:

01-02 13:11:52.746 I/UpdateService(  561): opening connection with url: https://aus4.mozilla.org/update/4/Fennec/37.0a1/20150102093802/Android_arm-eabi-gcc3/en-US/default/2.3.7/default/default/37.0a1/update.xml
01-02 13:11:52.831 D/GeckoToolbar(  505): onTabChanged: FAVICON
01-02 13:11:52.831 D/GeckoToolbarDisplayLayout(  505): Ignoring favicon: new image is identical to previous one.
01-02 13:11:52.852 D/GeckoBrowserApp(  505): BrowserApp.onTabChanged: 0: FAVICON
01-02 13:11:57.202 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233117180,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:11:57.771 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233117730,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:11:57.786 D/dalvikvm(  505): GC_CONCURRENT freed 784K, 47% free 3923K/7303K, external 1520K/1813K, paused 17ms+128ms
01-02 13:11:58.301 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233118277,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:11:58.821 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233118813,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:11:59.361 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233119353,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:11:59.922 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233119900,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:00.461 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233120452,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:00.991 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233120969,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:01.511 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233121504,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:02.055 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420233122043,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:02.094 I/Robocop (  505): {"action":"log","message":"waitForCondition timeout after 5000 ms.","time":1420233122075,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 13:12:02.141 I/Robocop (  505): {"message":"got false, expected true","time":1420233122106,"expected":"PASS","source":"robocop","status":"FAIL","test":"testSearchSuggestions","thread":null,"subtest":"Results for query 'f' matched expected suggestions","action":"test_status","pid":null}
01-02 13:12:02.732 D/dalvikvm(  505): GC_EXTERNAL_ALLOC freed 255K, 47% free 3908K/7303K, external 1508K/1813K, paused 296ms

which to me says that 5 seconds just isn't enough to wait for the search stuff to finish loading.

I'm trying again with a 15-second timeout.
This layers in an additional, more descriptive check for the situation in which the test's approach won't work.

It also shifts swizzling to be as early as it can be (which seems to fix test issues), and lengthens the timeout to 15 seconds (which I'm hoping will make 2.3 work).
Attachment #8543204 - Attachment is obsolete: true
*sigh*

01-02 19:04:12.851 I/Robocop ( 3130): {"message":"urlbar text gained focus","time":1420225452858,"source":"robocop","status":"PASS","test":"testSearchSuggestions","thread":null,"subtest":"waiting for urlbar text to gain focus","action":"test_status","pid":null}
01-02 19:04:12.851 I/Robocop ( 3130): {"message":"","time":1420225452862,"source":"robocop","status":"PASS","test":"testSearchSuggestions","thread":null,"subtest":"There is no existing search client.","action":"test_status","pid":null}
01-02 19:04:12.929 D/Telemetry( 3130): StartUISession: frecency.1
01-02 19:04:13.140 D/GeckoLayerClient( 3130): Window-size changed to (1280,564)
01-02 19:04:14.000 D/dalvikvm( 3130): GC_CONCURRENT freed 587K, 8% free 8434K/9095K, paused 3ms+6ms
01-02 19:04:14.000 I/GeckoSuggestClient( 3130): Not connected to network
01-02 19:04:14.226 E/GeckoNetworkManager( 3130): Registering receiver failed
01-02 19:04:14.773 V/tiny_hw ( 1295): out_standby(0xd61160) closing PCM
01-02 19:04:15.804 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225455807,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:16.335 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225456338,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:16.851 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225456857,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:17.375 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225457377,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:17.875 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225457882,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:18.414 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225458417,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:18.929 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225458936,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:19.453 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225459459,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:19.968 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225459977,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:20.484 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225460486,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:21.023 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225461031,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:21.562 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225461565,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:22.109 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225462114,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:22.632 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225462640,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:23.257 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225463261,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:23.773 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225463778,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:24.281 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225464282,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:24.820 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225464826,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:25.359 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225465363,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:25.921 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225465923,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:26.476 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225466463,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:26.531 D/dalvikvm( 3130): GC_CONCURRENT freed 585K, 8% free 8447K/9159K, paused 0ms+1ms
01-02 19:04:26.984 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225466992,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:27.492 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225467497,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:28.000 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225468005,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:28.531 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225468534,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:29.031 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225469037,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:29.562 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225469567,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:30.101 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225470103,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:30.648 I/Robocop ( 3130): {"action":"log","message":"Fail: queryChild is null or GONE.","time":1420225470656,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:30.656 I/Robocop ( 3130): {"action":"log","message":"waitForCondition timeout after 15000 ms.","time":1420225470659,"pid":null,"level":"info","source":"robocop","thread":null}
01-02 19:04:30.656 I/Robocop ( 3130): {"message":"got false, expected true","time":1420225470661,"expected":"PASS","source":"robocop","status":"FAIL","test":"testSearchSuggestions","thread":null,"subtest":"Results for query 'f' matched expected suggestions","action":"test_status","pid":null}
01-02 19:04:30.757 D/dalvikvm( 3130): GC_FOR_ALLOC freed 287K, 8% free 8449K/9159K, paused 44ms
The suggest code had a mechanism to avoid checking for online status. The test didn't use it.

With that fixed, and the race rewrite, the only remaining failure appears to be:

01-04 00:04:27.630 I/UpdateService(  561): opening connection with url: https://aus4.mozilla.org/update/4/Fennec/37.0a1/20150103210947/Android_arm-eabi-gcc3/en-US/default/2.3.7/default/default/37.0a1/update.xml
01-04 00:04:27.976 I/Robocop (  505): {"action":"log","message":"Fail: suggestionGroup is null.","time":1420358667967,"pid":null,"level":"info","source":"robocop","thread":null}

That is: we trying to fetch an update, and that seems to totally prevent the HTTP request that would fetch search suggestions.

Other than that, my try build seems to be pretty green.
Depends on: 1117230
Yup:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b1bdc09a92aa

No failures on x86 (five runs) or 11+ (eight runs).

Three failures on 2.3 out of five runs, all of which featured an update check immediately prior to the failure.

So I'm gonna call this good, fold the patch, ask for review, and leave the updater part to Bug 1117230.
This patch fixes all testSearchSuggestions intermittent failures, except for those that appear to be caused by update checks on 2.3.

It also replaces all waitForTest (deprecated) calls with waitForCondition.
Attachment #8543738 - Flags: review?(bnicholson)
Attachment #8543441 - Attachment is obsolete: true
Comment on attachment 8543738 [details] [diff] [review]
Rewrite search suggestion test hooks, fix network bypass in SuggestClient. v3

Review of attachment 8543738 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. This looks good to me.
Attachment #8543738 - Flags: review?(bnicholson) → review+
Attachment #8543739 - Flags: review?(bnicholson) → review+
The screenshot in that failure looks pretty dire: no search engines listed at all. Previously most failures looked like


[f                        ]
(icon) [f]
(icon) [f]
(icon) [f]


This failure is:

[f                        ]


which means we haven't even handled the search engine data message from Gecko.

This looks like basically a startup hang, and the root cause likely isn't in this test itself. Let's see how many more failures we get.
I turned this off on 2.3, because hatred.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [test disabled][leave open] → [test disabled on 2.3]
Target Milestone: --- → Firefox 37
Incidentally, from Bug 1110681:

---
I have looked at SearchService initialization. I believe that the asynchronous initialization never completes. This may be due to a bug in the code (the loading of search engines, in particular, is way too complicated to be easy to trust) or in a notification observer that could be spinning the event loop and causing havoc.
---

That would explain my observation in Comment 96.
Mind requesting Aurora approval for this? :)
Flags: needinfo?(rnewman)
Comment on attachment 8543738 [details] [diff] [review]
Rewrite search suggestion test hooks, fix network bypass in SuggestClient. v3

Approval Request Comment
[Feature/regressing bug #]:
  Long-standing test wonkiness.

[User impact if declined]:
  Sheriffs will continue to be sad for a long time.

[Describe test coverage new/current, TBPL]:
  This is mostly a test change; small modification to search code to allow for more correct mocking.

[Risks and why]: 
  Relatively low risk. Well-understood transformation.

[String/UUID change made/needed]:
  None.
Flags: needinfo?(rnewman)
Attachment #8543738 - Flags: approval-mozilla-aurora?
Attachment #8543738 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [test disabled on 2.3]
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.