Closed Bug 769524 Opened 12 years ago Closed 11 years ago

Frequent robocop testSearchSuggestions | GeckoEventExpecter - blockForEvent timeout: SearchEngines:Data

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 22

People

(Reporter: philor, Assigned: gbrown)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

Brought to you by bug 756704 turning hangs and timeouts into test failures, near as I can tell.

https://tbpl.mozilla.org/php/getParsedLog.php?id=13080199&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test robocop on 2012-06-28 12:43:08 PDT for push 23d9d354236e
slave: tegra-141

1 INFO TEST-START | testSearchSuggestions
2 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | GeckoEventExpecter - blockForEvent timeout: SearchEngines:Data
Exception caught during test!
junit.framework.AssertionFailedError: 2 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | GeckoEventExpecter - blockForEvent timeout: SearchEngines:Data
	at junit.framework.Assert.fail(Assert.java:47)
	at org.mozilla.fennec.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:107)
	at org.mozilla.fennec.FennecMochitestAssert.ok(FennecMochitestAssert.java:137)
	at org.mozilla.fennec.FennecNativeActions$GeckoEventExpecter.blockForEvent(FennecNativeActions.java:141)
	at org.mozilla.fennec.tests.testSearchSuggestions.testSearchSuggestions(testSearchSuggestions.java:46)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:521)
	at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
	at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
	at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
	at org.mozilla.fennec.tests.BaseTest.runTest(BaseTest.java:93)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
	at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
3 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | Exception caught - junit.framework.AssertionFailedError: 2 INFO TEST-UNEXPECTED-FAIL | testSearchSuggestions | GeckoEventExpecter - blockForEvent timeout: SearchEngines:Data
Assignee: nobody → gbrown
In my testing, I see the SearchEngines:Data event received very quickly -- we wait less than 20 ms usually. So the 90000 ms timeout seems reasonable. 

I think the test may be occasionally missing the SearchEngines:Data event by starting to listen too late. This patch moves creation of the event expecter back to before we start waiting for Gecko:Ready. Is that reasonable?
Attachment #637945 - Flags: review?(bnicholson)
(In reply to Geoff Brown [:gbrown] from comment #7)
> Created attachment 637945 [details] [diff] [review]
> patch to listen for suggestions data event earlier
> 
> In my testing, I see the SearchEngines:Data event received very quickly --
> we wait less than 20 ms usually. So the 90000 ms timeout seems reasonable. 
> 
> I think the test may be occasionally missing the SearchEngines:Data event by
> starting to listen too late. This patch moves creation of the event expecter
> back to before we start waiting for Gecko:Ready. Is that reasonable?

I don't understand how that could happen; we only send the SearchEngines:Get message after the AwesomeBar has been opened, so SearchEngines:Data should only be received then as well. As long as we listen for SearchEngines:Data before we call clickOnAwesomeBar(), I think we should be safe.

Are you seeing SearchEngines:Data received before the AwesomeBar is clicked? If so, I don't think that should be happening...
(In reply to Brian Nicholson (:bnicholson) from comment #9)
> Are you seeing SearchEngines:Data received before the AwesomeBar is clicked?

I don't have conclusive evidence of that, no. The tbpl logs suggest there are times when we wait 90 s for SearchEngines:Data and don't receive it, and I thought we might have missed it...but if that doesn't make sense, then....?
Comment on attachment 637945 [details] [diff] [review]
patch to listen for suggestions data event earlier

Cancelling review -- this isn't a sensible solution and probably won't help.
Attachment #637945 - Flags: review?(bnicholson)
Probable dup of 771189 -- just marking as a dependency for now.
Depends on: 771189
(In reply to TinderboxPushlog Robot from comment #32)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=13296058&tree=Mozilla-
> Inbound
> Android Tegra 250 mozilla-inbound opt test robocop on 2012-07-06 11:23:13
> slave: tegra-087

Well that is disappointing! This happened after the fix for 771189 landed on inbound.
(In reply to TinderboxPushlog Robot from comment #34)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=13296701&tree=Mozilla-
> Inbound
> Android Tegra 250 mozilla-inbound opt test robocop on 2012-07-06 11:58:21
> slave: tegra-248
 
This one is different -- see bug 769145.
(In reply to Geoff Brown [:gbrown] from comment #35)
> (In reply to TinderboxPushlog Robot from comment #34)
> > philor
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=13296701&tree=Mozilla-
> > Inbound
> > Android Tegra 250 mozilla-inbound opt test robocop on 2012-07-06 11:58:21
> > slave: tegra-248
>  
> This one is different -- see bug 769145.

I filed bug 771677 separately to keep track of this.
Not exactly a proper fix, but it should avoid these errors until we find the real cause.

Try push: https://tbpl.mozilla.org/?tree=Try&rev=fbe54653ee9f
Attachment #639941 - Flags: review?(gbrown)
Comment on attachment 639941 [details] [diff] [review]
Add sleep to testSearchSuggestions to workaround blocked event queue

Review of attachment 639941 [details] [diff] [review]:
-----------------------------------------------------------------

I think it's fine to land this to resolve the frequent oranges while we sort this out. Please [leave open] this bug.
Attachment #639941 - Flags: review?(gbrown) → review+
Attachment #637945 - Attachment is obsolete: true
Investigating today, I found no sign of the blocked event queue -- I'm pretty sure that problem was resolved by bug 771189.

I can reproduce this problem quite consistently in local test runs; the test successfully clicks on the awesome bar, the UI remains responsive (no ANR) but the wait for SearchEngines:Data times out.

Adding a sleep, as in bnicholson's patch, allows the test to succeed.

Without the sleep, I see SearchEngines:Get sent:

https://hg.mozilla.org/mozilla-central/file/37fc31499d8a/mobile/android/base/awesomebar/AllPagesTab.java#l82

but the handler:

https://hg.mozilla.org/mozilla-central/file/37fc31499d8a/mobile/android/chrome/content/browser.js#l5666

is not called.

It looks like SearchService's _syncInit

https://hg.mozilla.org/mozilla-central/file/37fc31499d8a/toolkit/components/search/nsSearchService.js#l2526

is not called, although it is scheduled to be called just around the time that SearchEngines:Get is sent:

https://hg.mozilla.org/mozilla-central/file/37fc31499d8a/toolkit/components/search/nsSearchService.js#l3230


I suspect we need to call SearchService:ensureInitialized before/while processing SearchEngines:Get to avoid this failure. Will continue investigating tomorrow...
Attached patch debug patchSplinter Review
I'm still seeing a problem with the SearchEngines:Get event getting processed in the queue. With this patch applied, here's the output I'm seeing:

I/Gecko   ( 7059): BRN: posting event: Accessibility:Settings
I/Gecko   ( 7059): BRN: posting event: Preferences:Get
I/Gecko   ( 7059): BRN: posting event: Accessibility:Settings
I/Gecko   ( 7059): BRN: posting event: Tab:Selected
I/Gecko   ( 7059): BRN: posting event: Window:Resize
I/Gecko   ( 7059): BRN: posting event: Preferences:Get
I/Gecko   ( 7059): BRN: posting event: SearchEngines:Get
I/Gecko   ( 7059): BRN: popping event: Accessibility:Settings
I/Gecko   ( 7059): BRN: popping event: Preferences:Get
I/Gecko   ( 7059): BRN: popping event: Accessibility:Settings
I/Gecko   ( 7059): BRN: popping event: Tab:Selected
I/Gecko   ( 7059): BRN: popping event: Window:Resize
I/Gecko   ( 7059): BRN: popping event: Preferences:Get

That is, SearchEngines:Get is received and posted in AndroidJNI.cpp, but isn't popped off the queue to be processed. As a result, the SearchEngines:Get observer in browser.js is never getting called.
I definitely see SearchEngines:Get popped:

07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Accessibility:Settings
07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Telemetry:Add
07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Preferences:Get
07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Preferences:Get
07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Accessibility:Settings
07-10 09:45:55.414  4326  4335 I Gecko   : BRN: posting event: Tab:Selected
07-10 09:45:55.464  4326  4335 I Gecko   : BRN: posting event: Window:Resize
07-10 09:45:55.480  4326  4335 I Gecko   : BRN: posting event: Preferences:Get
07-10 09:45:55.772  4326  4326 I Gecko   : BRN: posting event: SearchEngines:Get
07-10 09:45:55.796  4326  4335 I Gecko   : BRN: popping event: Accessibility:Settings
07-10 09:45:55.854  4326  4335 I Gecko   : BRN: popping event: Telemetry:Add
07-10 09:45:56.132  4326  4335 I Gecko   : BRN: popping event: Preferences:Get
07-10 09:45:56.172  4326  4335 I Gecko   : BRN: popping event: Preferences:Get
07-10 09:45:57.290  4326  4326 I Gecko   : BRN: posting event: Passwords:Init
07-10 09:45:57.296  4326  4335 I Gecko   : BRN: popping event: Accessibility:Settings
07-10 09:45:57.296  4326  4335 I Gecko   : BRN: popping event: Tab:Selected
07-10 09:45:57.308  4326  4326 I Gecko   : BRN: posting event: Passwords:Init
07-10 09:45:57.308  4326  4335 I Gecko   : BRN: popping event: Window:Resize
07-10 09:45:57.332  4326  4335 I Gecko   : BRN: popping event: Preferences:Get
07-10 09:45:57.336  4326  4335 I Gecko   : BRN: popping event: SearchEngines:Get
07-10 09:45:57.416  4326  4335 I Gecko   : BRN: popping event: Passwords:Init

...and ExecuteSoon(_syncInit) is usually called, but _syncInit is not executed. 

So I suppose something (something unrelated?) is causing that thread to hang, and we are just hanging at different points in the SearchEngines:Get processing.
It looks like this is a slight variation on bug 771189. The Gecko thread is waiting in CreateCompositor, and there is another thread waiting on waitForValidSurface.
Depends on: 772672
Whiteboard: [orange] [leave open] → [orange]
Attachment #639941 - Flags: checkin+
Whiteboard: [orange]
Could this be related to the fact that bug 760036 makes nsSearchService initialization async?
(In reply to Geoff Brown [:gbrown] from comment #74)
> ...and ExecuteSoon(_syncInit) is usually called, but _syncInit is not
> executed. 
> 
> So I suppose something (something unrelated?) is causing that thread to
> hang, and we are just hanging at different points in the SearchEngines:Get
> processing.

Ah, I had missed that one. That looks scary.
If necessary, I can certainly hack around the symptoms with a very unsatisfactory solution: ensuring that nsSearchService always takes the deprecated, synchronous, main thread initialization path under Android.
Marking as blocker, as this prevents bug 760036, itself a blocker, from landing.
Severity: normal → blocker
I backed out the workaround (sleep) on a try push just to verify this is still a problem: It failed on nearly every run, with the same blockForEvent timeout.

https://tbpl.mozilla.org/?tree=Try&rev=e1359a5dedd0
I also added some logging and verified that the cause of the timeouts is bug 772672 still:

test waits for Gecko:Ready
test initializes expecter for SearchEngines:Data
test clicks on awesomebar
AllPagesTab is created and sends SearchEngines:Get
surfaceDestroyed is called
CreateCompositor is called
waitForValidSurface is called
test waits for SearchEngines:Data... and times out after 90 s of inactivity on all threads

With the sleep, events proceed like this:

test waits for Gecko:Ready
test sleeps for 5 seconds during which:
  CreateCompositor is called
  waitForValidSurface is called
  waitForValidSurface completes
  CreateCompositor completes
test initializes expecter for SearchEngines:Data
test clicks on awesomebar
AllPagesTab is created and sends SearchEngines:Get
surfaceDestroyed is called
SearchEngines:Get is received
SearchEngines:Data is sent
test receives SearchEngines:Data and proceeds normally -- all tests pass
Returning severity to normal, since this no longer blocks 760036.
Severity: blocker → normal
Thanks for the help, gbrown.
I checked to see if the recent work on bug 826300 might allow us to remove the work-around; it does not. Without the initial sleep, this test still intermittently fails with a timeout waiting for SearchEngines:Data.
These last 4 failures appear to be caused by the patch for bug 777712 -- backed out for other test failures.
Now that bug 772672 should be fixed, I'm re-testing to see if removing the sleep workaround here works. Try push at https://tbpl.mozilla.org/?tree=Try&rev=add2900e0aaa
Try run looks good now
Attachment #720049 - Flags: review?(gbrown)
Comment on attachment 720049 [details] [diff] [review]
Remove workaround

Review of attachment 720049 [details] [diff] [review]:
-----------------------------------------------------------------

Great! Thanks much!
Attachment #720049 - Flags: review?(gbrown) → review+
Adding checkin-needed for the above patch since inbound is closed.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/93b2766c6f25
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 22
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: