Closed Bug 869277 Opened 7 years ago Closed 6 years ago

Intermittent testAddSearchEngine | The number of search results has increased - got 5, expected 10

Categories

(Firefox for Android :: General, defect)

22 Branch
ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 23

People

(Reporter: RyanVM, Assigned: AdrianT)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 4 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=22649173&tree=Mozilla-Inbound

Android Armv6 Tegra 250 mozilla-inbound opt test robocop-2 on 2013-05-06 13:42:15 PDT for push b24fa214730b
slave: tegra-055

0 INFO SimpleTest START
1 INFO TEST-START | testAddSearchEngine
EventExpecter: no longer listening for Gecko:Ready
2 INFO TEST-PASS | testAddSearchEngine | Awesomebar URL typed properly - http://mochi.test:8888/tests/robocop/robocop_blank_01.html should equal http://mochi.test:8888/tests/robocop/robocop_blank_01.html
EventExpecter: no longer listening for DOMContentLoaded
3 INFO TEST-PASS | testAddSearchEngine | There should be search engines displayed when text is entered - 9 should not equal -1
4 INFO TEST-PASS | testAddSearchEngine | Awesomebar URL typed properly - http://mochi.test:8888/tests/robocop/robocop_search.html should equal http://mochi.test:8888/tests/robocop/robocop_search.html
EventExpecter: no longer listening for DOMContentLoaded
Long Clicking at width = 20 and height = 83
5 INFO TEST-PASS | testAddSearchEngine | Waiting for the context menu to be opened - The context menu was opened
6 INFO TEST-PASS | testAddSearchEngine | Adding the search engine - The add serach engine pop-up has been cloesed
7 INFO TEST-PASS | testAddSearchEngine | There should be search engines displayed when text is entered - 5 should not equal -1
8 INFO TEST-UNEXPECTED-FAIL | testAddSearchEngine | The number of search results has increased - got 5, expected 10
Exception caught during test!
junit.framework.AssertionFailedError: 8 INFO TEST-UNEXPECTED-FAIL | testAddSearchEngine | The number of search results has increased - got 5, expected 10
	at junit.framework.Assert.fail(Assert.java:47)
	at org.mozilla.fennec.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:107)
	at org.mozilla.fennec.FennecMochitestAssert.ok(FennecMochitestAssert.java:136)
	at org.mozilla.fennec.FennecMochitestAssert.is(FennecMochitestAssert.java:142)
	at org.mozilla.fennec.tests.testAddSearchEngine.testAddSearchEngine(testAddSearchEngine.java:64)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:521)
	at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
	at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
	at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
	at org.mozilla.fennec.tests.BaseTest.runTest(BaseTest.java:127)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
	at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
9 INFO TEST-UNEXPECTED-FAIL | testAddSearchEngine | Exception caught - junit.framework.AssertionFailedError: 8 INFO TEST-UNEXPECTED-FAIL | testAddSearchEngine | The number of search results has increased - got 5, expected 10
10 INFO TEST-END | testAddSearchEngine | finished in 67958ms
This is failing frequently -- can we disable for now?
Depends on: 870226
Attached patch testAddSearchEngine fix (obsolete) — Splinter Review
The fix for the intermittent fail. In my investigations I found bug 870226 for Gingerbread/Froyo devices. It seems to take a long time after Gecko:Ready for search content of the Awesomescreen to be displayed on this OSes and/or the Tegra boards which are generally slow. This patch changes the way the search engine numbers are return.

Since the first entry in the Awesomescreen after app startup is almost always going to return the wrong value I get the number of the search engines up to 5 different times. Each time I check the value against the old value found and if 2 consecutive tries return the same value it is most likely the correct one so I return that one. This should ensure that the return result of the getNumSearchEngine method is consistent.

Tryserver run: https://tbpl.mozilla.org/?tree=Try&rev=f9bacf3acc91. All fails are known intermittent fails and the issue was no longer reproduced with this version of the test
Attachment #747301 - Flags: review?(jmaher)
Comment on attachment 747301 [details] [diff] [review]
testAddSearchEngine fix

Review of attachment 747301 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/tests/testAddSearchEngine.java.in
@@ +13,5 @@
>   * expected values.
>   */
>  public class testAddSearchEngine extends BaseTest {
>  
> +    final int MAX_TRYES = 5;

s/TRYES/TRIES/

@@ +31,5 @@
>          loadUrl(blank);
>          waitForText("Browser Blank Page 01");
>  
> +        initialNumSearchEngines = getNumSearchEngines("Browser Blank Page 01");
> +        

nit: remove extra line or spaces here.
Attachment #747301 - Flags: review?(jmaher) → review+
Cleaned up the nits along with the test description which I seemed to have missed in the original patch also.
Attachment #747301 - Attachment is obsolete: true
Attachment #747386 - Flags: review?(jmaher)
https://hg.mozilla.org/mozilla-central/rev/e82e0f27347e
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 23
Depends on: 870918
Seems this happens again :/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Adrian -- Are you currently investigating this? Any progress/thoughts?
Flags: needinfo?(adrian.tamas)
I investigated this on Friday but I was unable to reproduce the issue locally. I'll continue investigations on this today
Flags: needinfo?(adrian.tamas)
I continued to try and reproduce this but I am unable to reproduce in 100+ runs on the slowest device we have - an HTC Desire (Android 2.2). I've seen this is happening a lot more on the ARMv6 build and tried to build the latest m-c sources on armv6 but it failed to build - logged Bug 893759 for this. Will continue to debug this and try to come up with a fix.
I am still not able to reproduce this locally. The only reason there are 10 views in stead of 5 would be that while getCurrentViews is executed the top sites ListView - which has 5 entries - is only then switched with the searchengine ListView - which has 5 views - making it 10 entries. I will have to investigate further but I decided to rewrite the test to check against the number of searchengines received through the SearchEngines:Data event and then check the number of displayed searchengines. This works locally every time unfortunately the first version fails on the tryserver: https://tbpl.mozilla.org/?tree=Try&rev=79438ae48485. I'll continue on this tomorrow.
Attached patch Rewrite of testAddSearchEngine (obsolete) — Splinter Review
Tryrun: https://tbpl.mozilla.org/?tree=Try&rev=580e1d322dbe

Rewrite of the testAddSearchEngines test to check if the search engine was added by comparing to the data received through the SearchEngine:Data event and then check the displayed search engines against the expected number. This sould overall be a more reliable version of the test.
Attachment #777786 - Flags: review?(gbrown)
Comment on attachment 777786 [details] [diff] [review]
Rewrite of testAddSearchEngine

Review of attachment 777786 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/tests/BaseTest.java.in
@@ +708,5 @@
> +    * This method parses the string received as event data from blockForEventData()
> +    * blockForEventData returns the data from a Gecko event as a string. 
> +    * The data contains tuples structured like this: "property":"value" separated by "," in most cases but this may vary
> +    * The method takes as input the event data string and a string of delimiters to parse after.
> +    * @return an ArrayList<String> with the 2*k (odd) index the property and the 2k+1 (even) index the value, 0<=k<=ArrayList.size()/2

I think you confused "odd" and "even" here. {0, 2, 4, ...} are "even".

@@ +714,5 @@
> +    public ArrayList<String> parseEventData(String eventData, String dataDelimiters) {
> +        ArrayList<String> parsedEventData = new ArrayList<String>();
> +        String[] parseData = eventData.split(dataDelimiters);
> +        for (String data:parseData) {
> +            // Sometimes the property can cotain : so only split the string when we have "String":"String"

s/cotain/contain/

@@ +717,5 @@
> +        for (String data:parseData) {
> +            // Sometimes the property can cotain : so only split the string when we have "String":"String"
> +            String[] parseDataEntry = data.split("\":");
> +            for (String dataContent:parseDataEntry) {
> +                // Strip extra characters like event data begining and ending brakets, remaing extra quote marks

s/begining/beginning/
s/remaing/remaining/

::: mobile/android/base/tests/testAddSearchEngine.java.in
@@ +31,5 @@
>          blockForGeckoReady();
>          loadUrl(blank);
>          waitForText("Browser Blank Page 01");
> +
> +        // Get the numver of searchengines

s/numver/number/

@@ +34,5 @@
> +
> +        // Get the numver of searchengines
> +        Actions.EventExpecter searchEngineDataEventExpector = mActions.expectGeckoEvent("SearchEngines:Data");
> +        clickOnAwesomeBar();
> +        String eventData = searchEngineDataEventExpector.blockForEventData();

I would prefer another searchEngineDataEventExpector.unregisterListener() right after the blockForEventData(), so that it is not listening between here and when it is re-used.

@@ +95,2 @@
>  
> +    public void verifyDisplayedSearchEnginesCount(String waitText, int mExpectedCount) {

The Firefox for Android convention is that mName is used for member variables...so naming the parameter mExpectedCount is confusing here.

Perhaps s/mExpectedCount/expectedCountParam/ ?

@@ +107,5 @@
> +                    if (adapter != null) {
> +                        searchEngineCount = adapter.getCount();
> +                        if (searchEngineCount == expectedCount) {
> +                            return true;
> +                        }

It might be useful/interesting to have:

} else {
    mAsserter.dumpLog("found "+searchEngineCount+" search engines");
}

@@ +120,5 @@
> +        waitForText(waitText);
> +        if (correctNumSearchEnginesDisplayed) {
> +            mAsserter.ok(correctNumSearchEnginesDisplayed, expectedCount + " Search Engines should be displayed", "The correct number of Search Engines has been displayed");
> +        } else {
> +            // TODO: Investigate more why the number of displayed searchengines is not correctly determined here

This is a little troubling...how often is it wrong?
Attachment #777786 - Flags: review?(gbrown) → review+
Fixed all the nets and added the dumpLog in case the displayed number of searchengines is wrong.

As for the fail to clickLongOnScreen this is a Robotium 3.6 issue: http://code.google.com/p/robotium/issues/detail?id=343 fixed in robotium 4.0. This is why initially we were looking to upgrade (bug 855978) but because of all the issues with it, after talking to Joel, we decided to postpone any upgrade to robotium until everything is more stable.  I could look into and try to add the version that works from 4.0 to BaseTest and maybe use that but this doesn't fail very often from what I have seen.

Should we check this fix in now or should we wait on this also until the panda run becomes more stable?
Attachment #777786 - Attachment is obsolete: true
Attachment #778346 - Flags: review?(gbrown)
(In reply to Adrian Tamas from comment #99)
> As for the fail to clickLongOnScreen this is a Robotium 3.6 issue:
> http://code.google.com/p/robotium/issues/detail?id=343 fixed in robotium
> 4.0. This is why initially we were looking to upgrade (bug 855978) but
> because of all the issues with it, after talking to Joel, we decided to
> postpone any upgrade to robotium until everything is more stable.  I could
> look into and try to add the version that works from 4.0 to BaseTest and
> maybe use that but this doesn't fail very often from what I have seen.

I would not worry about it...we will eventually update robotium and that should take care of it.

But actually, I was not asking about the clickLongOnScreen TODO, but the other TODO: "Investigate more why the number of displayed searchengines is not correctly determined here" -- I do not think that is related to clickLongOnScreen. I don't want to get hung up on this -- just curious.

> Should we check this fix in now or should we wait on this also until the
> panda run becomes more stable?

I would check it in now, since you are just modifying an existing test.
Attachment #778346 - Flags: review?(gbrown) → review+
(In reply to Geoff Brown [:gbrown] from comment #100)
> (In reply to Adrian Tamas from comment #99)
> > As for the fail to clickLongOnScreen this is a Robotium 3.6 issue:
> > http://code.google.com/p/robotium/issues/detail?id=343 fixed in robotium
> > 4.0. This is why initially we were looking to upgrade (bug 855978) but
> > because of all the issues with it, after talking to Joel, we decided to
> > postpone any upgrade to robotium until everything is more stable.  I could
> > look into and try to add the version that works from 4.0 to BaseTest and
> > maybe use that but this doesn't fail very often from what I have seen.
> 
> I would not worry about it...we will eventually update robotium and that
> should take care of it.
> 
> But actually, I was not asking about the clickLongOnScreen TODO, but the
> other TODO: "Investigate more why the number of displayed searchengines is
> not correctly determined here" -- I do not think that is related to
> clickLongOnScreen. I don't want to get hung up on this -- just curious.
> 
Sorry about that I did not look close enough to you question and assumed it was about the first clickLongOnScreen TODO.

Looking through the tryserver run there are 3 fails(known-fails) on Tegra armv6 build, 1 fail on Tegra armv7 and none on Pandas out of 75+ runs i estimate are there. Another thing I have seen is that the fail is on the first verification when there are 4 search engines and 9 are displayed. 

I believe that if I increase the waitForTest timeout or add a delay before the first check like loading a second page there should be no fails. It's most likely the sluggishness after Gecko:Ready, especially on the armv6 builds. I can try both ideas on Monday if you want and remove the fail-safe here or we can land as it is and I would still try to do this on Monday and if it is ok land a follow-up patch. Which ever you would consider the best approach here.

Also I have seen that the number of tests ran on pandas differs from 835 to 833 and sometimes it even is 411 (the middle rc2 from the last group of 3 consecutive runs that pass) - maybe still an issue with the tests on panda?
(In reply to Adrian Tamas from comment #101)
> I believe that if I increase the waitForTest timeout or add a delay before
> the first check like loading a second page there should be no fails. It's
> most likely the sluggishness after Gecko:Ready, especially on the armv6
> builds. I can try both ideas on Monday if you want and remove the fail-safe
> here or we can land as it is and I would still try to do this on Monday and
> if it is ok land a follow-up patch. Which ever you would consider the best
> approach here.

I suggest you land this patch as it is and investigate further when you have time.

> Also I have seen that the number of tests ran on pandas differs from 835 to
> 833 and sometimes it even is 411 (the middle rc2 from the last group of 3
> consecutive runs that pass) - maybe still an issue with the tests on panda?

I think there are some conditional assertions in a few tests: If this condition isn't met, execute an additional assertion...that sort of thing. I think that's why the pass count varies a little, from 832 - 835.

The 411 count is a problem -- bug 895966. Thanks for noticing that.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/769ff264e4e7
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
My supposition was correct and it seems it is an issue with the timing. Loading a second page at the beginning fixes this - https://tbpl.mozilla.org/?tree=Try&rev=021de66f8b81 no KNOWN-FAILs in all the runs.

I am also looking into just increasing the waitForTest timeout - tryrun here https://tbpl.mozilla.org/?tree=Try&rev=cef387213549 - which I suspect will have the same effect although I may need to adjust the timeout depending on the tryrun.

On one hand loading the 2 pages first works but it means loading 2 pages that are not used in the test. I would go for the second solution of increasing the timeout of waitForTest if that proves to be the solution since only in the worst cases it will wait the 5 or more seconds. What would be the better way to go here?
Flags: needinfo?(gbrown)
I prefer the increased waitForTest timeout, if it is effective.
Flags: needinfo?(gbrown)
Tryrun: https://tbpl.mozilla.org/?tree=Try&rev=5230f9d8d86d

After increasing the waitForTest timeout I did not see any more known-fails on the tryrun so I removed the todo_is assertion and changed the waitForTest timeout to 5000 ms. Waiting for the new tryrun results.
The run has 1 fail after removing the todo_is assertion and increasing the waitForTest timeout. The screeenshot is not really helpful. Should I try and increase the timeout further in the hope that it is a timing issue? Another possible solution could be to load the Search Engine page before we check for the existing search engines and then return to adding them.
In the log for the failure from the try run, I see that we keep finding 9 search engines, rather than the expected number. Actually, it's 9 items in a ListView -- I wonder what these are. Have you tried logging out the TextView's in that list?

I would prefer that we keep digging into the root cause of this rather than pushing forward with an increased timeout or loading extra pages, etc.
(In reply to Geoff Brown [:gbrown] from comment #110)
> In the log for the failure from the try run, I see that we keep finding 9
> search engines, rather than the expected number. Actually, it's 9 items in a
> ListView -- I wonder what these are. Have you tried logging out the
> TextView's in that list?
> 
> I would prefer that we keep digging into the root cause of this rather than
> pushing forward with an increased timeout or loading extra pages, etc.
I took the timer down to 1 second and printed each item of the list. Here is the output for a run where 9 search engines are found: http://pastebin.com/GWuDqiDc. There are 4 searchenginerows and 5 Relative layouts which most likely are the allpages items. I'll post a patch shortly to count only the SearchEngineRows and not everything. This should fix the test reliably and we can use the default wait for test timeout.
This fix is a mix of extending the waitForTest timeout to 5s (5000ms) and only counting the SearchEngineRow views from all the views getCurrentListViews() returns since Top Sites entries were also counted in some situations.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=d17ac0e2fbb9
Attachment #779820 - Attachment is obsolete: true
Attachment #782478 - Flags: review?(gbrown)
Comment on attachment 782478 [details] [diff] [review]
Follow-up patch to remove the todo_is assertion v2

Review of attachment 782478 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great -- really glad we understand it now.

::: mobile/android/base/tests/testAddSearchEngine.java.in
@@ +110,5 @@
> +                        for (int i = 0; i < adapter.getCount(); i++ ) {
> +                            View item = view.getChildAt(i);
> +                            try {
> +                                ClassLoader classLoader = getActivity().getClassLoader();
> +                                Class searchEngineRow = classLoader.loadClass("org.mozilla.gecko.SearchEngineRow");

nit - you could do this outside of the loop.
Attachment #782478 - Flags: review?(gbrown) → review+
Moved the try outside of the loop and the loop. Works locally without any issues.
Attachment #782478 - Attachment is obsolete: true
Attachment #782613 - Flags: review+
Attachment #782613 - Flags: review+ → review?(gbrown)
Attachment #782613 - Flags: review?(gbrown) → review+
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.