Closed Bug 910106 Opened 6 years ago Closed 6 years ago

Intermittent Android testClearPrivateData, testFindInPage, tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]'

Categories

(Firefox for Android :: Awesomescreen, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 27
Tracking Status
firefox25 --- unaffected
firefox26 --- fixed
firefox27 --- fixed
firefox-esr24 --- unaffected
fennec 26+ ---

People

(Reporter: philor, Assigned: Margaret)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #907720 +++

Unlike the morphed bug 907720 results which seem to now combine both the crash and the failure to get rid of about:home, this one lacks the crash.

https://tbpl.mozilla.org/php/getParsedLog.php?id=27101525&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound talos remote-trobocheck2 on 2013-08-27 22:37:05 PDT for push caf66a81daeb
slave: tegra-277

PaintExpecter: no longer listening for events
waitForTest timeout after 10000 ms
__FAILVerify HomePager is hidden: HomePager is hidden__FAIL
__start_report3.5722396__end_report
__startTimestamp1377643485982__endTimestamp
__startBeforeLaunchTimestamp1377668642346__endBeforeLaunchTimestamp
__startAfterTerminationTimestamp1377668718435__endAfterTerminationTimestamp

pushing directory: /tmp/tmpym49VA/profile to /mnt/sdcard/tests/profile
	Screen width/height:1024/768
	colorDepth:24
	Browser inner width/height: 1024/695


browser_name:Fennec
browser_version:26.0a1
buildID:20130827214517

removing file: /mnt/sdcard/tests/fennec_ids.txt
removing file: /mnt/sdcard/tests/robotium.config
removing file: /mnt/sdcard/tests/browser_output.txt
removing file: /mnt/sdcard/tests/browser_output.txt
removing file: /mnt/sdcard/tests/profile/sessionstore.js
getting files in '/mnt/sdcard/tests/profile/minidumps/'
removing file: /mnt/sdcard/tests/browser_output.txt
removing file: /mnt/sdcard/tests/profile/sessionstore.js
getting files in '/mnt/sdcard/tests/profile/minidumps/'
Failed tcheck2: 
		Stopped Tue, 27 Aug 2013 22:45:18
Traceback (most recent call last):
  File "run_tests.py", line 277, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "/builds/tegra-277/talos-data/talos/ttest.py", line 423, in runTest
    test_results.add(browser_log_filename, counter_results=counter_results)
  File "/builds/tegra-277/talos-data/talos/results.py", line 128, in add
    browserLog = BrowserLogResults(filename=results, counter_results=counter_results, global_counters=self.global_counters)
  File "/builds/tegra-277/talos-data/talos/results.py", line 328, in __init__
    self.error(match.group(1))
  File "/builds/tegra-277/talos-data/talos/results.py", line 343, in error
    raise utils.talosError(message)
talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]'
Margaret, this appears to be the failure in bug 907720 morphed slightly - can you take a look? :-)
Flags: needinfo?(margaret.leibovic)
(In reply to Ed Morley [:edmorley UTC+1] from comment #20)
> Margaret, this appears to be the failure in bug 907720 morphed slightly -
> can you take a look? :-)

Sure, I can own this.

I'm actually a bit confused about why bug 907720 was left open. Looking at some of the recent TBPL robot comments in there, the failures sound more like this bug. Maybe there's been confusion between these two bugs when starring failures?

You guys are more familiar with the state of failures on the tree, so I'll let you decide what we should do, but I think we can close bug 907720.
Assignee: nobody → margaret.leibovic
Flags: needinfo?(ryanvm)
Flags: needinfo?(margaret.leibovic)
Flags: needinfo?(emorley)
Bug 907720 tracks rck2 crashing with this message. This bug tracks this message w/o a crash. If you want to combine them, we could so, though.
Flags: needinfo?(ryanvm)
Flags: needinfo?(emorley)
If they're the same, happy to combine
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #24)
> Bug 907720 tracks rck2 crashing with this message. This bug tracks this
> message w/o a crash. If you want to combine them, we could so, though.

Looking through the log messages here, it looks like some of them were crashes (probably accidental TBPL starring). But regardless of whether there was a crash or not, I think these "Verify HomePager is hidden" failures all have the same root cause, so I closed out bug 907720.
Summary: Intermittent Android tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]' without a crash → Intermittent Android tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]'
This seems to only be happening on the tegras. Is it possible that they're really slow enough that we're not giving the HomePager enough time to hide? I'm tempted to suggest we just try increasing MAX_WAIT_HOME_PAGER_HIDDEN_MS, but it's already 10 seconds, which seems like a long time.
Margaret, any news on this? :-)
Flags: needinfo?(margaret.leibovic)
(In reply to Ed Morley [:edmorley UTC+1] from comment #76)
> Margaret, any news on this? :-)

I'm looking into this now, and I haven't been able to reproduce this locally, maybe I can see if I can get my hands on a tegra device.

I can also try increasing the timeout and pushing that change to try just to see if it improves things at all.
Flags: needinfo?(margaret.leibovic)
See bug 913683 for some experiments I'm performing to try to fix both this bug and bug 908823.
This should *hopefully* be much improved by the patch I just landed in bug 913683. However, on my last try run there was one case where the test still timed out, and I'm not sure we're going to be able to avoid this error 100% of the time.
https://tbpl.mozilla.org/php/getParsedLog.php?id=28149240&tree=Mozilla-Inbound
Summary: Intermittent Android tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]' → Intermittent Android testFindInPage, tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]'
With the patch I landed in bug 913683, this should happen less often, although this check is used in more tests now (thanks for updating the summary).

The real problem here is that these tegra devices are slow, and it's good that the test actually bails out rather than continues while the home pager is hidden.

Hopefully the other performance issues we're tracking for the new about:home will help resolve this, but in the meantime, there's not really anything more actionable I can do right now.
Thank you for all your work on trying to make this test more reliable. Unfortunately the failure rate is still too high, so I've disabled it on Android 2.2 (only) until we have a better solution:
remote:   https://hg.mozilla.org/integration/fx-team/rev/59655b2db583

Given Android 2.2/tegras are going to be on the way out next year, I'm not using [leave open] in the whiteboard, since I think this will end up being a permanent solution (and as such we should just let mcMerge close the bug after next merge). If you'd rather leave it open, happy for you to add the whiteboard annotation :-)
Duplicate of this bug: 918517
Unfortunately, there is some fundamental issue here with PixelTest.loadAndGetPainted(), so any test that uses that (or loadAndPaint(), since that calls into it as well) is likely to have problems :(

I don't know what to do here -- on the one hand, it sucks to have tests failing so often, but on the other hand, if the home pager is actually still showing, there is definitely some problem.

Maybe I can get my hand on a tegra device to see if I can reproduce this locally.
Summary: Intermittent Android testFindInPage, tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]' → Intermittent Android testClearPrivateData, testFindInPage, tcheck2 talosError: 'Verify HomePager is hidden: HomePager is hidden [browser_output.txt]'
An experiment to see if replace loadUrl with inputAndLoadUrl increases the reliability of these tests:
https://tbpl.mozilla.org/?tree=Try&rev=f1b8b9e00ded
(In reply to Ed Morley [:edmorley UTC+1] from comment #169)
> Thank you for all your work on trying to make this test more reliable.
> Unfortunately the failure rate is still too high, so I've disabled it on
> Android 2.2 (only) until we have a better solution:
> remote:   https://hg.mozilla.org/integration/fx-team/rev/59655b2db583

I don't really love this solution because it means that we'll be proceeding with the tests, even though the app is in a wrong state. I'm curious if this might cause some other weird intermittent issues, like the crashes we used to experience in tcheck2, and the reason why I originally landed this check

However, I understand it's generating a lot of noise right now, so something needed to be done.

> Given Android 2.2/tegras are going to be on the way out next year, I'm not
> using [leave open] in the whiteboard, since I think this will end up being a
> permanent solution (and as such we should just let mcMerge close the bug
> after next merge). If you'd rather leave it open, happy for you to add the
> whiteboard annotation :-)

I'm going to continue to try to investigate this, in the hopes that we can come up with a better solution.
Whiteboard: [leave open]
Ok, thank you :-)
Whiteboard: [leave open] → [disabled on Android 2.2][leave open]
Duplicate of this bug: 920490
(In reply to :Margaret Leibovic from comment #177)
> I don't really love this solution because it means that we'll be proceeding
> with the tests, even though the app is in a wrong state. I'm curious if this
> might cause some other weird intermittent issues, like the crashes we used
> to experience in tcheck2, and the reason why I originally landed this check

Ah indeed, I'd forgotten this was the solution for bug 907720, too many bug dependencies.

I've reverted the test-disabling:
remote:   https://hg.mozilla.org/integration/fx-team/rev/ebf396c5f073
Whiteboard: [disabled on Android 2.2][leave open] → [leave open]
I can trigger some more try jobs, but so far I haven't seen any home pager hidden problems with this patch.

I realize this may slow things down, because this manually enters URLs as opposed to trying to load them directly, but I think that's a worthwhile tradeoff.

I can also look into reverting the timeout increases I made in bug 913683, since those might not actually be necessary with this change.
Attachment #809893 - Flags: review?(gbrown)
Comment on attachment 809893 [details] [diff] [review]
Replace loadUrl with inputAndLoadUrl in PixelTest

Review of attachment 809893 [details] [diff] [review]:
-----------------------------------------------------------------

This is basically un-doing the change that Lucas made in bug 905591. afaik, that problem ("Dropped event because it is stale", etc.) still exists -- we just use inputAndLoadUrl infrequently enough to avoid it most of the time. 

Why don't the two of you work it out?
Attachment #809893 - Flags: review?(gbrown) → review?(lucasr.at.mozilla)
Perhaps instead we should replace BaseTest.loadUrl with an openUrl method that calls BrowserApp.openUrl:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/BrowserApp.java#1294
(In reply to :Margaret Leibovic from comment #196)
> Perhaps instead we should replace BaseTest.loadUrl with an openUrl method
> that calls BrowserApp.openUrl:
> http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/BrowserApp.
> java#1294

Sounds like a good thing to try.
Comment on attachment 809893 [details] [diff] [review]
Replace loadUrl with inputAndLoadUrl in PixelTest

Margaret, let me know if you still want to go ahead with this approach or try something different. I'll clear the review flag for now.
Attachment #809893 - Flags: review?(lucasr.at.mozilla)
Blocks: 922112
trobocheck hidden on {mozilla-central, mozilla-inbound, b2g-inbound, fx-team} for Android 2.2 (only), bug 922112 filed to unhide once this bug fixed.
I starting experimenting locally with using BrowserApp.openUrl instead of Tabs.loadUrl, but I'm afraid we'll have the same problem, because openUrl doesn't call hideHomePager itself, but actually calls Tabs.loadUrl, which puts us in the same situation.

Thinking about this bug more, I realized that entering a URL while on about:home itself has different behavior than entering a URL on a normal page. On a normal page, after we commit editing mode, we may see the previous page before the next page loads. However, because we call hideHomePager in commitEditingMode, the about:home content always gets hidden immediately, rather than waiting for a location change event.

I suppose what I really need to do here is figure out why the hell it takes so long to get this location change event to hide the home pager (or if somehow we're never making that call to hide it). Previously, I wasn't been able to reproduce this failure locally, but I can try again with the new tests that are using this check.

Lucas, can you give me some more context around the change from bug 905591? Given the number of failures happening here, I think it may be worthwhile to go ahead and land the patch I have in here.
Flags: needinfo?(lucasr.at.mozilla)
(In reply to :Margaret Leibovic from comment #277)
> I starting experimenting locally with using BrowserApp.openUrl instead of
> Tabs.loadUrl, but I'm afraid we'll have the same problem, because openUrl
> doesn't call hideHomePager itself, but actually calls Tabs.loadUrl, which
> puts us in the same situation.
> 
> Thinking about this bug more, I realized that entering a URL while on
> about:home itself has different behavior than entering a URL on a normal
> page. On a normal page, after we commit editing mode, we may see the
> previous page before the next page loads. However, because we call
> hideHomePager in commitEditingMode, the about:home content always gets
> hidden immediately, rather than waiting for a location change event.
> 
> I suppose what I really need to do here is figure out why the hell it takes
> so long to get this location change event to hide the home pager (or if
> somehow we're never making that call to hide it). Previously, I wasn't been
> able to reproduce this failure locally, but I can try again with the new
> tests that are using this check.
> 
> Lucas, can you give me some more context around the change from bug 905591?
> Given the number of failures happening here, I think it may be worthwhile to
> go ahead and land the patch I have in here.

The change in bug 905591 was just a pragmatic move to get fig landed. IIRC, testVkbOverlap (not disabled) was failing to input URLs with injected events for some unknown reason. Ideally, non-chrome tests should not rely on event injection in the UI. But I'm fine with reverting this change if that makes our tests more reliable.
Flags: needinfo?(lucasr.at.mozilla)
Comment on attachment 809893 [details] [diff] [review]
Replace loadUrl with inputAndLoadUrl in PixelTest

Given you last comment, I'd like to land this change. We should continue to investigate a better solution for avoiding inputting URLs for all tests, but I think this patch will make our failure situation much better in the meantime.
Attachment #809893 - Flags: review?(lucasr.at.mozilla)
Attachment #809893 - Flags: review?(lucasr.at.mozilla) → review+
Bombs away: https://hg.mozilla.org/integration/fx-team/rev/8ce571d084f9

Let's see how this works out. We should stay on the lookout for failures like the one in bug 905591, but I think this will at least get rid of the "HomePager is hidden" problem.
Depends on: 923109
I just filed bug 923109 as a follow-up to come up with a replacement for the patch I landed here.

However, I think we can close this bug, since that patch should stop the failures that have been happening here.
Indeed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → Firefox 27
You need to log in before you can comment on or make changes to this bug.