Closed
Bug 913683
Opened 12 years ago
Closed 12 years ago
Talos regressions in trobopan and tcheck2 on Android since Aug 22 2013
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox26 fixed, firefox27 fixed, fennec-)
RESOLVED
FIXED
Firefox 27
People
(Reporter: gbrown, Assigned: Margaret)
References
Details
(Keywords: perf, regression)
Attachments
(2 files)
|
3.39 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
|
1.89 KB,
patch
|
gbrown
:
review+
paul.feher
:
feedback+
|
Details | Diff | Splinter Review |
Since about Aug 22, trobopan and tcheck2 have been unstable on both Android 2.2 and Android 4.0, triggering lots of Talos alerts.
I think we are already working on sorting this out (via bug 908823 especially) but I wanted another bug on file to reflect that there have been various regressions and improvements reported since then (it seems pointless to report bugs for all of the individual alerts -- there's a new one almost every day.)
http://graphs.mozilla.org/graph.html#tests=[[201,11,20]]&sel=none&displayrange=30&datatype=running
http://graphs.mozilla.org/graph.html#tests=[[174,11,20]]&sel=none&displayrange=30&datatype=running
http://graphs.mozilla.org/graph.html#tests=[[201,11,29]]&sel=none&displayrange=30&datatype=running
http://graphs.mozilla.org/graph.html#tests=[[174,11,29]]&sel=none&displayrange=30&datatype=running
| Assignee | ||
Comment 1•12 years ago
|
||
Cc'ing kats, since we were talking about this on IRC this morning.
Comment 2•12 years ago
|
||
Yeah I started looking at this a little today. Note that it's only the Tegras that have been off since the fig landing, and I agree with Margaret that for some reason it was because we were running the test on the home screen rather than the test page. Her bandaid makes us wait a bit longer but I think we're still running the test on the home screen for the most part, based on the huge variance I'm seeing in the tegra talos numbers. It did however make the tests run to completion more often.
I think we need to dig into why the behaviour is wrong on the tegra devices. Margaret, when you ran it locally with your patch, did you see the test getting run on the actual page? I wonder if you can run it locally with and without your patch to see if the reported results are similar. If that is the case then maybe I'm wrong and your patch did fix the behaviour but the numbers are off for some other reason. If the reported results are different before and after (i.e. your patch fixes the talos numbers locally) then the tegras in production are not behaving the same way as your local testing and we need to root-cause the home pager problem.
Comment 3•12 years ago
|
||
Also FWIW I did also do a couple of try pushes for trobopan. [1] just applies your patch to trobopan as well, so that it waits for the homepager. As you reported at bug 907720 comment 346, it fails occasionally. [2] is an additional patch on top of that that increases the timeout, and that one hasn't failed yet. In both cases though the reported talos values have a huge range so I don't think the underlying problem is actually fixed, it just hides the symptoms better.
[1] https://tbpl.mozilla.org/?tree=Try&rev=0440a3838828
[2] https://tbpl.mozilla.org/?tree=Try&rev=5f41b9bf69d8
| Assignee | ||
Comment 4•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> I think we need to dig into why the behaviour is wrong on the tegra devices.
> Margaret, when you ran it locally with your patch, did you see the test
> getting run on the actual page?
No, I failed to get the real test pages to load. I may have just been putting the files from the talos repo in the wrong place, but I gave up assuming that not seeing the test get run on about:home anymore was a good enough sign (before applying my patch, I almost always saw us panning and zooming the about:home content, but after applying the patch, I always saw panning and zooming the 404 page that showed up when I couldn't load the test page).
If the numbers are still a mess, I agree that we should try to find a way to observe what's actually going on.
Comment 5•12 years ago
|
||
(In reply to :Margaret Leibovic from comment #4)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> No, I failed to get the real test pages to load. I may have just been
> putting the files from the talos repo in the wrong place, but I gave up
> assuming that not seeing the test get run on about:home anymore was a good
> enough sign (before applying my patch, I almost always saw us panning and
> zooming the about:home content, but after applying the patch, I always saw
> panning and zooming the 404 page that showed up when I couldn't load the
> test page).
It could be that the 404 page behaves differently from the actual test page (maybe it loads faster or sends different draw-finished notifications) and so that might throw off the test's behaviour.
To run e.g. testPan locally, I believe the steps I've used successfully in the past are:
1) Copy the relevant fileset (in this case wikipedia.html and en.wikipedia.org) from http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/startup_test/fennecmark to your local mobile/android/base/tests/ folder.
2) Modify testPan.java.in to return TEST_MOCHITEST from getTestType()
3) Modify testPan.java.in's getAbsoluteUrl call argument to "/robocop/wikipedia.html"
4) Modify robocop.ini to just have testPan
Then rebuild and run as usual.
In the past I think I've only ever done this with the timecube.html page on testCheck.java.in and since that's a standalone file it works well. Wikipedia and CNN have an additional directory of external content and I don't know if the robocop packaging process will pick that up properly. You can modify the MOCHITEST_ROBOCOP_FILES variable in build/mobile/robocop/Makefile.in to include the "en.wikipedia.org" folder as well, which might do the trick.
| Assignee | ||
Comment 7•12 years ago
|
||
I just made a try push that experiments with making sure the home pager is hidden earlier, before we start trying to verify that the page is painted:
https://tbpl.mozilla.org/?tree=Try&rev=78d9b242bc90
I also increased the delay we wait between paint events, since locally I observed us starting to pan a blank page before anything started drawing.
Depending on how this goes, this patch may help fix both bug 908823 and bug 910106.
(For future reference, I found sticking the talos test page files directly in objdir/_tests/testing/mochitest/tests/robocop was the easiest way to make them available in local test runs.)
| Assignee | ||
Comment 8•12 years ago
|
||
So, good news is that the patch in that try push brought the tpan numbers back closer to normal (fixing bug 908823).
The bad news is that it didn't fix the intermittent HomePager is hidden issue, and if we land this patch, we'll be adding that intermittent failure to tpan in addition to tcheck2 :(
Maybe I need to revisit the verifyHomePagerHidden implementation to see if something there is causing flakiness, or if the home pager just really doesn't get hidden. Maybe we need an even longer timeout? I can push a patch that does that to try. Maybe I should also land a patch that gives us some more logging.
| Assignee | ||
Comment 9•12 years ago
|
||
An attempt to increase MAX_WAIT_HOME_PAGER_HIDDEN_MS, but not PAINT_CLEAR_DELAY:
https://tbpl.mozilla.org/?tree=Try&rev=e95999e1170f
An attempt that increases both:
https://tbpl.mozilla.org/?tree=Try&rev=3d666ef0831d
| Assignee | ||
Updated•12 years ago
|
Assignee: nobody → margaret.leibovic
| Assignee | ||
Comment 10•12 years ago
|
||
This patch moves the verifyHomePagerHidden check to before we start trying to verify that the page drawn, which seems like a logical change.
However, just moving that check still caused a lot of intermittent "HomePager is hidden failures". I added some logging to see what happens in those failure cases, and it appears the view just really is still visible, so I think these tegras are just really slow. Increasing MAX_WAIT_HOME_PAGER_HIDDEN_MS appears to fix this problem, but maybe we should start looking into why it actually takes so long for the home pager to disappear (this isn't only dependent on the Java UI, since in these tests the home pager is only hidden after we get a location change event).
When I got this test running locally I saw that without the PAINT_CLEAR_DELAY increase, we would end up starting to pan the page before it finished drawing. This is definitely wrong, and I believe that's what's caused the big spike reported in bug 908823. This increase appears to fix the issue right now, but I'd also like to look into why this wasn't a problem before the fig merge, since the time to paint a page after the home pager is hidden shouldn't have changed with the fig changes.
Lucas has been doing some fig-related performance investigations, so maybe he would have some ideas about these things. In any case, I think this patch fixes a lot of our failure/bogus-number issues, so it would be worthwhile to land.
Attachment #804802 -
Flags: review?(gbrown)
| Reporter | ||
Comment 11•12 years ago
|
||
Comment on attachment 804802 [details] [diff] [review]
patch to move verifyHomePagerHidden check and increase timeouts
Review of attachment 804802 [details] [diff] [review]:
-----------------------------------------------------------------
I too am concerned about the length of delays that seem to be necessary, but I don't know of another way to get these passing. Let's hope those performance investigations turn up some answers soon!
Attachment #804802 -
Flags: review?(gbrown) → review+
Comment 12•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #11)
> I too am concerned about the length of delays that seem to be necessary, but
> I don't know of another way to get these passing.
The test itself could be broken (or not working well with fig), and if that's the case, then the test needs to be fixed... but first it's probably best to understand what changed with fig that made the test irrelevant?
| Assignee | ||
Comment 13•12 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/dea697e42d70
Let's see how well this works, and we can look to uplift to Aurora if necessary.
Comment 14•12 years ago
|
||
(In reply to :Margaret Leibovic from comment #13)
> https://hg.mozilla.org/integration/fx-team/rev/dea697e42d70
>
> Let's see how well this works, and we can look to uplift to Aurora if
> necessary.
Failing test:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27974067&tree=Fx-Team
Comment 15•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #14)
> Failing test:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=27974067&tree=Fx-Team
Backed out:
remote: https://hg.mozilla.org/integration/fx-team/rev/d3a3c14c23e3
| Assignee | ||
Comment 16•12 years ago
|
||
Ugh, we use loadAndPaint in testSystemPages to test about:home... so of course the home pager won't be hidden!
I don't really see the value in an automated test to make sure about:home loads, since it's highly unlike a patch would ever land with that problem, since it's so noticable.
Test passed locally, giving that a try:
https://tbpl.mozilla.org/?tree=Try&rev=ddde61cf46f5
| Assignee | ||
Comment 17•12 years ago
|
||
Like I mentioned in my last comment, I don't think we should bother checking about:home in testSystemPages. In fact, using loadAndPaint to check this Java-based page doesn't even make sense.
Attachment #806045 -
Flags: review?(gbrown)
| Reporter | ||
Comment 18•12 years ago
|
||
Comment on attachment 806045 [details] [diff] [review]
patch to remove about:home from testSystemPages
Review of attachment 806045 [details] [diff] [review]:
-----------------------------------------------------------------
I think we can do without an explicit about:home check, but let's check with Paul too, since he wrote the original test.
Attachment #806045 -
Flags: review?(gbrown)
Attachment #806045 -
Flags: review+
Attachment #806045 -
Flags: feedback?(paul.feher)
Updated•12 years ago
|
Attachment #806045 -
Flags: feedback?(paul.feher) → feedback+
| Assignee | ||
Comment 19•12 years ago
|
||
Try push was green, so pushing to fx-team:
https://hg.mozilla.org/integration/fx-team/rev/b8e91961a401
https://hg.mozilla.org/integration/fx-team/rev/b3cef9e12692
Comment 20•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b8e91961a401
https://hg.mozilla.org/mozilla-central/rev/b3cef9e12692
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 27
Comment 21•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/f89590e5824a
https://hg.mozilla.org/releases/mozilla-aurora/rev/3d9f6b93d300
status-firefox26:
--- → fixed
status-firefox27:
--- → fixed
Updated•5 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•