Closed
Bug 710296
Opened 13 years ago
Closed 9 years ago
turn on responsiveness for linux in Talos
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(firefox47 fixed)
RESOLVED
FIXED
mozilla47
Tracking | Status | |
---|---|---|
firefox47 | --- | fixed |
People
(Reporter: k0scist, Unassigned)
References
Details
Attachments
(1 file)
I notice this comment at
http://hg.mozilla.org/build/talos/file/705a12151bf2/talos/ttest.py#l256
:
# ignore responsiveness tests on linux until we fix Bug 697555
bug 697555 is now fixed. Can we remove this code and turn on
responsiveness for linux again?
Comment 1•13 years ago
|
||
No, that bug only wound up dealing with the OS X side of things. There's still something broken with Linux where it crashes on shutdown.
Comment 2•13 years ago
|
||
We should try this after the patch in bug 730282 lands, since it fixes a potential crash on Linux/OSX.
Depends on: 730282
Comment 3•13 years ago
|
||
out of 40 runs, I had 2 crashes: https://tbpl.mozilla.org/?tree=Try&rev=c60b10defc46
Comment 4•9 years ago
|
||
do we have a need to do this? If so, I would like to get someone looking at this, otherwise with so many years gone by, this isn't much of a priority.
Flags: needinfo?(ted)
Comment 5•9 years ago
|
||
I would ask the e10s folks. If they want it then yes. If they are ambivalent or don't want it, then let's just WONTFIX this.
Flags: needinfo?(ted)
Comment 6•9 years ago
|
||
:blassey, can you determine if we want to look into this, or help find someone who could have that discussion and drive a decision?
Flags: needinfo?(blassey.bugs)
Updated•9 years ago
|
Flags: needinfo?(blassey.bugs) → needinfo?(jmathies)
Comment 7•9 years ago
|
||
Where are the results for these tests? I'm looking through tp5 results on mc and I'm not seeing them.
We currently use telemetry to check for regressions in main thread jank. Having a 'canary in the coal mine' talos test that measures this is useful.
Flags: needinfo?(jmaher)
Comment 8•9 years ago
|
||
it appears we don't run these at all- my memory isn't the best, but here is some historical data:
http://graphs.mozilla.org/graph.html#tests=[[275,1,49],[275,1,47],[275,1,31],[275,1,25]]&sel=none&displayrange=365&datatype=geo
it appears we turned this off, I suspect it was that a few reasons:
1) failure to collect data often: https://bugzilla.mozilla.org/show_bug.cgi?id=1174160
2) interest in this, but nothing conclusive: https://bugzilla.mozilla.org/show_bug.cgi?id=1170241
3) we accidentally disabled this 5.5 months ago
I couldn't find any reason why this was disabled specifically. If this is useful we would need to:
* turn it on
* ensure it is stable and doesn't fail to collect, hang jobs
* find out how to not post 8000 data points, ideally just 1
* ensure the results are normal, not random so if we get a regression it is worthwhile to track down.
:jimm, what are your thoughts here?
Flags: needinfo?(jmaher)
Comment 9•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #8)
> it appears we don't run these at all- my memory isn't the best, but here is
> some historical data:
> http://graphs.mozilla.org/graph.html#tests=[[275,1,49],[275,1,47],[275,1,31],
> [275,1,25]]&sel=none&displayrange=365&datatype=geo
>
> it appears we turned this off, I suspect it was that a few reasons:
> 1) failure to collect data often:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1174160
> 2) interest in this, but nothing conclusive:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1170241
> 3) we accidentally disabled this 5.5 months ago
>
> I couldn't find any reason why this was disabled specifically. If this is
> useful we would need to:
> * turn it on
> * ensure it is stable and doesn't fail to collect, hang jobs
> * find out how to not post 8000 data points, ideally just 1
> * ensure the results are normal, not random so if we get a regression it is
> worthwhile to track down.
>
> :jimm, what are your thoughts here?
This sounds like a useful test to me. If someone landed code that janked the main browser thread it'd be nice to know it. I'm not sure how often that happens but a test to be sure it doesn't would be useful. Especially now that we are doing multiple processes where one process might spend (too much) time blocked on another.
Flags: needinfo?(jmathies)
Comment 10•9 years ago
|
||
After reading it over, this kinds feels like a dupe of bug 710296.
Comment 11•9 years ago
|
||
this is sounding like a useful thing- did you mean to link another bug as a possible dupe?
At the very least I can turn this on for a try run and see what works/doesn't
Comment 12•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #11)
> this is sounding like a useful thing- did you mean to link another bug as a
> possible dupe?
>
> At the very least I can turn this on for a try run and see what works/doesn't
Whoops, I meant bug 1170241. We had a discussion there last spring about the usefulness of this test in tracking regressions under e10s. My opinion hasn't changed since then, I think we should keep it and improve on it over time. :)
Comment 13•9 years ago
|
||
ok, I turned this on for all platforms:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6193e1d7008
it seems to be recording data, I am going to get this fix pushed and see how stable the data is and work on tracking it. Is it useful to track e10s and non e10s?
Flags: needinfo?(jmathies)
Comment 14•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #13)
> ok, I turned this on for all platforms:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6193e1d7008
>
> it seems to be recording data, I am going to get this fix pushed and see how
> stable the data is and work on tracking it. Is it useful to track e10s and
> non e10s?
Yes!
Flags: needinfo?(jmathies)
Comment 15•9 years ago
|
||
I wonder if tp5o_responsiveness_paint is broken under e10s? the delta there is huge.
Comment 16•9 years ago
|
||
The actual data collection here is pretty simple--it just fires tracer events into the native event queue and times how long they take to get handled. This is only done in the chrome process, so I could believe that in certain scenarios e10s would have way lower results, where the content process is doing the bulk of the work and not tying up the chrome process event loop.
Comment 17•9 years ago
|
||
yeah, we would need a custom measurement for e10s if we wanted to measure something outside of the chrome process.
Comment 18•9 years ago
|
||
osx is quite crazy, but windows and linux seem to be stable.
Comment 19•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/33483/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/33483/
Attachment #8715485 -
Flags: review?(j.parkouss)
Comment 20•9 years ago
|
||
and a corresponding try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e43d82e4b369
this will ignore osx, but turn on responsiveness for linux/win* on tp5.
Comment 21•9 years ago
|
||
Comment on attachment 8715485 [details]
MozReview Request: Bug 710296 - turn on responsiveness for linux in Talos. r?parkouss
https://reviewboard.mozilla.org/r/33483/#review30187
::: testing/talos/talos/config.py:326
(Diff revision 1)
> +
Unless I'm missing something, I think you can bring back 'responsiveness' in GLOBAL_OVERRIDES, and that should allow to remove this block of code (it will be handled automatically with the "for key in GLOBAL_OVERRIDES" above).
::: testing/talos/talos/ttest.py:93
(Diff revision 1)
> - if test_config.get('responsiveness') and \
> + if test_config.get('responsiveness', False) and \
False is not required here, you can remove it.
Attachment #8715485 -
Flags: review?(j.parkouss)
Comment 22•9 years ago
|
||
I actually found out that we were always failing to run responsiveness as the global_configs set it to False because it is a cli arg with a False default value. This overrides the value set in test.py (True for tp5o).
So we accidentally turned this off with the update to config.py/etc.
Comment 23•9 years ago
|
||
Comment on attachment 8715485 [details]
MozReview Request: Bug 710296 - turn on responsiveness for linux in Talos. r?parkouss
(In reply to Joel Maher (:jmaher) from comment #22)
> I actually found out that we were always failing to run responsiveness as
> the global_configs set it to False because it is a cli arg with a False
> default value. This overrides the value set in test.py (True for tp5o).
>
> So we accidentally turned this off with the update to config.py/etc.
Oh ok ! Nevermind then, r+. Maybe add a comment to explain that ? We should probably try to improve this weird arg parsing logic (again), but that is another story.
Attachment #8715485 -
Flags: review+
Comment 24•9 years ago
|
||
Comment on attachment 8715485 [details]
MozReview Request: Bug 710296 - turn on responsiveness for linux in Talos. r?parkouss
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33483/diff/1-2/
Attachment #8715485 -
Flags: review+ → review?(j.parkouss)
Comment 25•9 years ago
|
||
Comment on attachment 8715485 [details]
MozReview Request: Bug 710296 - turn on responsiveness for linux in Talos. r?parkouss
https://reviewboard.mozilla.org/r/33483/#review30287
Thanks for looking deeper into this!
This looks fine to me, no more --responsiveness parameter which is automatically set on tests that requires it. (e.g. tp5o).
Attachment #8715485 -
Flags: review?(j.parkouss) → review+
Comment 26•9 years ago
|
||
Comment 27•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment 28•9 years ago
|
||
This wierdly seems to have "regressed" tp5o % cpu on win7. Any ideas?
https://treeherder.allizom.org/perf.html#/alerts?id=20
It doesn't make any sense, but looking at the graph the new behaviour unmistakably started with this push. Unless something else is going on....
Flags: needinfo?(jmaher)
Comment 29•9 years ago
|
||
yes, this would be fine as we are doing more cpu and work to record and dump the output.
Flags: needinfo?(jmaher)
Comment 30•9 years ago
|
||
But that should happen on *linux* not windows. This makes no sense, on :Callek's suggestion I'm doing a retrigger on a older revision off a windows xp job which exhibits this behavior:
https://treeherder.allizom.org/perf.html#/graphs?series=[mozilla-inbound,1fc0f3ae22619bd040312c9c7dc14e77247d7939,1]&highlightedRevisions=633d74f817b2
Retriggers here:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=751e2bc571ab&filter-searchStr=windows%207%20talos&selectedJob=18442797
Comment 31•9 years ago
|
||
Ok that seemed to give the old set of values. I wonder if this changeset actually did somehow change the windows numbers? Doing a bunch of retriggers to confirm, results to be reported here:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=865b01fa7947&newProject=mozilla-inbound&newRevision=d9520249a4a0&framework=1&filter=%25&showOnlyImportant=0
Comment 32•9 years ago
|
||
we had an error and accidentally turned this off for windows, so this fix turned it on for linux and back on for windows. osx is still disabled.
Comment 33•9 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #15)
> I wonder if tp5o_responsiveness_paint is broken under e10s? the delta there
> is huge.
Reading the description here https://wiki.mozilla.org/Buildbot/Talos/Tests#Responsiveness
Suggests that the test result value is:
> return sum([float(x)*float(x) / 1000000.0 for x in val_list])
This seem to me to depend on the number of collected events, and it's not normalized for the number of events. So less events -> much "better" result.
Maybe that's the main culprit here?
Comment 34•9 years ago
|
||
we see the tp5o memory/cpu issue in pgo on aurora now:
https://treeherder.mozilla.org/perf.html#/alerts?id=320
You need to log in
before you can comment on or make changes to this bug.
Description
•