Closed Bug 802801 Opened 12 years ago Closed 11 years ago

verify cedar mozharness talos numbers are valid

Categories: Release Engineering :: Applications: MozharnessCore, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: mozilla, Assigned: jmaher
Whiteboard: [mozharness][talos]
Attachments: 1 file

We need to verify the numbers are valid before rolling out to m-c.

Currently ts_paint looks suspect, but may be cleared up by some of the other talos fixes.

(talos)jmaher@jmaher-MacBookPro:~/mozilla/talos$ python compare.py --branch Cedar --revision 6257bd043755 --print-graph-url
Linux:
    tdhtmlr: 424.971 -> 470.147; 433.912.  http://goo.gl/CDcU0
    tdhtmlr_nochrome: 414.118 -> 464.618; 417.147.  http://goo.gl/43wcS
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 1905.3 -> 2168.5; 2151.4.  http://goo.gl/pa2tw
    dromaeo_dom: 91.17 -> 145.493; 143.39.  http://goo.gl/oFJsx
    tscrollr: 10049.8 -> 18207.1; 17477.5.  http://goo.gl/ZdTXV
    a11yr: 521.0 -> 544.0; 525.5.  http://goo.gl/PT10j
    :( ts_paint: 698.526 -> 723.842; 800.263.  http://goo.gl/918iM
    tpaint: No results found
    tsvgr: 2798.05 -> 4050.36; 4031.59.  http://goo.gl/FvHaE
    tsvgr_opacity: 98.0 -> 104.0; 100.0.  http://goo.gl/x1w86
    tp5n: 370.707 -> 387.677; 375.813.  http://goo.gl/tzlWw
Linux64:
    tdhtmlr: 390.382 -> 428.5; 401.206.  http://goo.gl/BlPE3
    tdhtmlr_nochrome: 385.824 -> 427.088; 390.706.  http://goo.gl/1MBPY
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: No results found
    dromaeo_dom: No results found
    tscrollr: 10051.0 -> 17403.6; 16837.4.  http://goo.gl/8x52C
    a11yr: 449.5 -> 462.0; 452.0.  http://goo.gl/CDElv
    :( ts_paint: 651.474 -> 668.842; 780.158.  http://goo.gl/e9v3Y
    tpaint: No results found
    tsvgr: 2500.55 -> 3767.5; 2636.3.  http://goo.gl/Ojyoo
    tsvgr_opacity: 80.0 -> 84.0; 80.5.  http://goo.gl/gY4no
    tp5n: 316.788 -> 333.646; 320.354.  http://goo.gl/rxfBQ
Win:
    tdhtmlr: No results found
    tdhtmlr_nochrome: No results found
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: No results found
    dromaeo_dom: No results found
    tscrollr: No results found
    a11yr: No results found
    ts_paint: No results found
    tpaint: No results found
    tsvgr: No results found
    tsvgr_opacity: No results found
    tp5n: No results found
WinXP:
    tdhtmlr: No results found
    tdhtmlr_nochrome: 551.618 -> 701.647; 687.441.  http://goo.gl/HQcXi
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 2364.01 -> 2685.86; 2650.83.  http://goo.gl/pt9Rk
    dromaeo_dom: 116.067 -> 188.397; 186.36.  http://goo.gl/T0h5x
    tscrollr: No results found
    a11yr: No results found
    ts_paint: No results found
    tpaint: No results found
    tsvgr: 2428.15 -> 3833.55; 3681.0.  http://goo.gl/XfyfX
    tsvgr_opacity: 756.0 -> 976.5; 778.5.  http://goo.gl/BoRps
    tp5n: 295.227 -> 312.535; 296.096.  http://goo.gl/UPKP5
OSX10.7:
    tdhtmlr: No results found
    tdhtmlr_nochrome: 407.618 -> 489.118; 417.059.  http://goo.gl/c7ICw
    ts_places_med: No results found
    ts_places_max: No results found
    :) dromaeo_css: 2772.0 -> 3187.76; 3195.23.  http://goo.gl/24iPB
    dromaeo_dom: 150.57 -> 253.583; 246.687.  http://goo.gl/xHRJE
    tscrollr: 14122.7 -> 18745.6; 14235.4.  http://goo.gl/pHMjL
    a11yr: 293.0 -> 308.0; 294.0.  http://goo.gl/jGryi
    :( ts_paint: 843.316 -> 969.105; 973.263.  http://goo.gl/E1tsz
    tpaint: No results found
    tsvgr: No results found
    tsvgr_opacity: No results found
    :) tp5n: 254.465 -> 280.919; 249.455.  http://goo.gl/NWphW
OSX64:
    tdhtmlr: 423.559 -> 455.176; 426.147.  http://goo.gl/zX67u
    tdhtmlr_nochrome: No results found
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: 2784.08 -> 3221.56; 3214.84.  http://goo.gl/jtodr
    dromaeo_dom: 147.703 -> 251.32; 249.667.  http://goo.gl/5EXLA
    tscrollr: No results found
    a11yr: No results found
    ts_paint: No results found
    tpaint: No results found
    tsvgr: 2156.45 -> 3370.64; 3370.0.  http://goo.gl/oLBnc
    tsvgr_opacity: 760.0 -> 983.0; 773.5.  http://goo.gl/mQYIU
    tp5n: No results found
OSX:
    tdhtmlr: No data for platform
    tdhtmlr_nochrome: No data for platform
    ts_places_med: No results found
    ts_places_max: No results found
    dromaeo_css: No results found
    dromaeo_dom: No results found
    tscrollr: No data for platform
    a11yr: No data for platform
    ts_paint: No results found
    tpaint: No results found
    tsvgr: No data for platform
    tsvgr_opacity: No data for platform
    tp5n: No data for platform
(talos)jmaher@jmaher-MacBookPro:~/mozilla/talos$
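For anyone who wants to check these lines mechanically, here is a minimal sketch of a parser for the output above. It assumes the three numbers on each line are the low end of the baseline range, the high end of the baseline range, and the Cedar value, in that order; that reading is an assumption based on how the :( and :) markers line up, not something stated in this bug. The names `parse_result` and `in_range` are made up for illustration and are not part of compare.py.

```python
import re

# Matches result lines such as:
#   ":( ts_paint: 698.526 -> 723.842; 800.263.  http://goo.gl/918iM"
# Platform headers ("Linux:") and "No results found" lines do not match.
LINE_RE = re.compile(
    r"^\s*(?::[()]\s+)?(?P<test>\w+):\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s+->\s+(?P<high>\d+(?:\.\d+)?);\s+"
    r"(?P<value>\d+(?:\.\d+)?)\."
)


def parse_result(line):
    """Return (test, low, high, value) for a result line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return (
        match.group("test"),
        float(match.group("low")),
        float(match.group("high")),
        float(match.group("value")),
    )


def in_range(low, high, value):
    """True when value falls inside the assumed baseline range [low, high]."""
    return low <= value <= high


if __name__ == "__main__":
    sample = "    :( ts_paint: 698.526 -> 723.842; 800.263.  http://goo.gl/918iM"
    test, low, high, value = parse_result(sample)
    print(test, "in range" if in_range(low, high, value) else "out of range")
```

This only reports whether a value sits inside the range; it does not know which tests are "higher is better", so it cannot reproduce the :( and :) markers on its own.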
you're it!
Assignee: nobody → jmaher
If this is difficult/time consuming, we can wait til we're ready to roll out for another run.

If it's quick and easy, several checks throughout the process might be helpful.
Not sure what the status is here.  IMHO, and I am no statistician, it will be difficult and/or time-consuming to do this rigorously.  If we do end up writing tools to do this, it might be nice to keep them general enough that they'll be ready the next time we have to do something like this.
We just need to compare the numbers generated by mozharness with those reported to the graph server for the same or a similar changeset.

If they are in the same range, especially over a series of changesets, then we can assert that we are producing numbers in the same fashion as we do without mozharness.
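The check described here could be sketched roughly as follows. This is only an illustration: the 5% tolerance, the function names, and the example values are all hypothetical placeholders, not numbers from graph server or from Cedar.

```python
def relative_difference(baseline, candidate):
    """Absolute difference between the two values, as a fraction of the baseline."""
    return abs(candidate - baseline) / baseline


def out_of_range(results, tolerance=0.05):
    """Return the changesets whose mozharness value strays from the buildbot one.

    results maps a changeset id to a (buildbot_value, mozharness_value) pair.
    The default tolerance of 5% is an arbitrary placeholder.
    """
    return [
        rev
        for rev, (buildbot_value, mozharness_value) in results.items()
        if relative_difference(buildbot_value, mozharness_value) > tolerance
    ]


# Hypothetical values for illustration only.
EXAMPLE = {
    "rev_a": (700.0, 710.0),
    "rev_b": (702.0, 698.0),
    "rev_c": (699.0, 760.0),
}

if __name__ == "__main__":
    print(out_of_range(EXAMPLE))
```

A fixed tolerance is the simplest possible choice; tests with noisier results would probably need a per-test threshold.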
Please update this bug when Cedar has an up-to-date build with talos numbers using the latest talos bits.
1. Unbreak Windows on Cedar
2. Unbreak tp on Linux and Windows on Cedar
3. Tell jmaher that Cedar has builds using the same talos.zip as the tip of mozilla-central, when it does and they're working
(4. He compares the numbers)
Let's not forget:

3.a. When a new talos.zip is uploaded for mozilla-central, upload a new talos python package to the puppetagain find-links page: http://puppetagain.pub.build.mozilla.org/data/python/packages/ . Get in the habit of doing this, either via a script or manually.  Personally, I would probably bump the package version here, but ultimately that's less important.  But the talos.json of Cedar does need to be updated with the mozilla-central copy.

I am happy to write a script to make this easier, if desired.
This URL compares the changeset that was run with the normal buildbotcustom code and mozharness:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=af6f1eaaa35e&newRev=14e3e9ab9994&submit=true

How can we tell if we have introduced any problems?
Linux:
:( tspaint_places_med	  658.9	->	  701.3	  744.5	[N/A] [PGO: N/A]	
:( tspaint_places_max	  663.4	->	  702.1	  752.2	[N/A] [PGO: N/A]	
:( ts_paint          	  666.7	->	  709.9	  743.3	[N/A] [PGO: N/A]	
Linux64:
:( tspaint_places_med	  610.4	->	  658.1	  696.4	[N/A] [PGO: N/A]	
:( tspaint_places_max	  609.6	->	  648.1	  694.8	[N/A] [PGO: N/A]	
:( ts_paint          	  607.2	->	  638.2	  695.6	[N/A] [PGO: N/A]	
win7:
:( tspaint_places_med	  500.7	->	20837.8	20880.6	[N/A] [PGO: N/A]	
:( tspaint_places_max	  501.3	->	20827.0	20897.1	[N/A] [PGO: N/A]	
:( ts_paint          	  729.2	->	20829.5	20880.5	[N/A] [PGO: N/A]	
win8:
:( tspaint_places_med	  685.9	->	20918.6	20990.9	[N/A] [PGO: N/A]	
:( tspaint_places_max	  680.2	->	20922.6	20986.9	[N/A] [PGO: N/A]	
:( ts_paint          	  699.1	->	20918.3	21000.7	[N/A] [PGO: N/A]	
osx10.7:
:( tspaint_places_med	  918.3	->	 1036.0	 1225.0	[N/A] [PGO: N/A]	
:( tspaint_places_max	  922.0	->	 1003.3	 1027.6	[N/A] [PGO: N/A]	
:( ts_paint          	  893.2	->	  998.8	 1016.5	[N/A] [PGO: N/A]	
osx64:
:( tspaint_places_max	  829.1	->	  873.3	  885.4	[N/A] [PGO: N/A]	
osx10.8:
:( tspaint_places_med	  665.6	->	  744.9	  771.3	[N/A] [PGO: N/A]	


So it appears that our ts test (start/stop the browser 20 times) is problematic.  The places_med and places_max variants are just different profiles used while running the test.  I have verified this on a few different changesets.  

I really don't understand how changing to mozharness could cause this, but maybe there is some additional overhead induced on the system with mozharness when it comes to launching a new process.  

We could take this as a bump in the numbers and accept that.  I am open to any thoughts here.
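To put rough numbers on the bump, here is a small sketch that turns a few of the ts_paint rows above into percent changes. The column reading is an assumption, since the table is not labelled in this bug: the first number is taken as the buildbot value and the two numbers after the arrow as the low and high ends of the mozharness results. The function names are made up for illustration.

```python
# (platform, test) -> (old, new_low, new_high), copied from the table above.
# The meaning of the three columns is an assumption, not stated in the bug.
ROWS = {
    ("Linux", "ts_paint"): (666.7, 709.9, 743.3),
    ("Linux64", "ts_paint"): (607.2, 638.2, 695.6),
    ("win7", "ts_paint"): (729.2, 20829.5, 20880.5),
    ("osx10.7", "ts_paint"): (893.2, 998.8, 1016.5),
}


def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def summarize(rows):
    """Map each row to its (low, high) percent change against the old value."""
    return {
        key: (percent_change(old, low), percent_change(old, high))
        for key, (old, low, high) in rows.items()
    }


if __name__ == "__main__":
    for (platform, test), (low, high) in summarize(ROWS).items():
        print(f"{platform} {test}: {low:+.1f}% to {high:+.1f}%")
```

Under that reading, the win7 row changes by more than twentyfold, far beyond the Linux and OS X rows, so it is worth looking at separately from the bump discussed here.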
Attached image [screenshot]
I'm also willing to take the bump. I don't see a way out of it unless we had someone investigate what seems a very obscure situation.
I can see the bump on the third section:
http://graphs.mozilla.org/graph.html#tests=[[227,26,33],[227,26,35],[227,26,12],[227,26,24],[227,26,22]]&sel=none&displayrange=30&datatype=running
Should we doublecheck that our mozharness ts talos commandline options are the same as buildbot talos?
cedar:
/home/cltbld/talos-slave/test/build/venv/bin/talos --noisy --debug -v --executablePath /builds/slave/talos-slave/test/build/application/firefox/firefox --title talos-linux32-ix-023 --symbolsPath http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-linux/1373828505/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip --activeTests tscrollr:a11yr:ts_paint:tpaint --results_url http://graphs.mozilla.org/server/collect.cgi --output talos.yml --branchName Cedar --datazilla-url https://datazilla.mozilla.org/talos --authfile /builds/slave/talos-slave/test/oauth.txt --mozAfterPaint --filter ignore_first:5 --filter median --webServer localhost

m-c:
python PerfConfigurator.py -v -e ../firefox/firefox-bin -t talos-linux32-ix-046 --branchName Firefox-Non-PGO --resultsServer graphs.mozilla.org --resultsLink /server/collect.cgi --activeTests tscrollr:a11yr:ts_paint:tpaint --mozAfterPaint --filter ignore_first:5 --filter median --symbolsPath ../symbols


The big differences are that we are running the talos script instead of PerfConfigurator.py, and that we have '--webServer localhost' specified.  I think the datazilla and authfile flags are harmless, as they are only used after the test is completed.  

also, we don't need '--output talos.yml' nor the '--debug' options for the mozharness stuff.
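A quick way to double-check the two command lines is to diff the option names. The sketch below uses only the standard library and the two commands exactly as pasted above; `flag_names` and `diff_flags` are made-up helper names, not part of talos or mozharness.

```python
import shlex

CEDAR = (
    "/home/cltbld/talos-slave/test/build/venv/bin/talos --noisy --debug -v "
    "--executablePath /builds/slave/talos-slave/test/build/application/firefox/firefox "
    "--title talos-linux32-ix-023 "
    "--symbolsPath http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/"
    "cedar-linux/1373828505/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip "
    "--activeTests tscrollr:a11yr:ts_paint:tpaint "
    "--results_url http://graphs.mozilla.org/server/collect.cgi --output talos.yml "
    "--branchName Cedar --datazilla-url https://datazilla.mozilla.org/talos "
    "--authfile /builds/slave/talos-slave/test/oauth.txt --mozAfterPaint "
    "--filter ignore_first:5 --filter median --webServer localhost"
)

MC = (
    "python PerfConfigurator.py -v -e ../firefox/firefox-bin -t talos-linux32-ix-046 "
    "--branchName Firefox-Non-PGO --resultsServer graphs.mozilla.org "
    "--resultsLink /server/collect.cgi --activeTests tscrollr:a11yr:ts_paint:tpaint "
    "--mozAfterPaint --filter ignore_first:5 --filter median --symbolsPath ../symbols"
)


def flag_names(command):
    """Collect every token that looks like an option (starts with '-')."""
    return {tok.split("=", 1)[0] for tok in shlex.split(command) if tok.startswith("-")}


def diff_flags(first, second):
    """Return (options only in first, options only in second), each sorted."""
    a, b = flag_names(first), flag_names(second)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    only_cedar, only_mc = diff_flags(CEDAR, MC)
    print("cedar only:", only_cedar)
    print("m-c only:  ", only_mc)
```

This compares names only, so it ignores differing values (for example --branchName) and reports short and long spellings separately; if -e and -t are short forms of --executablePath and --title, which seems likely but is not confirmed here, those show up as differences even though they are equivalent.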
(In reply to Joel Maher (:jmaher) from comment #13)
>
> also, we don't need '--output talos.yml' nor the '--debug' options for the
> mozharness stuff.

Should we fix this before deploying to try?
the --output might help for logging purposes, no?

I'd be more concerned about the --webServer localhost for timing issues.
I would really like to remove the --webserver localhost, that causes us to do a python webserver instead of the apache one on the box.  Not sure why that doesn't affect the other tests which care more about page loading.
(In reply to Joel Maher (:jmaher) from comment #16)
> I would really like to remove the --webserver localhost, that causes us to
> do a python webserver instead of the apache one on the box.  Not sure why
> that doesn't affect the other tests which care more about page loading.

Would just removing do the trick?

Is there anything else before we go ahead and enable it on try?

On another note, I was thinking of sending this note to dev.platform and dev.tree-management.
What do you guys think?
My only concern is what the reaction would be to "The only nit is that there will be a number hit for the ts tests that we are willing to take. [3]"

#####################

Hi,
We have recently been working hard to move the buildbot logic that runs our talos jobs on tbpl into its own separate script.

This has the advantage of permitting anyone (especially the a-team) to adjust how our harnesses run talos inside of our infra without having to set up buildbot (which is what currently runs our talos jobs). This also permits anyone to run the jobs locally in the same manner as Releng's infrastructure, and allows for further development and flexibility in how we configure the jobs we run.

This work is near complete and ready to go live. [1]

Initially, we will enable it on the try server to see production-like load. So far, it's been looking great on Cedar. [2]

The only nit is that there will be a number hit for the ts tests that we are willing to take. [3]

There's one thing to do on your part if you don't want failing *talos* jobs on the try server: make sure that 3d1c2ca7efe8 is in your local checkout [4].
If you have updated your repo from m-i by Friday 12th at 10:19AM PDT you should be good to go.

Once we get a couple of days worth of load on the try server we will go ahead and enable it for every m-c based repository.

If you have any questions please write a comment on bug 713055.

Best regards,
Jason & Armen
Release Engineering

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=713055
[2] https://tbpl.mozilla.org/?tree=Cedar&jobname=talos
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=802801#c10
[4] http://hg.mozilla.org/integration/mozilla-inbound/rev/3d1c2ca7efe8
We forgot to add Jason to this :)
(In reply to Joel Maher (:jmaher) from comment #16)
> I would really like to remove the --webserver localhost, that causes us to
> do a python webserver instead of the apache one on the box.

I thought the python webserver is used only when we have the --develop option? http://hg.mozilla.org/build/talos/file/fcbb9d7d3c78/talos/PerfConfigurator.py#l362
It appears I was wrong about the --webserver localhost; it is only set when using --develop.
That sounds great.
It seems that we're ready to enable on try tomorrow and make an announcement.

Please correct me if I got it wrong.
all sounds good to me.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #17)
> Is there anything else before we go ahead and enable it on try?
> 
> On another note, I was thinking of sending this note to dev.platform and
> dev.tree-management.
> What do you guys think?

dev.platform, then Try would be a good order.
That looks like a good writeup; thanks!
I posted the announcement.
I will enable it later today or tomorrow morning.
We have done the verification. Let's turn to bug 713055 to finish things up.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness