Closed Bug 1260824 Opened 8 years ago Closed 8 years ago

Autophone - submit s1s2 data to perfherder

Categories

(Testing Graveyard :: Autophone, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bc, Assigned: bc)

References

Details

Attachments

(21 files, 5 obsolete files)

2.42 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
794 bytes, patch
jmaher
: review+
Details | Diff | Splinter Review
809 bytes, patch
jmaher
: review+
Details | Diff | Splinter Review
33.05 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
33.27 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
5.35 KB, patch
jmaher
: feedback+
Details | Diff | Splinter Review
2.47 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
1.38 MB, patch
jmaher
: feedback+
Details | Diff | Splinter Review
3.07 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
6.18 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
9.26 KB, patch
Details | Diff | Splinter Review
91.47 KB, patch
Details | Diff | Splinter Review
10.65 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
11.56 KB, text/plain
Details
822 bytes, text/plain
Details
4.27 KB, text/plain
wlach
: feedback+
Details
10.35 KB, patch
Details | Diff | Splinter Review
5.19 KB, text/plain
Details
9.41 KB, text/plain
Details
3.67 KB, text/plain
Details
3.20 KB, text/plain
Details
In today's mobile testing meeting, we discussed getting Autophone to begin submitting performance data to perfherder. Initially we can do this while also submitting the data to phonedash while we evaluate the usefulness of the test data on perfherder.

This will work with the perfherder's grouping of the data by platform as long as we keep the 1-1 mapping between Android version and device type.

The original work to get the autophone talos tests to submit to perfherder was in bug 1192069.
we should first decide on what data we want to submit, getting it into treeherder should be easy.

We will just have to make sure each platform maps to a device- so the nexus 6p's will be "android 6.0 ...", etc.  No simple way to distinguish 6p-1, 6p-2, etc.

right now we are looking at s1/s2 only.  lets list out the platforms and subtests or variants we care about.
We have a 1-1 mapping from device type to android version now. We will just need to make sure to preserve that for the devices submitting to perfherder going forward.

s1s2 is the only performance test we have other than the talos tests so it is the goal here.

I don't see the usefulness of trying to limit anything at this stage. The interesting part will be if submitting the mean instead of all of the data points for each measurement will allow us to see regressions. Another possibility is to submit all of the data points but perfherder might not support that much data.

We will find out what appears to be useful by reporting to phonedash and perfherder simultaneously and comparing the results.
(In reply to Bob Clary [:bc:] from comment #2)
> We have a 1-1 mapping from device type to android version now. We will just
> need to make sure to preserve that for the devices submitting to perfherder
> going forward.
> 
> s1s2 is the only performance test we have other than the talos tests so it
> is the goal here.
> 
> I don't see the usefulness of trying to limit anything at this stage. The
> interesting part will be if submitting the mean instead of all of the data
> points for each measurement will allow us to see regressions. Another
> possibility is to submit all of the data points but perfherder might not
> support that much data.

How many datapoints are we talking about? Generally we take the median, not the mean, in Talos as that's more resilient to outliers. Kyle can also give helpful advice on this.
The raw data consists of first and second page loads x start and stop times x number of iterations for each test... 2 * 2 * 8 = 32.

We currently have 6 variations of s1s2 (local, remote) x (blank, twitter, nytimes).
I humbly suggest Autophone submits all data to Perfherder:  Then Perfherder can change its ingestion to use mean, or median, or all samples, or some other statistic, without impacting the submission code.
(In reply to Kyle Lahnakoski [:ekyle] from comment #5)
> I humbly suggest Autophone submits all data to Perfherder:  Then Perfherder
> can change its ingestion to use mean, or median, or all samples, or some
> other statistic, without impacting the submission code.

Perfherder currently relies on the client to send an aggregated summary of each test, and to be honest, I'm not keen on changing this. I'm already pretty busy with trying to maintain the existing codebase without adding in a per-framework ETL framework into the mix. I think it's better to delegate that responsibility to the test framework maintainers, though of course I'm happy to give advice.

If no one has a better idea, I'd suggest starting with median and going from there, that's the closest thing we have to a best practice.
add median choice to existing graph.
Attachment #8741803 - Flags: review?(jmaher)
fix tooltip revision for release branches.
Attachment #8741804 - Flags: review?(jmaher)
remove unused errorbar vars in handlers.py
Attachment #8741805 - Flags: review?(jmaher)
Attached patch WIP1 bug-1260824-all.patch (obsolete) — Splinter Review
Not ready for review.
Comment on attachment 8741803 [details] [diff] [review]
bug-1260824-add-median.patch

Review of attachment 8741803 [details] [diff] [review]:
-----------------------------------------------------------------

nice and simple!
Attachment #8741803 - Flags: review?(jmaher) → review+
Comment on attachment 8741804 [details] [diff] [review]
bug-1260824-tooltip.patch

Review of attachment 8741804 [details] [diff] [review]:
-----------------------------------------------------------------

good fix.
Attachment #8741804 - Flags: review?(jmaher) → review+
Attachment #8741805 - Flags: review?(jmaher) → review+
Attached patch bug-1260824-all-v2.patch (obsolete) — Splinter Review
Deployed to phonedash-dev for testing. phonedash-dev has a copy of production data up through 2016-04-09.

http://phonedash-dev.allizom.org/all.html#/2016-04-03/2016-04-09
Attachment #8741806 - Attachment is obsolete: true
Attachment #8743304 - Flags: review?(jmaher)
The previous patch didn't hook up the filter for the product. When attempting to fix that it turned out that the jquery selectors containing periods, e.g. #org.mozilla.fennec, didn't work. jgraham posits it was due to the periods being treated as class selectors. Since product and repo are really the same thing, I've dropped product altogether as a filter.
Attachment #8743304 - Attachment is obsolete: true
Attachment #8743304 - Flags: review?(jmaher)
Attachment #8743386 - Flags: review?(jmaher)
Comment on attachment 8743386 [details] [diff] [review]
bug-1260824-all-v3.patch

Review of attachment 8743386 [details] [diff] [review]:
-----------------------------------------------------------------

a few nits to consider, nothing critical

::: html/scripts/allapp.js
@@ +348,5 @@
> +        vars:   series.vars,
> +        label:  plot_series_key,
> +        data:   [],
> +        data_hash: {},
> +        counts: [],

nit: these are justified to match counts:, but data_hash breaks that, maybe just leave them normal with a single space

@@ +412,5 @@
> +    }
> +    if (a.label < b.label) {
> +      return -1;
> +    }
> +    if (a.label > b.label) {

can this be an else if?

@@ +508,5 @@
> +              start = next_day(start);
> +              if (start < end) {
> +                getData(start, end, params, day+1);
> +              }
> +              else {

can this be |} else {|, bring it on the same line?  I see mixed styles in this code, I prefer same line, but at the very least pick one for the entire file and project.

@@ +523,5 @@
> +  $.makeArray($('#controls select').each(function(i, e) { params[e.name] = e.value; }));
> +  var startdatestr = $('#startdate').attr('value');
> +  var enddatestr = $('#enddate').attr('value');
> +
> +  var hash = '#/' + startdatestr + '/' + enddatestr;

I could see a need for making the url params more typical rather than just a date.  startDate=xx-xx-xxx&endDate=xx-xx-xxxx&...

Something to consider later, it would help with cut/paste urls for bugs.

@@ +532,5 @@
> +
> +  if (load != "load") {
> +    makePlot(params, all_data);
> +  }
> +  else {

another case of else on the new line.

::: html/scripts/plot.js
@@ +145,5 @@
> +    url: '',
> +    count: count
> +  };
> +  if (typeof(valueerr) != 'undefined') {
> +    params.valueerr = '&plusmn;' + Math.floor(valueerr);

what is plusmn?

::: server/handlers.py
@@ +298,5 @@
> +        startdate = tz_pac.localize(startdate)
> +        enddate = tz_pac.localize(enddate)
> +        # convert the datetimes to utc.
> +        startdate = startdate.astimezone(tz_utc)
> +        enddate = enddate.astimezone(tz_utc)

this is confusing as we assing startdate/enddate 3 times, but reading it a few times I understand what is going on.  I don't know if simplifying to a more complex line of code would help...more of a comment than a request.
Attachment #8743386 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #15)
> Comment on attachment 8743386 [details] [diff] [review]
> bug-1260824-all-v3.patch
> 
> Review of attachment 8743386 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> a few nits to consider, nothing critical
> 
> ::: html/scripts/allapp.js
> @@ +348,5 @@
> > +        vars:   series.vars,
> > +        label:  plot_series_key,
> > +        data:   [],
> > +        data_hash: {},
> > +        counts: [],
> 
> nit: these are justified to match counts:, but data_hash breaks that, maybe
> just leave them normal with a single space
> 

Ok.

> @@ +412,5 @@
> > +    }
> > +    if (a.label < b.label) {
> > +      return -1;
> > +    }
> > +    if (a.label > b.label) {
> 

I normally don't do the else if thingy if the earlier branch was an early return.

> can this be an else if?
> 
> @@ +508,5 @@
> > +              start = next_day(start);
> > +              if (start < end) {
> > +                getData(start, end, params, day+1);
> > +              }
> > +              else {
> 
> can this be |} else {|, bring it on the same line?  I see mixed styles in
> this code, I prefer same line, but at the very least pick one for the entire
> file and project.
> 

Ok. I'm not a fan of } else {, but will change the project to match this style.

> @@ +523,5 @@
> > +  $.makeArray($('#controls select').each(function(i, e) { params[e.name] = e.value; }));
> > +  var startdatestr = $('#startdate').attr('value');
> > +  var enddatestr = $('#enddate').attr('value');
> > +
> > +  var hash = '#/' + startdatestr + '/' + enddatestr;
> 
> I could see a need for making the url params more typical rather than just a
> date.  startDate=xx-xx-xxx&endDate=xx-xx-xxxx&...
> 
> Something to consider later, it would help with cut/paste urls for bugs.
> 

The whole Router thing I inherited from the original makes this troublesome. I didn't like losing the bookmarkability but didn't spend any time thinking about it. I could probably work around it though and add the query string parameters. Let me think about it.

> @@ +532,5 @@
> > +
> > +  if (load != "load") {
> > +    makePlot(params, all_data);
> > +  }
> > +  else {
> 
> another case of else on the new line.
> 

Ok.

> ::: html/scripts/plot.js
> @@ +145,5 @@
> > +    url: '',
> > +    count: count
> > +  };
> > +  if (typeof(valueerr) != 'undefined') {
> > +    params.valueerr = '&plusmn;' + Math.floor(valueerr);
> 
> what is plusmn?
> 

This is the +- thingy in the tooltip.

> ::: server/handlers.py
> @@ +298,5 @@
> > +        startdate = tz_pac.localize(startdate)
> > +        enddate = tz_pac.localize(enddate)
> > +        # convert the datetimes to utc.
> > +        startdate = startdate.astimezone(tz_utc)
> > +        enddate = enddate.astimezone(tz_utc)
> 
> this is confusing as we assing startdate/enddate 3 times, but reading it a
> few times I understand what is going on.  I don't know if simplifying to a
> more complex line of code would help...more of a comment than a request.

Yeah. I barely understand the timezone stuff. This was a clone from the original data api. I think it would be worth revisiting but not in this bug.
* address review comments (querystring deferred)
* fix stddev calc in allapp.js
* fix binning thinko
* fix phone options when only one of a phone type is available.
* make sure values are sorted before getting median in handlers.py
* default valuetype to median in all.html
* add throbbertime metric (throbberstop - throbberstop)
Attachment #8743952 - Flags: review?(jmaher)
I've been running variations of this patch against treeherder.allizom.org and phonedash-dev.allizom.org and finding/fixing issues.

This probably collects too many measurements. Feedback requested.

Note: phonedash-dev.allizom.org *appears* to be dropping some of the data points without adding them to the database. I'm currently submitting the latest runs to my local instance. I'll upload to phonedash-dev later.

Ignore the runs before today: https://treeherder.allizom.org/perf.html
Attachment #8743954 - Flags: feedback?(jmaher)
Comment on attachment 8743952 [details] [diff] [review]
bug-1260824-all-v4.patch

Review of attachment 8743952 [details] [diff] [review]:
-----------------------------------------------------------------

thanks for the updated patch!
Attachment #8743952 - Flags: review?(jmaher) → review+
Comment on attachment 8743954 [details] [diff] [review]
bug-1260824-perfherder-s1s2-v1.patch

Review of attachment 8743954 [details] [diff] [review]:
-----------------------------------------------------------------

just a suggestion about the name to use.

::: tests/s1s2test.py
@@ +286,5 @@
> +                perfherder_suite = PerfherderSuite(name=testname, value=utils.geometric_mean(test_values))
> +                for metric_key in metric_keys:
> +                    for cache_key in cache_keys:
> +                        cache_name = cache_names[cache_key]
> +                        subtest_name = "%s %s median" % (metric_key, cache_name)

I am not sure if we want 'median' as part of the test name

@@ +288,5 @@
> +                    for cache_key in cache_keys:
> +                        cache_name = cache_names[cache_key]
> +                        subtest_name = "%s %s median" % (metric_key, cache_name)
> +                        perfherder_suite.add_subtest(subtest_name, perfherder_values[metric_key][cache_key]['median'])
> +                    subtest_name = "%s geometric mean" % metric_key

same here for geometric mean
Attachment #8743954 - Flags: feedback?(jmaher) → feedback+
instead of local-twitter throbberstart second median opt
           local-twitter throbberstart second opt

but what about the ones that are essentially binned and that i'm using geometric mean on? Would calling them summary be too confusing? As in:

instead of local-blank throbberstart geometric mean opt
           local-blank throbberstart summary opt

or should i leave these binned values out?
Flags: needinfo?(jmaher)
as discussed on irc, lets do this for staging, using 'binned' as the keyword.
Flags: needinfo?(jmaher)
checking my results locally I found that not all data points are being retrieved by api/s1s2/alldata. Looking into it.
When reviewing data, I realized that alldata wasn't returning every measurement for a run. I made a mistake in thinking runstamp was set by Autophone when in reality it is set by Phonedash when it saves the data. This corrects the order of how the rows are presented and included rejected in the runs key.
Attachment #8744436 - Flags: review?(jmaher)
Comment on attachment 8744436 [details] [diff] [review]
bug-1260824-fix-alldata-collection.patch

Review of attachment 8744436 [details] [diff] [review]:
-----------------------------------------------------------------

the big old rubber stamp comes out!
Attachment #8744436 - Flags: review?(jmaher) → review+
deployed to phonedash-dev:

Example graph with an apparent regression:

http://phonedash-dev.allizom.org/all.html#/2016-03-20/2016-03-26/binning=repo-phonetype-phoneid-test_name&rejected=norejected&errorbars=noerrorbars&errorbartype=standarderror&valuetype=median&local-blank=on&throbberstart=on&first=on&fx-team=on&mozilla-inbound=on&nexus-4=on&nexus-4-1=on&nexus-4-2=on&nexus-4-3=on&nexus-4-4=on&nexus-4-5=on&nexus-4-6=on&nexus-5=on&nexus-5-1=on&nexus-5-2=on&nexus-5-3=on&nexus-5-4=on&nexus-5-5=on&nexus-6=on&nexus-6-1=on&nexus-6-2=on&nexus-6p-6=on&nexus-9=on&nexus-9-1=on&nexus-9-2=on&nexus-s=on&nexus-s-3=on&nexus-s-8=on&nexus-s-9=on&samsung-gs3=on&samsung-gs3-3=on

Now with progressbar notification and a different layout. Note you can select x-ranges, drag tooltips, bookmark and reload urls.

If you load the base page on phonedash-dev, there will be no data displayed since there is no data for today. If you expand the date range to include 4/9/2016 the page will reload but the checkboxes for the new data will not be checked automatically.

I'll finish this off later today.
Attachment #8745241 - Flags: feedback?(jmaher)
PS. I submitted mozilla-inbound for 2016-03-23T01:30:00 to 2016-03-23T09:40:00 with my gs3 and 6p reporting to treeherder staging. The data was submitted to my local phonedash so you can't see it on phonedash-dev yet, but the perfherder data was submitted to treeherder staging. Hopefully there was enough data for alerting. Not sure if the data is clean enough to generate an alert probably at least partially due to the fact that I was using my workstation at the same time and I think it has affected at least the remote measurements.
I see some s1s2 data on treeherder.allizom.org:
https://treeherder.allizom.org/perf.html#/graphs?series=%5Bmozilla-inbound,fe6d0c64ee0f4cfbb7b4b073423163d223f8fb05,1%5D

that is cool to see, I just added the summary data there- but it appears there is a lot of choices.

Looking at android 4, I see more summary data and on things like local-blank and remote-blank, I can see many revisions:
https://treeherder.allizom.org/perf.html#/graphs?series=%5Bmozilla-inbound,d109471419b3f69da4dae10a05699f07343c48b1,1%5D&series=%5Bmozilla-inbound,a08c494bbfb265619c075fb53dd1854800ceac37,1%5D&series=%5Bmozilla-inbound,31af9529d5771ce861e18e88b2ec721621fd5a8c,1%5D&series=%5Bmozilla-inbound,cd8f7d8edc66ccea3f351060c129f18d329d091c,1%5D

it appears that the android 4 data came in two parts- maybe we adjusted the number of tests or test names and that forced us to generate a new "signature" causing two different graphs.  Having the data split between a couple of signatures, does result in no alerts- although I am not sure we would end up with alerts- maybe a different signature or two.
Comment on attachment 8745241 [details] [diff] [review]
WIP bug-1260824-selections-tooltips-drag-querystring.patch

Review of attachment 8745241 [details] [diff] [review]:
-----------------------------------------------------------------

overall this looks good- there are a lot of changes as you warned me of!  looking at this on phonedash-dev looks great and seems fairly easy to understand.   +1 for making shareable links work:)

::: html/scripts/allapp.js
@@ +857,2 @@
>      }).init('/');
> +  //  });

can we just delete this line that is commented out?
Attachment #8745241 - Flags: feedback?(jmaher) → feedback+
(In reply to Joel Maher (:jmaher) from comment #29)
> I see some s1s2 data on treeherder.allizom.org:

Yeah, we changed the names by removing median, adding binning to some of the binned results which resulted in multiple names for many things. It seems impossible to do any date ranged except those permitted in the select box so it is difficult to pick the exact date range as far as I know.

I think the new names are in affect for after Thursday Noon: https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=d2670d2a634a3ee685da69395d43a18ebc965575

I think this shows a partial view of the summary tests:
<https://treeherder.allizom.org/perf.html#/graphs?timerange=604800&series=%5Bmozilla-inbound,a08c494bbfb265619c075fb53dd1854800ceac37,1%5D&series=%5Bmozilla-inbound,d109471419b3f69da4dae10a05699f07343c48b1,1%5D&series=%5Bmozilla-inbound,72346dd0ddf4c4d6a3094b3f41ef97b1e17acfdb,1%5D&series=%5Bmozilla-inbound,4ed63842d23b513721c5ab65e1fda64289a800f1,1%5D&series=%5Bmozilla-inbound,336c083daa3cff0d82515bc1645f9576d03e1fb2,1%5D&series=%5Bmozilla-inbound,d1d19256902dfd166cfbac390b5623c9ff4bae5b,1%5D&series=%5Bmozilla-inbound,606758e66355f38cb3ab87ea18c9adef779546e6,1%5D&series=%5Bmozilla-inbound,4f6caa88ea415fa44ee65ff83c4c1b1884419ece,1%5D&series=%5Bmozilla-inbound,31af9529d5771ce861e18e88b2ec721621fd5a8c,1%5D&series=%5Bmozilla-inbound,0b4d56abd4c3579748e3846f4eb27179ac44e73b,1%5D&series=%5Bmozilla-inbound,cd8f7d8edc66ccea3f351060c129f18d329d091c,1%5D&series=%5Bmozilla-inbound,213311b66166247a829098ed46af16cb2ac3f5be,1%5D&series=%5Bmozilla-inbound,7aa02785947d9a106ba8a3cbca85e8e49ce7d201,1%5D&series=%5Bmozilla-inbound,ea6f4e8ad202e747291e5336c92e5de63f63bdc7,1%5D>
(In reply to Joel Maher (:jmaher) from comment #30)
> Comment on attachment 8745241 [details] [diff] [review]
> WIP bug-1260824-selections-tooltips-drag-querystring.patch

> 
> can we just delete this line that is commented out?

done.

ekyle ran into a non-standard use of exponent &&. changed to use Math.pow not **

I added a patch to translate old phonedash urls to new versions, and remove the remove unnecessary comment and another. Deployed to phonedash-dev.

try this:

http://phonedash-dev.allizom.org/#/org.mozilla.fennec/throbberstart/local-blank/norejected/2016-03-21/2016-03-24/notcached/noerrorbars/standarderror/notry/median

http://phonedash-dev.allizom.org/all.html#/org.mozilla.fennec/throbberstart/local-blank/norejected/2016-03-21/2016-03-24/notcached/noerrorbars/standarderror/notry/median

Then select the phones you want and click Apply.

If this passes muster, I would like to do one more patch before landing which would make this the default page and get rid of the old style page altogether. Is that ok with you?
Attachment #8745880 - Flags: review?(jmaher)
Comment on attachment 8745880 [details] [diff] [review]
bug-1260824-translate-urls.patch

Review of attachment 8745880 [details] [diff] [review]:
-----------------------------------------------------------------

::: html/scripts/allapp.js
@@ +801,5 @@
> +   * where querystring is the url encoded values of the form elements.
> +   */
> +
> +  var parts = document.location.hash.split('/');
> +  if (parts && parts.length == 12) {

are we guaranteed to have all old style urls having parts.length=12
Attachment #8745880 - Flags: review?(jmaher) → review+
No. Some of the older urls will have different sets of parameters. Rather than try to cover all of the bases in the url translation, I will review the outstanding bugs and post new urls to the relevant graphs.
Oh, what the hell. There aren't that many url versions in phonedash. I'll work up the complete set this afternoon after I get back and we'll have all of the versions covered.

As part of the replacement of index.html with all.html I would also like to add a bit of help text which will be displayed instead of the singularly unhelpful "No data" message we currently display.
(In reply to Joel Maher (:jmaher) from comment #33)
> 
> I clicked the second link and tried adding phones, then clicking apply.  I
> would like a sanity check if there is no data displayed and no phones
> selected to then have a message similar to "there are no phones selected,
> that might be why there is no data"

I must have left out the metric in the link:

http://phonedash-dev.allizom.org/all.html#/2016-03-21/2016-03-24/binning=repo-phonetype-phoneid-test_name-cached_label-metric&rejected=norejected&errorbars=noerrorbars&errorbartype=standarderror&valuetype=median&local-blank=on&throbberstart=on&first=on&mozilla-inbound=on&nexus-4=on&nexus-4-1=on&nexus-4-2=on&nexus-4-5=on&nexus-4-6=on&nexus-5=on&nexus-5-1=on&nexus-5-2=on&nexus-5-3=on&nexus-5-4=on&nexus-6=on&nexus-6-1=on&nexus-6-2=on&nexus-9=on&nexus-9-1=on&nexus-9-2=on&nexus-s=on&nexus-s-3=on&nexus-s-8=on&nexus-s-9=on

Note I am loading a copy of my local database so we can see the nexus-6p and samsung-gs3 testing for this range so phonedash-dev will be unresponsive until it finishing importing.

I think updating the No data message and including a Help link will cover this. It has been worrying me as well.
sounds like a plan!
I realized I can use the same trick to help people if their first landing page doesn't have data. If the first page load has no data and they then change the dates, force the new options to be checked when the new data is loaded. /me drops phone. ;-)
Comment on attachment 8746360 [details] [diff] [review]
bug-1260824-translate-urls-v2.patch

Review of attachment 8746360 [details] [diff] [review]:
-----------------------------------------------------------------

nice solution!
Attachment #8746360 - Flags: review?(jmaher) → review+
One final patch ( i hope ) to handle no data situations:

* reinitializes META when initializing.
* optimizes some property accesses getDataPoints.
* added some console.time and console.timeEnd calls for use in performance 
  profiling.
* removed the filtering of build times with the current selection in 
  getDataPoints.
* clear the dynamically generated checkboxes and legend when no data is 
  displayed.
* when the document.location.hash is changed, the Router will fire setControls.
  Since we modify the document.location.hash in getDataPoints after we have
  already scheduled a plot, added the ability to suppress the redundant plot
  request due to the hash modification.
Attachment #8746955 - Flags: review?(jmaher)
this removes the old version of phonedash's app and replaces it with the new 'all' version. I haven't done the help page yet, but want to get this landed on phonedash production so I can start using it to triage results. I'll do a follow up patch at a more convenient time to add the help.

Both patches have been deployed to phonedash-dev.
Attachment #8746956 - Flags: review?(jmaher)
Comment on attachment 8746955 [details] [diff] [review]
bug-1260824-force-checkboxes-v1.patch

Obsoleting this since I broke bookmarking.
Attachment #8746955 - Attachment is obsolete: true
Attachment #8746955 - Flags: review?(jmaher)
Attachment #8746956 - Attachment is obsolete: true
Attachment #8746956 - Flags: review?(jmaher)
Comment on attachment 8746955 [details] [diff] [review]
bug-1260824-force-checkboxes-v1.patch

Review of attachment 8746955 [details] [diff] [review]:
-----------------------------------------------------------------

what is here is nice

::: html/scripts/allapp.js
@@ +610,5 @@
>  
>    setPlotHeight();
>  
> +  if (console) {
> +    console.time('Plotting...');

can we make this "Phonedash: Plotting...", likewise other console messages
Attachment #8746955 - Flags: review+
Comment on attachment 8746956 [details] [diff] [review]
bug-1260824-replace-original.patch

Review of attachment 8746956 [details] [diff] [review]:
-----------------------------------------------------------------

just some nits, but this is a good patch, rs=me

::: html/scripts/app.js
@@ +255,5 @@
> +  }
> +  INITIALIZING = false;
> +
> +  if (console) {
> +    console.time('getDataPoints: all_series...');

as the other patch, lets add a phonedash flag here and all other console messages.

@@ +488,3 @@
>      }
>      if (a.label > b.label) {
> +      return +1;

why +1 and -1?  is +1 == 1 ?
Attachment #8746956 - Flags: review+
(In reply to Joel Maher (:jmaher) from comment #45)
> Comment on attachment 8746955 [details] [diff] [review]
> bug-1260824-force-checkboxes-v1.patch
> 
> Review of attachment 8746955 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> what is here is nice
> 
> ::: html/scripts/allapp.js
> @@ +610,5 @@
> >  
> >    setPlotHeight();
> >  
> > +  if (console) {
> > +    console.time('Plotting...');
> 
> can we make this "Phonedash: Plotting...", likewise other console messages

Try this out for yourself in Developer Tools -> Performance in the Waterfall. The text is used as the label for timed used in a section of code. It is usually pretty short and truncated. If we go with a Phonedash: prefix, I think the Waterfall will show mostly the Phonedash: part of the label. I'll look again to make sure it isn't already shortened too much in the Waterfall, but do please check it out since providing the labels and times for the Performance Waterfall is the sole purpose for the output.

I considered removing the console.time, console.timeEnd since I only wanted them when investigating the performance. I decided to leave them in for others to use if they were interested in helping improve the graph performance.

(In reply to Joel Maher (:jmaher) from comment #46)
> Comment on attachment 8746956 [details] [diff] [review]
> bug-1260824-replace-original.patch
> 
> Review of attachment 8746956 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> just some nits, but this is a good patch, rs=me
> 
> ::: html/scripts/app.js
> @@ +255,5 @@
> > +  }
> > +  INITIALIZING = false;
> > +
> > +  if (console) {
> > +    console.time('getDataPoints: all_series...');
> 
> as the other patch, lets add a phonedash flag here and all other console
> messages.
> 

See above.

> @@ +488,3 @@
> >      }
> >      if (a.label > b.label) {
> > +      return +1;
> 
> why +1 and -1?  is +1 == 1 ?

It's redundant but it's always been that way.

https://github.com/markrcote/phonedash/blob/master/html/scripts/app.js#L120
ok, I get the concern with too much text being hard to figure out- my main goal for the prefix was to understand where the message was coming from.
carrying forward r+ with minor tweaks.
Attachment #8747593 - Flags: review+
carrying forward r+
Attachment #8747595 - Flags: review+
I've updated Mark's repo and gotten the mozilla repo up to date. I'll start treating the mozilla version as the official repo from now on though phonedash.mozilla.org and phonedash-dev.allizom.org still pull from mark's repo by default.

merge
https://github.com/mozilla/phonedash/commit/34943e3ef8277f80b1f3680b73c39db64c28949f

https://github.com/mozilla/phonedash/commit/52e2473fb7491a45f2498200d75b8013b9f1393e
https://github.com/mozilla/phonedash/commit/674da92d42dc3e58ba808ff3521c4e57d50bcc1d
https://github.com/mozilla/phonedash/commit/ec54f3e55582c4347976dd57dcf8cbc92b651edf
https://github.com/mozilla/phonedash/commit/120be1b22b95d1122328dfadf6642cc47ccfecba
https://github.com/mozilla/phonedash/commit/33d657e3e02e2b8e1aed7b0e8a64f330fa087e50

deployed to phonedash.mozilla.org and phonedash-dev.allizom.org and autophone-{1,2,3}.
From what I can tell, we have the following properties that can be set for the summary and subtests being submitted to perfherder.

boolean shouldAlert    # if false don't generate alerts for the test
        default True

int     alertThreshold # percentage change
        default 2

int     minBackWindow  # minimum backward looking window in # of builds
        default 12 

int     maxBackWindow  # maximum backward looking window in # of builds
        default 24

int     foreWindow     # forward looking window in # of builds.
        default 12

We should load these possible values from the test's config file. e.g. https://github.com/mozilla/autophone/blob/master/configs/s1s2-blank-local.ini . The [treeherder] section is probably the most appropriate.

It may be the case that a single global value is inappropriate for all device classes. For example, we might want to set different values depending on the device class. It isn't clear to me the best approach to support per device class settings however.

wlach, do have any concerns or suggestions about customizing these properties? Will the 'signatures' change if these properties are changed in the future?
Flags: needinfo?(wlachance)
actually, i think a separate [perfherder] section would be more appropriate.
(In reply to Bob Clary [:bc:] from comment #53)
> From what I can tell, we have the following properties that can be set for
> the summary and subtests being submitted to perfherder.
> ...
> wlach, do have any concerns or suggestions about customizing these
> properties? Will the 'signatures' change if these properties are changed in
> the future?

The default settings work well in most cases, I wouldn't change them unless it's shown to be needed. To answer your question though, no, the signatures won't change if these properties are altered.
Flags: needinfo?(wlachance)
Right. No need to fiddle the dials unless necessary. I'm just trying to make the adaptations possible in the future without having to write more code.

And we'll probably need to have separate values for each metric: throbberstart, throbberstop and throbbertime.
I am in the process of completing a test run on mozilla-beta with my local devices reporting to treeherder staging for the range 2016-02-22T00:00:00 to 2016-03-16T00:00:00. There is what appears to me to be a significant improvement for throbberstart for the 2016-03-07 08:02:31 merge.

This isn't quite completed yet. The devices on are 2016-03-14 12:04:02 builds atm.

You can see the phonedash-dev graph at:

http://phonedash-dev.allizom.org/#/2016-02-20/2016-03-16/binning=repo-phonetype-phoneid-test_name-cached_label-metric&rejected=norejected&errorbars=noerrorbars&errorbartype=standarderror&valuetype=median&local-nytimes=on&throbberstart=on&first=on&mozilla-beta=on&nexus-6p-6=on&samsung-gs3-3=on

and the perfherder graphs for local-nytimes throbberstart first visit at 

https://treeherder.allizom.org/perf.html#/graphs?timerange=7776000&series=%5Bmozilla-beta,47c9411524c6b1d75b3c9079f34c34e8112ea89d,1%5D&series=%5Bmozilla-beta,b2a8741e0c6dc8d909dccf861f91b3aec3ca8440,1%5D&series=%5Bmozilla-beta,58b18f2720ec016758f6b3af9b85e3a40ca99bc3,1%5D&series=%5Bmozilla-beta,fc2622b7d1af66a1564ee76ed42f9e276e95669a,1%5D

I would have thought that perfherder would generate an improvement alert for the merge on 2016-03-07 but so far I don't see anything at 

https://treeherder.allizom.org/perf.html#/alerts?status=-1&framework=3

Do I just not have enough data? Is it too noisy? Am I doing something else wrong?
Flags: needinfo?(wlachance)
(In reply to Bob Clary [:bc:] from comment #57)
> Do I just not have enough data? Is it too noisy? Am I doing something else
> wrong?

To keep computation costs reasonable, we don't generate alerts for data more than 2 weeks old:

https://github.com/mozilla/treeherder/blob/706014481e7b9e0e55335a04f0b9ff75924b082b/treeherder/perf/alerts.py#L18
Flags: needinfo?(wlachance)
crap. Ok, so I guess we'll have to test live. thanks!
We will have to adjust our windows for beta and and release. For '2016-04-18' and '2016-05-02' I have 20 mozilla-beta builds and 15 mozilla-release builds which isn't enough with the default back and fore windows.
one trick is to retrigger (do >1 job) for each build- that will get the 12+ data points much faster.
(In reply to Bob Clary [:bc:] from comment #59)
> ****. Ok, so I guess we'll have to test live. thanks!

If you'd like I could run the alert generation stuff manually on your series to see what the results would have been...
(In reply to William Lachance (:wlach) from comment #62)
> (In reply to Bob Clary [:bc:] from comment #59)
> > crap. Ok, so I guess we'll have to test live. thanks!
> 
> If you'd like I could run the alert generation stuff manually on your series
> to see what the results would have been...

That would be totally AWESOME! Thanks!
Attachment #8748104 - Flags: review?(jmaher)
Comment on attachment 8748104 [details] [diff] [review]
bug-1260824-perfherder-s1s2-v2.patch

Review of attachment 8748104 [details] [diff] [review]:
-----------------------------------------------------------------

this looks really straightforward, nice patch!
Attachment #8748104 - Flags: review?(jmaher) → review+
(In reply to Bob Clary [:bc:] from comment #63)
> (In reply to William Lachance (:wlach) from comment #62)
> > (In reply to Bob Clary [:bc:] from comment #59)
> > > ****. Ok, so I guess we'll have to test live. thanks!
> > 
> > If you'd like I could run the alert generation stuff manually on your series
> > to see what the results would have been...
> 
> That would be totally AWESOME! Thanks!

Ok results here:

http://nbviewer.jupyter.org/url/people.mozilla.org/%7Ewlachance/autophone%20perfalerts.ipynb

The output of the alert generation algorithm is a "t value" indicating our confidence that there was a regression (that's the first value in each tuple in the output). To reduce the number of false positives we have to deal with, we set a pretty high threshold for this (7) by default.

It looks like using the default settings only the series `local-nytimes throbberstart second opt` (fc2622b7d1af66a1564ee76ed42f9e276e95669a) showed a regression. In other cases we weren't quite confident enough to be sure.

You can play with the software by installing the following in a virtualenv:

* treeherder-client
* ipython[notebook]
* pip install -e treeherder/treeherder/perfalert # local install of performance alert generator
Attached file alerts.py (obsolete) —
simple script to generate data for autophone.
Attachment #8748729 - Flags: feedback?(wlachance)
alerts.py mozilla-inbound 60 days
alert.py mozilla-beta 60 days
Two things come to mind:

1. It is not clear to me that the binned measurements provide any value.

2. Currently we are only reporting the first major digit of the Android version. So all 4.x.x is reported as 4.0, etc. It should report at least the first decimal, so 4.0.4->4.0, 4.2.2 -> 4.2, 4.4.2 > 4.4, etc.

Thoughts?
Attachment #8748729 - Attachment mime type: text/x-python → text/plain
Attached file alerts.py
fixed a hard coded 7 in the test on whether to print.
Attachment #8748729 - Attachment is obsolete: true
Attachment #8748729 - Flags: feedback?(wlachance)
Attachment #8748761 - Flags: feedback?(wlachance)
Comment on attachment 8748761 [details]
alerts.py

This looks good! At some point we should add a wrapper to perfherder client to access this API:

https://treeherder.mozilla.org/api/performance/framework/

It would be added to this file:

https://github.com/mozilla/treeherder/blob/master/treeherder/client/thclient/perfherder.py

If you'd be willing to file a bug and attach a PR, I'd be eternally grateful. :) Otherwise I'm sure I'll get to it eventually.
Attachment #8748761 - Flags: feedback?(wlachance) → feedback+
sure. will do.
final patch with binned measurements removed. carrying forward r+.
Attachment #8749761 - Flags: review+
Attached file phonedash-alerts.py
python alerts.py --host=treeherder.allizom.org --repo=mozilla-release --interval=one_year > /tmp/treeherder.allizom-mozilla-release.txt
python phonedash-alerts.py --phonedash-host=phonedash-dev.allizom.org --start=2016-01-01 --end=2016-05-06 --product=org.mozilla.firefox --repo=mozilla-release > /tmp/phonedash-dev-mozilla-release.txt
https://github.com/mozilla/autophone/commit/f870c0da3c2fe4b227f735a68c962061ffe99b9a
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1270946
deployed 2016-05-06 12:55
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: