Closed Bug 1021842 Opened 10 years ago Closed 9 years ago

Talos: improve graphserver's formula for "final test result"

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: avih, Assigned: jmaher)

References

Details

(Keywords: ateam-talos-task)

Attachments

(7 files, 5 obsolete files)

2.77 KB, patch
Details | Diff | Splinter Review
3.86 KB, patch
rhelmer
: review+
MattN
: feedback+
Details | Diff | Splinter Review
1.38 KB, patch
avih
: review+
Details | Diff | Splinter Review
4.79 KB, patch
MattN
: review+
rhelmer
: review+
Details | Diff | Splinter Review
10.83 KB, patch
catlee
: review+
Details | Diff | Splinter Review
1.40 KB, patch
rhelmer
: review+
avih
: feedback+
Details | Diff | Splinter Review
9.03 KB, patch
catlee
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #1020677 +++

Bug 1020677 discusses two main changes to improve our regression detection for tests where each sub-test is independently meaningful (e.g. TART yes, TP5 no):

1. Improved formula for processing the sub-results into a single number.
2. Regression detection which also takes individual sub-results into account.

This bug is about #1 above.

Summary of the current formula:

> - A talos run of a test consists of N executions of each sub-test (mostly N==25).
> - Take median for each N results per sub-test.
> - Ignore the highest of those medians.
> - Average the rest <-- the "final" number by which we calculate regressions
>                        and which graphserver displays at its graphs.

The new formula will:
- Use geometric mean instead of average,
- Without dropping the highest sub-result.
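
For illustration, a minimal sketch of both calculations in Python (assuming the N replicates per sub-test have already been collected; names here are hypothetical, not graphserver code):

import math

def old_result(subtest_runs):
    # subtest_runs: one list of N replicates per sub-test
    medians = [sorted(runs)[len(runs) // 2] for runs in subtest_runs]
    medians.remove(max(medians))        # ignore the highest median
    return sum(medians) / len(medians)  # average the rest

def new_result(subtest_runs):
    medians = [sorted(runs)[len(runs) // 2] for runs in subtest_runs]
    # geometric mean of all medians, computed in log space:
    # exp(mean(log(x))) == (x1 * x2 * ... * xk) ** (1.0 / k)
    return math.exp(sum(math.log(m) for m in medians) / len(medians))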

The formula is used explicitly in one place (when the data is collected, i.e. when talos sends it to graphserver), where graphserver then stores the result of this calculation; it's also used implicitly whenever the graphs are displayed later (using this stored pre-calculated value).

So, unfortunately, we can't just modify the formula and expect all the graphs to reflect it retroactively, because the calculation result is cached, and it's probably impractical to literally rewrite the cached history using the new formula (though TBH, it might not be an absolutely terrible idea if we believe the new formula is better).

So, we'll approach this transition from the old to the new formula as follows:
- The old formula will keep producing results for the transition period.
- The new formula will produce results under a new temporary test name for this period.

This will allow us to catch regressions around the switch (using any of the formulas).

Once the transition period ends and we're happy with the new formula, we'll terminate the old formula and report values only with the new formula - under the _old_ names (so effectively also retiring the new temporary names). That way, whoever browses the graphs doesn't have to switch to a different test name, and instead will only see a "jump" in values at the point where the formula switched.

We'll also add a dev.tree-management and wiki notice about tests which started using the new formula, and at which date.

Anything still missing?
Joel, so re bug 1020677 comments 41-44, it seems nothing blocks us from going ahead with this. Correct?

How do you suggest we continue? I'd prefer if you take this bug, but if you can't, let me know and I'll try to handle it myself. Right now it seems we've decided to continue but... nothing is happening :)
Flags: needinfo?(jmaher)
uploading my WIP patch, it requires a few more changes and a lot of local testing.  I won't get to this for at least a week, most likely 2 weeks.
Flags: needinfo?(jmaher)
Comment on attachment 8448973 [details] [diff] [review]
initial work done (0.5)

Review of attachment 8448973 [details] [diff] [review]:
-----------------------------------------------------------------

I know it's a WIP patch, but wanted to check it out regardless.

Which repo is it in (so I can compare the existing formula [not visible in the patch] to the new one)?

::: server/pyfomatic/collect.py
@@ +228,3 @@
>    _updateAverageForTestRun(average, databaseCursor, inputStream, metadata)
> +
> +  # I am concerned that this might fail and we never 

Accidentally cut line?

::: sql/data.sql
@@ +4570,4 @@
>  insert into tests values (NULL,"sessionrestore_no_auto_restore","Session Restore no Auto Restore Test",1,1,NULL);
>  
>  insert into tests values (NULL,"ts_paint_cold","TS Paint, Cold Load Time",1,1,NULL);
> +insert into tests values (NULL,"tp5o_scroll_paint","TP5 Scroll",1,1,15);

I'm guessing this is unrelated to this bug?
Pardon the (way too many) typos on comment 3...
here is the repo:
http://hg.mozilla.org/graphs

you can run it locally without too much trouble.  yes, the patch needs a lot of cleanup :)
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8499099 - Flags: review?(rhelmer)
Attachment #8499099 - Flags: review?(avihpit)
Comment on attachment 8499099 [details] [diff] [review]
add geomean field to db and populate it (1.0)

Review of attachment 8499099 [details] [diff] [review]:
-----------------------------------------------------------------

Due to my lack of familiarity with graph server internals, this is an f+ meaning it looks roughly right to me (I especially like the calculation part), but it shouldn't be considered a proper review.
Attachment #8499099 - Flags: review?(avihpit) → feedback+
this hasn't been tested, a few more code paths for me to explore.
Attachment #8499113 - Flags: feedback?(MattN+bmo)
for the core geomean calculation in the first patch, we need to run this on the database:
alter table alerts add geomean FLOAT;

when we get r+, I will request this be done on the graph server DBs.  :MattN, this is something you will need to run manually on your local database.
Comment on attachment 8499099 [details] [diff] [review]
add geomean field to db and populate it (1.0)

Review of attachment 8499099 [details] [diff] [review]:
-----------------------------------------------------------------

Unless I am missing something,

::: server/pyfomatic/collect.py
@@ +229,5 @@
> +
> +  try:
> +    databaseCursor.execute("""select exp(sum(log(value))/count(value)) from test_run_values where test_run_id = %s""", (metadata.test_run_id))
> +    geomean = databaseCursor.fetchall()[0][0]
> +    if geomean is None:average

Is "average" supposed to be there ^

@@ +234,5 @@
> +      geomean = values[1]
> +    _updateGeoMeanForTestRun(geomean, databaseCursor, inputStream, metadata)
> +  except Exception, x:
> +    # we don't want to rollback or raise an exception, this should be passively running in the background
> +    pass

Silently eating exceptions makes me sad, although the comment is reasonable. Maybe we could log something?
Attachment #8499099 - Flags: review?(rhelmer) → review-
oops- there was a cut/paste nugget in there :(

I have cleaned it up and added print statements.  This is running in the context of a webserver, so I assume these print statements show up in the webserver log somewhere?
Attachment #8499099 - Attachment is obsolete: true
Attachment #8499136 - Flags: review?(rhelmer)
Comment on attachment 8499136 [details] [diff] [review]
add geomean field to db and populate it (1.1)

Review of attachment 8499136 [details] [diff] [review]:
-----------------------------------------------------------------

Hmm, so print might not work - there isn't really any logging going on in here already, is there :/

This is running under WSGI now, so it may not work if something is printed to stdout; I will test.
Comment on attachment 8499136 [details] [diff] [review]
add geomean field to db and populate it (1.1)

lgtm; tested print on the dev server and it goes to the Apache error log, which seems fine.
Attachment #8499136 - Flags: review?(rhelmer) → review+
Attachment #8499113 - Attachment is patch: true
Attachment #8499113 - Attachment mime type: text/html → text/plain
(In reply to Joel Maher (:jmaher) from comment #9)
> alter table alerts add geomean FLOAT;

Note that the correct table is test_runs:

alter table test_runs add geomean FLOAT;
Comment on attachment 8499136 [details] [diff] [review]
add geomean field to db and populate it (1.1)

While testing this patch I got the following error upon upload of a TART result:

  Exception: to determine geomean from 'test_run_values' for  1369 - 'long' 
  object is not iterable
Comment on attachment 8499136 [details] [diff] [review]
add geomean field to db and populate it (1.1)

Review of attachment 8499136 [details] [diff] [review]:
-----------------------------------------------------------------

::: server/pyfomatic/collect.py
@@ +228,4 @@
>    _updateAverageForTestRun(average, databaseCursor, inputStream, metadata)
> +
> +  try:
> +    databaseCursor.execute("""select exp(sum(log(value))/count(value)) from test_run_values where test_run_id = %s""", (metadata.test_run_id))

(In reply to Matthew N. [:MattN] from comment #15)
> 'long' object is not iterable

This is from a missing comma after "metadata.test_run_id". The argument should be a tuple like:
(metadata.test_run_id,)

With that the patch works on my server :)
Attachment #8499136 - Flags: feedback+
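
For reference, a short sketch of why the trailing comma matters with the Python DB-API (parameters must be a sequence; a parenthesized value is not one):

# (metadata.test_run_id) is just a parenthesized long - not iterable,
# hence the "'long' object is not iterable" error above.
# (metadata.test_run_id,) is a one-element tuple, which execute() accepts:
databaseCursor.execute(
    "select exp(sum(log(value))/count(value)) "
    "from test_run_values where test_run_id = %s",
    (metadata.test_run_id,))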
Comment on attachment 8499113 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (0.9)

Review of attachment 8499113 [details] [diff] [review]:
-----------------------------------------------------------------

I have this being used on my GS instance when "?geo" is the query string: http://graphs.mattn.ca/graph-local.html?geo

I didn't test all of the affected endpoints and I didn't check the math by hand.

::: server/api.py
@@ +322,5 @@
>              for changeset, total in averages.iteritems())
> +
> +        geomeans = dict(
> +            (changeset, total / geo_totals[changeset])
> +            for changeset, total in geomeans.iteritems())

OK, so this is averaging the geometric means for a specific changeset, but it's confusing because geo_totals actually contains a count of the number of geometric means for a changeset, not a total as the name says. num_geos would be easier to understand IMO. Not sure if you want to change that for none, one, or both of the cases (avg & geo).

@@ +328,5 @@
>                    'averages': averages,
>                    'min': min(row['average'] for row in rows),
>                    'max': max(row['average'] for row in rows),
> +                  'geomeans': geomeans,
> +                  'mingeo': min(row['geomean'] for row in rows),

Not sure how much of a problem this is but "mingeo" ends up being "null" since there are old rows where mingeo wasn't calculated yet. It would be nice for it to exclude null.
Attachment #8499113 - Flags: feedback?(MattN+bmo) → feedback+
Blocks: 1077177
I tested that the following two endpoints show the data:
http://graphs.mattn.ca/api/test/runs/revisions?revision=b6c2deadbeef (used by compare-talos)
http://graphs.mattn.ca/api/test/runs?id=293&branchid=1&platformid=13 (used by graphs)
Depends on: 1078272
landed this backend change:
http://hg.mozilla.org/graphs/rev/77bdf53fae9f

waiting for the database to be updated before we make this live.  Next up is getting the api changes landed.
Do we have a bug to backfill the new column with the new calculation result? such that we could examine 2-3+ months old data with the new calculation?
I was not aware of that; I was under the impression the plan was to make this possible and collect it going forward.  What benefit would we get from going back in time?  If you think it is beneficial, please file a bug and outline what needs to be done.
Bug 1020677 comments 41-44 suggested that we should just switch the formula, in which case back-filling (practically - rewriting history) would probably be a bad idea.

However, the way which was apparently chosen to go forward, implied by the patch at this bug, is that we add a new column for the new formula calculation, in which case this column would be empty for older results, and in that case I think it should be possible to back-fill it.

If we think that the new formula is better, and I think we do, then being able to watch graphs over a few months with the new formula would be better than watching them with the old formula.

So I guess the first question is "would it be possible to back-fill the new column of the new formula?"

If yes (and I _think_ it should be a yes), then I'll file a bug for it.
yeah, upon further thought and your comment, we really should backfill as much as possible.  I suspect it just requires writing a script to query the data for the page values, then calculate/insert the geomean.

That might take some fiddling, but it should be possible.  I suspect a custom script that we could run would be the ideal thing to do.
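
A rough sketch of what such a backfill script could look like (hypothetical and untested; assumes the test_runs/test_run_values schema from this bug and a MySQLdb connection):

import MySQLdb

# hypothetical connection parameters
db = MySQLdb.connect(host="localhost", user="graphs", passwd="...", db="graphs")
cursor = db.cursor()

# runs that predate the new column have geomean = NULL
cursor.execute("select id from test_runs where geomean is NULL")
run_ids = [row[0] for row in cursor.fetchall()]

for run_id in run_ids:
    # the same calculation collect.py now performs at upload time
    cursor.execute("select exp(sum(log(value))/count(value)) "
                   "from test_run_values where test_run_id = %s", (run_id,))
    geomean = cursor.fetchall()[0][0]
    if geomean is not None:
        cursor.execute("update test_runs set geomean = %s where id = %s",
                       (geomean, run_id))
db.commit()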
Comment on attachment 8499136 [details] [diff] [review]
add geomean field to db and populate it (1.1)

Review of attachment 8499136 [details] [diff] [review]:
-----------------------------------------------------------------

::: server/pyfomatic/collect.py
@@ +228,4 @@
>    _updateAverageForTestRun(average, databaseCursor, inputStream, metadata)
> +
> +  try:
> +    databaseCursor.execute("""select exp(sum(log(value))/count(value)) from test_run_values where test_run_id = %s""", (metadata.test_run_id))

So what happens if value is 0? log(value) would be -INFINITY, which is probably undesirable (and in a simpler calculation of Nth root of the product, the product would be 0).

I think some (sub)tests do report 0 in cases of error, and while we do want that to affect the result, I don't think we want it to null-ify it.

In the simplified calculation, I'd say just use the Nth root of the product of the N-1 values (if it only had a single 0 sub-result), which would be the rough equivalent of averaging N values where one of them is 0, though I'm not sure how to approach it with the formula which you used in this patch.
(In reply to Avi Halachmi (:avih) from comment #24)
>
> So what happens if value is 0? log(value) would be -INFINITY, which is
> probably undesirable (and in a simpler calculation of Nth root of the
> product, the product would be 0).


Hmm.. I think there's a simple workaround for this issue, which would also solve the general issue of values lower than 1 (which I don't think we have right now, but it's not impossible to have them).

How about just using ... sum(log(value + 1))/count(value) ... ?

Such a change would still keep the relative results the same, would hardly affect the absolute values (since most of them are considerably higher than 1 anyway), and would solve the 0 issue and the general "smaller than 1" issue.

Joel?
Maybe also subtract 1 from the final result (because it's a mean of each value plus 1) and clamp it at 0 if it's negative?
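
To make the proposal concrete, a sketch of the plain geomean next to the +1/-1 variant (illustrative naming only, not graphserver code):

import math

def geomean(values):
    # raises a math domain error if any value is 0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geomean_plus1(values):
    # shift by 1 before the log, shift back afterwards: tolerates 0's,
    # and barely moves results that are well above 1
    m = math.exp(sum(math.log(v + 1) for v in values) / len(values)) - 1
    return max(m, 0.0)

print(geomean_plus1([120.0, 135.0, 0.0, 128.0]))  # works; plain geomean would raise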
I suggest ignoring the zeros, and using absolute value, just in case:

> select 
>     exp(sum(log(abs(value)))/count(value)) 
> from 
>     test_run_values 
> where 
>     test_run_id = %s""" and
>     abs(value) > 0
Kyle, what would happen if we selected nothing?
Matt, thanks for the comment here; I have updated the patch to make the variable names more useful and to default to a value of 0 if we find a non-float type.

Avi, for geomeans I am really taking an average of the geomeans if we end up with >1 value for a given build/test/platform/branch combo.  Do you think that is correct?
Attachment #8499113 - Attachment is obsolete: true
Attachment #8500664 - Flags: review?(MattN+bmo)
Flags: needinfo?(avihpit)
Attachment #8500664 - Flags: review?(rhelmer)
(In reply to Kyle Lahnakoski [:ekyle] from comment #27)
> I suggest ignoring the zeros, and using absolute value, just in case:
> 
> > select 
> >     exp(sum(log(abs(value)))/count(value)) 
> > from 
> >     test_run_values 
> > where 
> >     test_run_id = %s""" and
> >     abs(value) > 0

This would completely ignore the 0's in the geomean calculation. However, I'd think that 0's at some sub-tests should preferably also be reflected in the summary/average/geomean.

The average already reflects it by including 0 in the average (so the average will be lower); my suggestion of adding 1 to each value and then subtracting 1 from the final result would be the geomean equivalent, I think. But if that's hard to express in SQL syntax, then your suggestion is still better than the patch prior to comment 29. Thanks.

(In reply to Joel Maher (:jmaher) from comment #29)
> ...
> Avi, for geomeans I am really taking an average of the geomeans if we end up
> with >1 value for a given build/test/platform/branch combo.  Do you think
> that is correct?

Not sure I follow you. geomean _is_ the replacement for the average at the new calculation, what does "average of geomeans" mean?

Also, I really liked the previous calculation, how's the new one better? (other than also dealing with 0's - which I think we could also have done with the previous calculation instead of replacing it altogether).
Flags: needinfo?(avihpit)
(In reply to Joel Maher (:jmaher) from comment #28)
> Kyle, what would happen if we selected nothing?

The result would be NULL (well, not even calculated).  If there are no values, or all values are zero, then geomean has no clear definition.  Alternatively, I would define geomean([])=1, just for the sake of logical consistency.
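
A sketch of that ignore-zeros approach with the geomean([])=1 convention (again, illustrative naming only):

import math

def geomean_ignore_zeros(values):
    vals = [abs(v) for v in values if v != 0]
    if not vals:
        return 1.0  # geomean([]) defined as 1 for logical consistency
    return math.exp(sum(math.log(v) for v in vals) / len(vals))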
(In reply to Avi Halachmi (:avih) from comment #30)
> (In reply to Kyle Lahnakoski [:ekyle] from comment #27)
> > I suggest ignoring the zeros, and using absolute value, just in case:
> > 
> > > select 
> > >     exp(sum(log(abs(value)))/count(value)) 
> > > from 
> > >     test_run_values 
> > > where 
> > >     test_run_id = %s""" and
> > >     abs(value) > 0
> 
> This would completely ignore the 0's at the geomean calculation. However,
> I'd think that preferably having 0's at some sub tests would also be
> reflected at the summary/average/geomean.
> ...


I think there is actually a case where we'd like to completely ignore the value - when there's no value at all (i.e. not even 0). We could get this case if we cancel a page/sub-test, so that its value isn't sent to graphserver anymore, but it does exist for test runs prior to removing this page. In such a case we should completely ignore it in any calculation (be it average, geomean, or anything else) where it isn't available.

Joel, what happens now if we cancel a page, e.g. as when we removed alipay.com from tp5o in bug 1026869? Is it taken into account with the current average calculation for tp5o?

Kyle, what about comment 30 WRT the inclusion of 0's in the geomean to indicate a difference at the summary as well?
Flags: needinfo?(klahnakoski)
Flags: needinfo?(jmaher)
(In reply to Avi Halachmi (:avih) from comment #30)
> (In reply to Kyle Lahnakoski [:ekyle] from comment #27)
> > > select 
> > >     exp(sum(log(abs(value)))/count(value)) 
> > > from 
> > >     test_run_values 
> > > where 
> > >     test_run_id = %s""" and
> > >     abs(value) > 0
> 
> This would completely ignore the 0's at the geomean calculation. However,
> I'd think that preferably having 0's at some sub tests would also be
> reflected at the summary/average/geomean.

Avi, 

I am concerned that your log(v+1) proposal will swamp legitimate performance values between 0 and 1.  I do not know if that even happens, so maybe my concern is unwarranted.

Ignoring zeros can be good, especially if zero is being used to indicate failure.  Furthermore, ignoring them will impact the overall geomean[1], which means the introduction of zeros will be detectable.  Finally, the v8 test suite uses geomean: using log(v+1) we lose the ability to compare.


[1] except in the case when the ignored metric is very close to 1
Flags: needinfo?(klahnakoski)
Comment on attachment 8500664 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (1.0)

Review of attachment 8500664 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me, I am not set up to test at the moment. I wish we had tests on this codebase :/
Attachment #8500664 - Flags: review?(rhelmer) → review+
(In reply to Kyle Lahnakoski [:ekyle] from comment #33)
> I am concerned your log(v+1) proposal will swamp legitimate performance
> values between 0 and 1.  I do not know if that even happens, so maybe my
> concern is unwarranted.

What does "swamp" mean/imply here? could you please provide sample set of numbers which would exhibit this issue with the (v+1) calculation? Also note than apart from adding 1 to each value, my suggestion also includes subtracting one from the final result (comment 26).

> Ignoring zeros can be good, especially if zero is being used to indicate
> failure.  Furthermore, the ignorance will impact the overall geomean[1];
> which means the introduction of zeros will be detectable.

Indeed ignoring 0's will impact the overall, but IMO less so and less deterministically so than including an unlikely low value into the calculation (such as 0 into an average - which we do now already).

> Finally, the v8
> test suite uses geomean: using log(v+1) we loose the ability to compare.

Not sure I understand this. Would you be able to provide a sample set of results which would get hurt from using the v+1 ... -1 calculation?
Comment on attachment 8500664 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (1.0)

Flag me for review again once we've settled on the formula and I can test.
Attachment #8500664 - Flags: review?(MattN+bmo)
Kyle,

I want to get this resolved, can you help clarify your concerns with avi's approach of:
exp(sum(log(abs(value + 1)))/count(value)) - 1
Flags: needinfo?(jmaher) → needinfo?(klahnakoski)
Avi,

(In reply to Avi Halachmi (:avih) from comment #35)
> What does "swamp" mean/imply here?

My apologies upfront about being pedantic here.  A geometric argument would be easier to digest.

Let
    F0(v)=exp(sum(log(vi))/n)
    F1(v)=exp(sum(log(vi+1))/n)-1
where
    v = [v1, v2, v3, ..., vn]

Consider a performance change of E in one of the statistics, such that v1' = (1+E)*v1
and v' = [v1', v2, v3, ..., vn]


F1(v') = exp(sum(log(vi'+1))/n)
F1(v') = exp([log(v1'+1) + log(v2+1) + ... + log(vn+1)]/n)
F1(v') = exp([log((1+E)*v1 + 1) + log(v2+1) + ... + log(vn+1)]/n)
F1(v') = exp([log((1+E)*v1 + 1) - log(v1+1) + log(v1+1) + log(v2+1) + ... + log(vn+1)]/n)
F1(v') = exp([log((1+E)*v1 + 1) - log(v1+1) + sum(log(vi+1))]/n)
F1(v') = exp(log((1+E)*v1 + 1)/n - log(v1+1)/n + sum(log(vi+1))/n)
F1(v') = exp(log((1+E)*v1 + 1)/n - log(v1+1)/n) * exp(sum(log(vi+1))/n)
F1(v') = [exp(log((1+E)*v1 + 1) - log(v1+1))]^(1/n) * F1(v)
F1(v') = [((1+E)*v1 + 1)/(v1+1)]^(1/n) * F1(v)
F1(v') = ((v1+1 + v1*E)/(v1+1))^(1/n) * F1(v)

We will use Taylor's theorem near E=0 to approximate the slope wrt E:

dF1(v)/dE = (1/n)*(((v1+1 + v1*E)/(v1+1))^((1/n)-1))*v1/(v1+1) * F1(v)
dF1(v)/dE (at E=0) = (1/n)*(1^((1/n)-1))*v1/(v1+1) * F1(v)
dF1(v)/dE (at E=0) = (1/n) * v1/(v1+1) * F1(v)

F1(v') = F1(v) + (1/n) * v1/(v1+1) * F1(v) * E  (at E near 0)

We look at F1(v') - F1(v) to emphasize our slope:

F1(v') - F1(v) = (1/n) * v1/(v1+1) * F1(v) * E  (at E near 0)
F1(v') - F1(v) = v1/(v1+1) * (F1(v) * E)/n  (at E near 0)

Using similar logic, compare this to behaviour of F0 near E=0:

F0(v') - F0(v) = (F0(v) * E)/n  (at E near 0)

The F1 function has an extra v1/(v1+1) term; for large values this is near 1, but for small values this number is small, and makes error detection more difficult.

F1(v') - F1(v) = v1/(v1+1) * (F0(v') - F0(v))  (at E near 0)


Suppose we have a statistic around 0.5, and it changes by 10%:  How much does F1 change? 

F1(v') - F1(v) = 0.5/(0.5+1) * (F0(v') - F0(v))  (at E near 0)
F1(v') - F1(v) = 1/3 * (F0(v') - F0(v))  (at E near 0)

Which is 1/3 less sensitive than F0.  And smaller values are worse, of course.

This is what I meant by "swamp".  All this is moot if performance values are > 1.
Flags: needinfo?(klahnakoski)
thanks Kyle.  what formula do you use in dzalerts?
Avi, 

How does F1 compare to F0 in the face of zeros? We start with F1, and compare the new value to the old:

F1(v')/F1(v) = prod(vi'+1)^(1/n) / prod(vi+1)^(1/n)
F1(v')/F1(v) = (prod(vi'+1)/prod(vi+1))^(1/n)
F1(v')/F1(v) = [(0+1)*(v2+1)*(v3+1)*...*(vn+1) / ((v1+1)*(v2+1)*(v3+1)*...*(vn+1))]^(1/n)
F1(v')/F1(v) = [(0+1)/(v1+1)]^(1/n)
F1(v')/F1(v) = [1/(v1+1)]^(1/n)

Same for F0, which ignores the zero (v1' = 0):

F0(v')/F0(v) = [v2 * v3 * ... * vn]^(1/(n-1)) / [v1 * v2 * v3 * ... * vn]^(1/n)
F0(v')/F0(v) = [v2 * v3 * ... * vn]^(n/n(n-1)) / [v1 * v2 * v3 * ... * vn]^((n-1)/n(n-1))
F0(v')/F0(v) = [[v2 * v3 * ... * vn]^n / [v1 * v2 * v3 * ... * vn]^(n-1)]^(1/(n(n-1)))
F0(v')/F0(v) = [[v2 * v3 * ... * vn] / v1^(n-1)]^(1/(n(n-1)))
F0(v')/F0(v) = [v2 * v3 * ... * vn]^(1/(n(n-1))) / v1^(1/n)
F0(v')/F0(v) = (1/v1)^(1/n) * F0(v')^(1/n)


Let K1 = F1(v')/F1(v) 
Let K0 = F0(v')/F0(v)

When is K1 > K0 ?

<=> [1/(v1+1)]^(1/n) > (1/v1)^(1/n) * F0(v')^(1/n)
<=> [1/(v1+1)]^(1/n) / (1/v1)^(1/n) > F0(v')^(1/n) 
<=> [(1/(v1+1)) / (1/v1)]^(1/n) > F0(v')^(1/n) 
<=> [v1/(v1+1)]^(1/n) > F0(v')^(1/n)
<=> [v1/(v1+1)] > F0(v')

If performance values less than 1 do not exist, then F0(v') > 1 and K1 < K0, and we can conclude F0 is a better zero indicator than F1.  If performance values less than 1 do exist, then K1 could perform better given the condition above.  The difference between F0 and F1 is probably inconsequential in practice due to high vi>>1 values and n>3.

When F0 and F1 do differ, we must also consider that the noise is similarly different.  An analysis of how this affects confidence must be done to be sure F0 still performs better than F1.

In conclusion, I like F0 for its adherence to definition, and its proper treatment of performance values less than one.
Joel,

I use the formal definition of geomean: F0
(In reply to Avi Halachmi (:avih) from comment #35)
> (In reply to Kyle Lahnakoski [:ekyle] from comment #33)
> > Finally, the v8
> > test suite uses geomean: using log(v+1) we lose the ability to compare.
> 
> Not sure I understand this. Would you be able to provide a sample set of
> results which would get hurt from using the v+1 ... -1 calculation?

All I am saying here is that the V8 performance statistic uses the formal definition of geomean to combine the test results.  If we use some other aggregate metric, like F1, then we will not be able to compare the official V8 statistic to the one we see in graph server.
I thought I had implemented the formal definition to start with.  Here is what I have in talos:
http://hg.mozilla.org/build/talos/file/b2a3d0493173/talos/filter.py#l61

but we have questions about values <1.

Kyle, can you link me to the formal definition you are using?
Flags: needinfo?(klahnakoski)
(In reply to Kyle Lahnakoski [:ekyle] from comment #38)
> Avi,
> 
> (In reply to Avi Halachmi (:avih) from comment #35)
> > What does "swamp" mean/imply here?
> 
> My apologies upfront about being pedantic here.  A geometric argument would
> be easier to digest.

Pedantic is appreciated, I just didn't understand what swamp implies :) and now I do - it compresses the dynamic range slightly more than the "pure" geometric mean formula, and this compression increases the closer we are to 0.

I also never claimed that the two calculations are equivalent - they're clearly not.

However, consider these two "sets of results", which are undeniably much closer to 0 than to 1 and therefore should "suffer" reduced sensitivity for detecting a regression between them with the +/-1 variant:

var set1 = [.01, .02, .03, .04, .05];
var set2 =      [.02, .03, .04, .05, .06];

Their averages are 0.03 and 0.04 respectively.
"proper" geometric means are 0.026 and 0.037
geometric means with the +1/-1 variant are 0.0299 and 0.0399

regression from set1 to set2:

Using average: 33.33%
Using proper geometric mean: 43.1%
Using +1/-1 geometric mean: 33.4%

The +/-1 version is indeed less sensitive than the proper geomean.

However, even for such small values, the +1/-1 geomean regression calculations will still be more sensitive than the average-based regression calculation which we've been using till now. So compared to now, I don't think it will hurt us.
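
The numbers above can be reproduced directly with a quick standalone check (an illustration, not part of any patch):

import math

def geo(vals):
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

set1 = [.01, .02, .03, .04, .05]
set2 = [.02, .03, .04, .05, .06]

print(geo(set2) / geo(set1) - 1)  # ~0.431 -> 43.1% regression
g1 = geo([v + 1 for v in set1]) - 1
g2 = geo([v + 1 for v in set2]) - 1
print(g2 / g1 - 1)                # ~0.334 -> 33.4% regression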

> This is what I meant by "swamp".  All this is mute if performance values are
> > 1.

I've never seen results smaller than 1. Maybe Joel did?

(In reply to Kyle Lahnakoski [:ekyle] from comment #40)
> ...
> In conclusion, I like F0 for its adherence to definition, and its proper
> treatment of performance values less than one.

I don't disagree with the preference to choose the purest form available. But I'm trying to apply a practical approach where this specific deviation from "purest" might have more pros than cons, or that in practice the cons are negligible, as I _think_ they are.

(In reply to Kyle Lahnakoski [:ekyle] from comment #42)
> All I am saying here is that the V8 performance statistic uses the formal
> definition of geomean to combine the test results.  If we use some other
> aggregate metric, like F1, then we will not be able to compare the official
> V8 statistic to the one we see in graph server.

I still don't get it. We've been using averages till now, and it never bothered us. Surely geomean +/-1 would still be better than averages?

Also, the V8 numbers are more than 15k; the +/-1 variant's difference compared to the "pure" version would be less than negligible.
Joel, 

The code has implemented the formal definition.
Flags: needinfo?(klahnakoski)
Avi, 

We are in agreement.

(In reply to Avi Halachmi (:avih) from comment #44)
> (In reply to Kyle Lahnakoski [:ekyle] from comment #42)
> > All I am saying here is that the V8 performance statistic uses the formal
> > definition of geomean to combine the test results.  If we use some other
> > aggregate metric, like F1, then we will not be able to compare the official
> > V8 statistic to the one we see in graph server.
> 
> I still don't get it. We've been using averages till now, and it never
> bothered us. Surely geomean +/-1 would be still better than averages?

Using averages was wrong, and geomean +/-1 is way better.  Also geomean +/-1 is negligibly different from the geomean for the purposes of regression detection given the >>1 performance values we see.
(In reply to Kyle Lahnakoski [:ekyle] from comment #45)
> Joel, 
> 
> The code has implemented the formal definition.

well, except for its treatment of zeros, where it is undefined :)
Comment on attachment 8503345 [details] [diff] [review]
reduce the chance of values = 0 (1.0)

Review of attachment 8503345 [details] [diff] [review]:
-----------------------------------------------------------------

w00t!
Attachment #8503345 - Flags: review?(avihpit) → review+
slight adjustment to the script, accounting for your previous feedback and double checking the code.

Going from the definitions listed here:
https://wiki.mozilla.org/Perfomatic:API#Get_test_run_information

We have 3 cases where the average is displayed and we should also display the geometric mean:
run info:
desc: https://wiki.mozilla.org/Perfomatic:API#Get_test_run_information
ex: http://graphs.mozilla.org/api/test/runs/info/?id=3794718 (broken)

test runs:
desc: https://wiki.mozilla.org/Perfomatic:API#Get_test_runs_.28data.29
ex: http://graphs.mozilla.org/api/test/runs?id=223&branchid=131&platformid=31&days=7

by revision:
desc: https://wiki.mozilla.org/Perfomatic:API#Get_test_data_by_revision
ex: http://graphs.mozilla.org/api/test/runs/revisions?revision=10d2046d2b64&revision=d16525937c8b

going through the code for these 3 cases, I believe I have properly handled adding geomean to the api.
Attachment #8500664 - Attachment is obsolete: true
Attachment #8503400 - Flags: review?(MattN+bmo)
Attachment #8503400 - Attachment is patch: true
Attachment #8503400 - Attachment mime type: text/html → text/plain
My server is now updated with the landed patch and attachment 8503400 [details] [diff] [review]. Testing now.
The mingeo and/or maxgeo line with the min/max is causing a 500 error on the server. I think it's not returning a number in some cases.
Comment on attachment 8503400 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (1.1)

Review of attachment 8503400 [details] [diff] [review]:
-----------------------------------------------------------------

::: server/api.py
@@ +329,5 @@
>                    'min': min(row['average'] for row in rows),
>                    'max': max(row['average'] for row in rows),
> +                  'geomeans': geomeans,
> +                  'mingeo': min(type(row['geomean']) if type(row['geomean']) == type(1.0) else 0.0 for row in rows),
> +                  'maxgeo': max(type(row['geomean']) if type(row['geomean']) == type(1.0) else 0.0 for row in rows),

I think the first |type(…)| on each of these lines is wrong, but I'm no expert in Python. It ends up returning the type instead of the value.
Attachment #8503400 - Flags: review?(MattN+bmo) → review-
(In reply to Matthew N. [:MattN] (behind on reviews - focused on Loop) from comment #54)
> Comment on attachment 8503400 [details] [diff] [review]
> modify api that graph.html uses to include geomean in addition to everything
> else (1.1)
> 
> Review of attachment 8503400 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: server/api.py
> @@ +329,5 @@
> >                    'min': min(row['average'] for row in rows),
> >                    'max': max(row['average'] for row in rows),
> > +                  'geomeans': geomeans,
> > +                  'mingeo': min(type(row['geomean']) if type(row['geomean']) == type(1.0) else 0.0 for row in rows),
> > +                  'maxgeo': max(type(row['geomean']) if type(row['geomean']) == type(1.0) else 0.0 for row in rows),
> 
> I think the first |type(…)| on each of these lines is wrong but I'm not
> expert in Python. It ends up returning the type instead of the value.

You're right... comparing type(row['geomean']) == type(1.0) is a bit confusing too, that should probably just be e.g.:

'mingeo': min(row['geomean'] if type(row['geomean']) == float else 0.0 for row in rows),
ah:
'mingeo': min(row['geomean'] if isinstance(row['geomean'], float) else 0.0 for row in rows),


can we assume all values are float?  I assume so, but let me double check.
updated patch to check for float|int.  Tested with mocked up python locally.
Attachment #8503400 - Attachment is obsolete: true
Attachment #8504277 - Flags: review?(rhelmer)
Attachment #8504277 - Flags: review?(MattN+bmo)
Comment on attachment 8504277 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (2.0)

Review of attachment 8504277 [details] [diff] [review]:
-----------------------------------------------------------------

::: server/api.py
@@ +329,5 @@
>                    'min': min(row['average'] for row in rows),
>                    'max': max(row['average'] for row in rows),
> +                  'geomeans': geomeans,
> +                  'mingeo': min(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),
> +                  'maxgeo': max(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),

This seems to work. I would have preferred just checking for the None/Null case (however that gets represented) instead of float/int but this is better than nothing.
Attachment #8504277 - Flags: review?(MattN+bmo) → review+
Comment on attachment 8504277 [details] [diff] [review]
modify api that graph.html uses to include geomean in addition to everything else (2.0)

Review of attachment 8504277 [details] [diff] [review]:
-----------------------------------------------------------------

::: server/api.py
@@ +329,5 @@
>                    'min': min(row['average'] for row in rows),
>                    'max': max(row['average'] for row in rows),
> +                  'geomeans': geomeans,
> +                  'mingeo': min(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),
> +                  'maxgeo': max(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),

Hm so this could be coerced to float and have a None check instead:

'mingeo': min(float(row['geomean']) if row['geomean'] is not None else 0.0 for row in rows),

That will raise an exception if something is in there that can't be coerced to float. If this is never expected to happen then raising here might be a good thing.
Attachment #8504277 - Flags: review?(rhelmer) → review+
(In reply to Robert Helmer [:rhelmer] from comment #59)
> Comment on attachment 8504277 [details] [diff] [review]
> modify api that graph.html uses to include geomean in addition to everything
> else (2.0)
> 
> Review of attachment 8504277 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: server/api.py
> @@ +329,5 @@
> >                    'min': min(row['average'] for row in rows),
> >                    'max': max(row['average'] for row in rows),
> > +                  'geomeans': geomeans,
> > +                  'mingeo': min(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),
> > +                  'maxgeo': max(row['geomean'] if isinstance(row['geomean'], float) or isinstance(row['geomean'], int) else 0.0 for row in rows),
> 
> Hm so this could be coerced to float and have a None check instead:
> 
> 'mingeo': min(float(row['geomean']) if row['geomean'] is not None
> else 0.0 for row in rows),
> 
> That will raise an exception if something is in there that can't be coerced
> to float. If this is never expected to happen then raising here might be a
> good thing.

(maybe I should say "cast" instead of "coerce" since it's explicit)

Anyway, that's more of a response to comment 58; you can take it or leave it - I think the code looks OK to go in as-is. Thanks to MattN for testing this.
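
For comparison, a None-filtering variant along the lines discussed above (a sketch against the same rows structure, not the landed code):

rows = [{'geomean': 212.5}, {'geomean': None}, {'geomean': 198.0}]  # sample data

geo_values = [float(row['geomean']) for row in rows
              if row['geomean'] is not None]
# fall back to 0.0 when no row has a geomean yet (old data)
mingeo = min(geo_values) if geo_values else 0.0  # 198.0
maxgeo = max(geo_values) if geo_values else 0.0  # 212.5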
:rhelmer, now that we have the backend in place, can we get this deployed and start collecting data?  We can view the results from a local version of graph.html.  Should we file a bug for deployment?  Do you have a suggested path forward?
Flags: needinfo?(rhelmer)
(In reply to Joel Maher (:jmaher) from comment #62)
> :rhelmer,  now that we have the backend in place, can we get this deployed
> and started on collected data.  We can view the results from a local version
> of graph.html.  Should we file a bug for deployment?  do you have a
> suggested path forward?

I filed bug 1082694 to track this and will work on it now.
Flags: needinfo?(rhelmer)
so here is what I believe is left to do:
* look at the data now that we have some and verify it is what we want
* deploy a UI change to production that will view this data
* adjust the alerting tool to reference the geomean field instead of the average field

am I missing anything?
(In reply to Joel Maher (:jmaher) from comment #64)
> * deploy a UI change to production that will view this data

That's bug 1077177 for graph server btw.
after a brief chat on irc, the next steps are:
* send geometric mean alerts to dev.tree-alerts (the new newsgroup)
* continue to send average alerts to dev.tree-management
* after a week or two to ensure all is well, disable average alerts to dev.tree-management

This involves some assumptions, but all very doable:
* analyze_talos.py has enough cpu bandwidth to run two passes
* we are fine not emailing patch authors twice (only email on 'average' alerts for now)
* someone will look at both sources.

The problem becomes, what do we do if we find alerts that show up in 'average' only and not in 'geomean'?  Will we be fine ignoring those?
I am not sure how to modify the analysis.cfg.template.  I would like to make sure that we have:
# Who should emails be sent to
regression_emails = dev.tree-management@lists.mozilla.org
geomean_regression_emails = dev.tree-alerts@lists.mozilla.org

I honestly have no idea what value is in regression_emails, so figuring out what is there would be a good idea.
Attachment #8526144 - Flags: review?(rhelmer)
Blocks: 1100654
Comment on attachment 8526144 [details] [diff] [review]
update scripts / tooling to support geomean (1.0)

I don't actually know where this tool is run... IIRC catlee has the analysis tool running somewhere.
Flags: needinfo?(catlee)
Attachment #8526144 - Flags: review?(rhelmer) → review+
catlee, can you answer this question:
I am not sure how to modify the analysis.cfg.template.  I would like to make sure that we have:
# Who should emails be sent to
regression_emails = dev.tree-management@lists.mozilla.org
geomean_regression_emails = dev.tree-alerts@lists.mozilla.org

I guess making available the analysis.cfg file (sans private keys, etc.) would be a good thing :)
Also, if catlee knows more about this script and it isn't run on the normal graph server (i.e. this is run someplace that he has access to), then I would like to know what the procedure is for deploying this.
(In reply to Joel Maher (:jmaher) from comment #69)
> I guess making available the analysis.cfg file (sans private keys, etc.)
> would be a good thing :)

Ideally we could just check it into the repo in the appropriate location, but named as analysis.cfg.example etc
That's what analysis.cfg.template is supposed to be.
Flags: needinfo?(catlee)
In any case, the deployment process right now is to file a bug in releng (cc me) specifying which settings need to be added or modified.
fixed reference to data_type as a value instead of a string.
Attachment #8526144 - Attachment is obsolete: true
Attachment #8532064 - Flags: review?(catlee)
Attachment #8532064 - Flags: review?(catlee) → review+
fixed a typo and verified an alert on tree-alerts:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-alerts/a7L7nh1PJlQ

caveat: this will most likely only show unique geomean alerts, due to caching of the warnings to prevent duplication of alerts.  The talos alert sheriff team will look at both average/geomean on all alerts for the bake-in period until we turn off the dev.tree-management average alerts.
On 3 tests (tcanvasmark, v8_7, and kraken) we upload a single value from talos:
http://hg.mozilla.org/build/talos/file/tip/talos/output.py#l253

these are of type 'AVERAGE', not 'VALUES'.  In the code to store a geometric mean, we only handle 'VALUES'.  This patch stores the single value as the geometric mean for the 'AVERAGE' inputs we get.
Attachment #8537197 - Flags: review?(rhelmer)
Attachment #8537197 - Flags: review?(avihpit)
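
A sketch of the branching described above (hypothetical names; the real change lives in collect.py):

import math

def geomean_for_run(data_type, average, values):
    if data_type == 'AVERAGE':
        # tcanvasmark, v8_7 and kraken upload one pre-aggregated number;
        # the geometric mean of a single value is that value itself
        return average
    # 'VALUES': the usual log-space geometric mean over the uploaded values
    return math.exp(sum(math.log(v) for v in values) / len(values))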
Comment on attachment 8537197 [details] [diff] [review]
store the single value for average as geomean (1.0)

Review of attachment 8537197 [details] [diff] [review]:
-----------------------------------------------------------------

Changing to f+ because I'm not familiar with the graphserver internals, but the fix looks valid. Thanks!
Attachment #8537197 - Flags: review?(avihpit) → feedback+
Attachment #8537197 - Flags: review?(rhelmer) → review+
http://hg.mozilla.org/graphs/rev/c6266505147e

:rhelmer, can you deploy this to the live server?
Blocks: 1112387
(In reply to Joel Maher (:jmaher) from comment #79)
> http://hg.mozilla.org/graphs/rev/c6266505147e
> 
> :rhelmer, can you deploy this to the live server?

Done in bug 1112387
(In reply to Joel Maher (:jmaher) from comment #76)
> caveat: this most likely will only show unique geomean alerts due to caching
> of the warnings to prevent duplication of alerts.  The talos alert sheriff
> team will look at both average/geomean on all alerts for the bake in period
> until we turn off the dev.tree-management average alerts.

How are the new alerts looking? Are we good to switch off the ones sent to dev.tree-management now? :-)
Flags: needinfo?(jmaher)
yes, we are ready to do it; I will need to figure out a patch and coordinate with catlee on how to deploy it.  I verified the data, found an error a couple of weeks ago, and now all the data is good.
Flags: needinfo?(jmaher)
That's great, thank you :-)
This patch does a couple things:
1) sets geomean as the default on graph server
2) only runs one analysis, not two, so we will be posting to dev.tree-alerts
3) fixes the upload/reporting to use geomean and passively collect average

Once this is live we could eventually remove the average code.
Attachment #8544026 - Flags: review?(rhelmer)
Attachment #8544026 - Flags: review?(catlee)
Comment on attachment 8544026 [details] [diff] [review]
run analysis for geomean as default (1.0)

Review of attachment 8544026 [details] [diff] [review]:
-----------------------------------------------------------------

the regression analysis bits look fine to me. let me know when you want to deploy.
Attachment #8544026 - Flags: review?(catlee) → review+
Comment on attachment 8544026 [details] [diff] [review]
run analysis for geomean as default (1.0)

via IRC chat with rhelmer: he is fine with this patch and will help coordinate deployment.
Attachment #8544026 - Flags: review?(rhelmer)
checked in:
http://hg.mozilla.org/graphs/rev/f0a48c84d941

Next steps are to:
1) deploy changes to live server (:rhelmer)
2) update analyze_talos machine (:catlee)

I would like to do this in the same general time window.  Maybe Monday or Tuesday morning PST, so :rhelmer can deploy and then midday/afternoon EST :catlee can update the analyze_talos script.

:rhelmer, :catlee - can you guys confirm this update/deployment time sounds good?
Flags: needinfo?(rhelmer)
Flags: needinfo?(catlee)
(In reply to Joel Maher (:jmaher) from comment #87)
> checked in:
> http://hg.mozilla.org/graphs/rev/f0a48c84d941
> 
> Next steps are to:
> 1) deploy changes to live server (:rhelmer)
> 2) update analyze_talos machine (:catlee)
> 
> I would like to do this in the same general time window.  Maybe Monday or
> Tuesday morning PST so :rhelmer can deploy and then in EST midday/afternoon
> :catlee can update the analyze talos script.
> 
> ;rhelmer, :catlee- can you guys confirm this update/deployment time sounds
> good?

WFM!
Flags: needinfo?(rhelmer)
this was just deployed about 15 minutes ago.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(catlee)
Resolution: --- → FIXED