evaluate existing platform_microbenchmarks to determine what is stable and what alerts we should generate

RESOLVED FIXED in Firefox 56

Status

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jmaher, Assigned: igoldan)

Tracking

Trunk
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [PI:July])

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
in bug 1323238 we disabled some of the platform_microbenchmarks.  The others need more attention.

There are:
4x Strings*
1x Stylo*
3x TestStandardURL*

Each of these run on 5 platforms, so there are 40 signatures to evaluate.

foreach of the 40 signatures, I would like to look at the 60 day view to determine:
1) is this trend alerting often, but staying in the same range?
2) is this trend showing a consistent value
3) is this trend multi modal

Once we have data on each signature, I would like to make recommendations to the original authors (so the 20 Strings* data points we can talk with :milan!) and see if they have more information.  Ideally we will have a better set of data and thresholds that we can use to generate alerts so there is not a lot of noise.
(Reporter)

Comment 1

2 years ago
Ionut, is this something you can do?
Flags: needinfo?(ionut.goldan)
Whiteboard: [PI:June]
(Reporter)

Comment 2

2 years ago
:milan mentions for the String* ones that:
* 20% alert threshold is good (assuming that catches things outside of bi-modal)
* email him with specifics- maybe 'file bug' can email a few people instead
:jmaher Yes, I'm looking over this.
Flags: needinfo?(ionut.goldan)
Assignee: nobody → ionut.goldan
(Reporter)

Comment 4

2 years ago
thanks :igoldan for looking into this:
https://docs.google.com/a/mozilla.com/spreadsheets/d/1frn--TWt9k_Be-qeujJXlHMjbdEVTNlXeEm0n5DvFlU/edit?usp=sharing

I think in summary we see almost all linux32 and most linux64 data points as noisy.  I would recommend disabling linux32 and linux64 as platforms and keeping osx/windows.

the Strings* tests are the most volatile, specifically Strings PerfStripCharsWhitespace and Strings PerfStripCharsCRLF.  The might be smoothing out, so it seems reasonable to keep them.

Cleaning this up will ensure that we can not waste sheriffing time looking into stuff and not randomizing test owners and developers.  Ideally an alert == a bug to fix.

I would like some general agreement on:
* disabling linux32/linux64 for all platforms

:milan and :bholley do you have concerns with the above recommendation?
Flags: needinfo?(milan)
Flags: needinfo?(bobbyholley)
(In reply to Joel Maher ( :jmaher) from comment #4)
> I would like some general agreement on:
> * disabling linux32/linux64 for all platforms
> 
> :milan and :bholley do you have concerns with the above recommendation?

Not especially - I mostly care about perf on windows and osx.

That said, it does look from the data that the CSS parsing tests aren't particularly noisy on linux. But I'm not sure if there's some win to be gained by turning off _all_ the tests for a given platform, even if they're not all noisy on that platform.
Flags: needinfo?(bobbyholley)
I'm good with the suggestion.
Flags: needinfo?(milan)
I would recommend using an environment variable inside the unit tests to determine whether something should be turned on or not:

http://searchfox.org/mozilla-central/source/testing/gtest/mozilla/MozGTestBench.cpp

Something like:

bool submitPerfherder = bool(getenv("PERFHERDER_SUBMISSION_ENABLED"));
if (submitPerfherder) {
  ...
}

I assume we could set the PERFHERDER_SUBMISSION_ENABLED flag somewhere inside mozharness or the taskcluster config (I'm not sure how to do this offhand but we can probably find someone easily enough who does).

If we want to collect the data but not alert, that is also possible, just make an environment variable that sets the shouldAlert property to true and then vary the perfherder output accordingly. E.g.

bool shouldAlert = bool(getenv("PERFHERDER_ALERTING_ENABLED"));
printf("PERFHERDER_DATA: {\"framework\": {\"name\": \"%s\"}, "
                           "\"suites\": [{\"name\": \"%s\", \"subtests\": "
                               "[{\"name\": \"%s\", \"value\": %i, \"lowerIsBetter\": true, \"shouldAlert\": %s}]"
                           "}]}\n",
                           MOZ_GTEST_BENCH_FRAMEWORK, aSuite, aName, msDuration, shouldAlert ? "true" : "false);
Whiteboard: [PI:June] → [PI:July]
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 10

2 years ago
mozreview-review
Comment on attachment 8889428 [details]
Bug 1369807 - disable platform microbenchmarks on Linux

https://reviewboard.mozilla.org/r/160480/#review165722

This looks ok to me, at least the shouldAlert part (I think we should skip the submitPerfherder stuff for now). I don't know if we should get a peer to look over the gtests code as well, probably not a big enough change to worry about too much.

::: testing/gtest/mozilla/MozGTestBench.cpp:19
(Diff revision 2)
>  namespace mozilla {
>  void GTestBench(const char* aSuite, const char* aName,
>                  const std::function<void()>& aTest)
>  {
> +bool submitPerfherder = bool(getenv("PERFHERDER_SUBMISSION_ENABLED"));
> +bool shouldAlert = bool(getenv("PERFHERDER_ALERTING_ENABLED"));

These lines should be indented in the style of the rest of the file (2 space).

::: testing/gtest/mozilla/MozGTestBench.cpp:26
(Diff revision 2)
>  #ifdef DEBUG
>    // Run the test to make sure that it doesn't fail but don't log
>    // any measurements since it's not an optimized build.
>    aTest();
>  #else
> +  if (submitPerfherder) {

We should still run the test even if submitPerfherder is false, just make the actual print (below) conditional on that variable being present.

Though, now that I think about it, maybe we should only bother with "shouldAlert" in this code, since I think we still want to always collect the data (at least for now).
Attachment #8889428 - Flags: review?(wlachance) → review-
Comment hidden (mozreview-request)
(Reporter)

Comment 12

2 years ago
mozreview-review
Comment on attachment 8889428 [details]
Bug 1369807 - disable platform microbenchmarks on Linux

https://reviewboard.mozilla.org/r/160480/#review165838

this is nice and simple- have you tested this on try server?
Attachment #8889428 - Flags: review?(jmaher) → review+
Looks like the DEBUG builds are failing, due to a warning on unused variable.
I always define the shouldAlert variable, but it's not used in #DEBUG mode.

Guess I'll move it under the #else macro from here: https://hg.mozilla.org/try/rev/23d771dd2953decc350ed67b54d0610662c13eca#l1.18
Flags: needinfo?(jmaher)
Comment hidden (mozreview-request)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #16)
> I made a new push.
> I also tested it on Try. I see some X7 test is failing:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d991302cddc86c388fb4b222bce26a54ea9b73a9&selectedJob=1
> 17474177

That couldn't have been affected by your changes, you can safely ignore it. However it looks like "shouldAlert" is still true if you look at the the Linux tests (examine log and look for "PERFHERDER_DATA") so I think your patch still needs a bit of work:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d991302cddc86c388fb4b222bce26a54ea9b73a9&filter-searchStr=gtest&selectedJob=117474035
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #14)
> Looks like the DEBUG builds are failing, due to a warning on unused variable.
> I always define the shouldAlert variable, but it's not used in #DEBUG mode.
> 
> Guess I'll move it under the #else macro from here:
> https://hg.mozilla.org/try/rev/23d771dd2953decc350ed67b54d0610662c13eca#l1.18

This is fine.
Flags: needinfo?(jmaher)

Comment 19

2 years ago
mozreview-review
Comment on attachment 8889428 [details]
Bug 1369807 - disable platform microbenchmarks on Linux

https://reviewboard.mozilla.org/r/160480/#review166244

It seems like this patch isn't quite working right yet, so holding off on r+. Code looks fine now though.
Attachment #8889428 - Flags: review?(wlachance) → review-

Comment 20

2 years ago
mozreview-review
Comment on attachment 8889428 [details]
Bug 1369807 - disable platform microbenchmarks on Linux

https://reviewboard.mozilla.org/r/160480/#review166266

::: testing/gtest/rungtests.py:120
(Diff revision 4)
> +        env["PERFHERDER_ALERTING_ENABLED"] = "1"
>          pathvar = ""
>          if mozinfo.os == "linux":
>              pathvar = "LD_LIBRARY_PATH"
> +            # disable alerts for unstable tests (Bug 1369807)
> +            env["PERFHERDER_ALERTING_ENABLED"] = "0"

I just realized what the problem is. getenv in C returns a pointer, not a value. So a "0" value will resolve to true here (since it is a pointer value to a "0" string).

I think just deleting the environment variable here should do the trick (since getenv will then return NULL, which resolves to false):

```py
del env['PERFHERDER_ALERTING_ENABLED']
```
Related to this, I filed bug 1384245 with an idea which should improve stability.
Depends on: 1384245
See Also: → bug 1384245
Comment hidden (mozreview-request)
Thanks, again. I used your approach and now it works. 'shouldAlert' on non-Debug builds is now false on linux. On Windows it's true. OSX tests are still running:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2a31a4a4cff5c8f6d361ccfb854cff5d35b7bcc

Comment 24

2 years ago
mozreview-review
Comment on attachment 8889428 [details]
Bug 1369807 - disable platform microbenchmarks on Linux

https://reviewboard.mozilla.org/r/160480/#review166774

lgtm now
Attachment #8889428 - Flags: review?(wlachance) → review+

Comment 25

2 years ago
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a323aee88b0a
disable platform microbenchmarks on Linux r=jmaher,wlach

Comment 26

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a323aee88b0a
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.