web-tooling subtests should be 'score'

VERIFIED FIXED in Firefox 65


7 months ago
6 months ago


(Reporter: armenzg, Assigned: armenzg)


Version 3

Firefox Tracking Flags

(firefox65 fixed)



(3 attachments)

In bug 1502036 we discovered that web-tooling subtests are marked as *ms* "lower is better" (by *not* specifying "lower_is_better: false") [1]

However, we need it to be a *score* "higher is better".

This is similar to bug 1486789.
This blocks bug 1502036.
The metadata in Perfherder will be updated in bug 1502073.

Adapted from bug 1502036:
> Looking at the graphs for webtooling [1] we see a peculiar oddity. 
> In the case of web-tooling, the aggregate results are being reported as a __Score__,
> where higher-is-better, and each subtest is being reported as __execution time in ms__,
> where lower-is-better.
> However, if you look at the score and subtests, comparing Chrome and SpiderMonkey,
> they disagree about what is better: Chrome shows a lower (worse) score than
> SpiderMonkey in the aggregate score, but for every subtest shows lower (better)
> execution time, which should imply a higher score.

From the logs we can see that the subscores are also scores [2] as mgaudet points out.
From PERFHERDER_DATA only the main score has __"lowerIsBetter": false__ while the subtests don't have that property (in the absence of it we default to lower is better).

We need to change the __subtests__ of web-tooling to report as higher is better.

I think the line that needs editing is this one:
> self.suite['subtests'].append({'name': test_name, 'value': mean})
I think we just need to change it to:
> self.suite['subtests'].append({
>     'lower_is_better': false,
>     'name': test_name,
>     'value': mean,
> })

I will give it a try. I don't remember the last time I pushed to Try :)

[1]: https://arewefastyet.com/linux64/web-tooling?numDays=90
[task 2018-11-09T18:23:34.868Z] Running Web Tooling Benchmark 0.3.2...
[task 2018-11-09T18:23:34.869Z] --------------------------------------
[task 2018-11-09T18:23:41.227Z]          acorn:  6.93 runs/sec
[task 2018-11-09T18:23:48.455Z]          babel:  6.71 runs/sec
[task 2018-11-09T18:23:55.041Z]        babylon:  6.23 runs/sec
[task 2018-11-09T18:24:03.716Z]          buble:  4.88 runs/sec
[task 2018-11-09T18:24:07.838Z]           chai: 10.13 runs/sec
[task 2018-11-09T18:24:18.674Z]   coffeescript:  3.64 runs/sec
[task 2018-11-09T18:24:31.902Z]         espree:  3.05 runs/sec
[task 2018-11-09T18:24:38.583Z]        esprima:  5.66 runs/sec
[task 2018-11-09T18:24:51.414Z]         jshint:  3.17 runs/sec
[task 2018-11-09T18:24:58.603Z]          lebab:  5.87 runs/sec
[task 2018-11-09T18:25:06.131Z]        prepack:  5.51 runs/sec
[task 2018-11-09T18:25:13.417Z]       prettier:  5.95 runs/sec
[task 2018-11-09T18:25:17.881Z]     source-map: 15.04 runs/sec
[task 2018-11-09T18:25:23.878Z]     typescript:  7.39 runs/sec
[task 2018-11-09T18:25:28.665Z]      uglify-es: 11.92 runs/sec
[task 2018-11-09T18:25:38.502Z]      uglify-js:  4.16 runs/sec
	"framework": {
		"name": "js-bench"
	"suites": [{
		"name": "web-tooling-benchmark-sm",
		"lowerIsBetter": false,
		"value": 6.03,
		"shouldAlert": false,
		"units": "score",
		"subtests": [{
			"name": "web-tooling-benchmark-source-map",
			"value": 15.04
		}, {
No longer blocks: 1502036
Component: Raptor → General
Posted patch patchSplinter Review
Could someone please try this patch on Try?

You just need to test this for the webtool jobs (sm and v8).
Thanks mgaudet!

The subtests now seem to have 'lower_is_better': false defined [1]
I think this is sufficient.

> "subtests": [{
>   "lower_is_better": false,
>   "name": "web-tooling-benchmark-source-map",
>   "value": 13.75
> }, {
Comment on attachment 9024078 [details] [diff] [review]

Review of attachment 9024078 [details] [diff] [review]:

Hi Ionut,
If this looks good to you, would you mind landing it for me? Thanks!

My Hg account has been disabled for lack of usage and I won't reactivate it since I won't need it in the future.
Attachment #9024078 - Flags: review?(igoldan)
Attachment #9024078 - Flags: review?(igoldan) → review+
(In reply to Armen [:armenzg] from comment #5)
> Comment on attachment 9024078 [details] [diff] [review]
> patch
> Review of attachment 9024078 [details] [diff] [review]:
> -----------------------------------------------------------------
> Hi Ionut,
> If this looks good to you, would you mind landing it for me? Thanks!
> My Hg account has been disabled for lack of usage and I won't reactivate it
> since I won't need it in the future.

Could you first push this to Phabricator, using Arcanist? I'm not familiar with landing patches on mozilla-inbound directly.
Flags: needinfo?(armenzg)
Assignee: nobody → armenzg
Flags: needinfo?(armenzg)
Let me know if there's anything else I need to do:

I don't know how to land from Phabricator; Could you please land it for me?

Comment 9

6 months ago
Pushed by igoldan@mozilla.com:
web-tooling subtests should be 'score' r=igoldan

Comment 10

6 months ago
Last Resolved: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
We named it wrong (lowerIsBetter vs lower_is_better).
The subtests are still reporting incorrectly as lower is better [1]

[1] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1760198,1,11&selected=mozilla-central,1760198,403880,643400991,11
[2] https://taskcluster-artifacts.net/F1dfOqM-THe72nyfI5Bt0g/0/public/logs/live_backing.log
> {
> 	"framework": {
> 		"name": "js-bench"
> 	},
> 	"suites": [{
> 		"name": "web-tooling-benchmark-sm",
> 		"lowerIsBetter": false,
> 		"value": 6.0,
> 		"shouldAlert": false,
> 		"units": "score",
> 		"subtests": [{
> 			"lower_is_better": false,
> 			"name": "web-tooling-benchmark-source-map",
> 			"value": 13.58
> 		}, {
Resolution: FIXED → ---
How do we schedule jobs on Try from Phabricator?
AFAIK that's not currently possible. I will submit it for you :)
Thank you!

igoldan: would you be able to review this and land it for me? thanks!

>		"lowerIsBetter": false,
>		"value": 6.11,
>		"shouldAlert": false,
>		"units": "score",
>		"subtests": [{
>			"lowerIsBetter": false,

Comment 17

6 months ago
Pushed by jmaher@mozilla.com:
web-tooling subtests to use lowerIsBetter instead of lower_is_better r=jmaher

Comment 18

6 months ago
Last Resolved: 6 months ago6 months ago
Resolution: --- → FIXED
Flags: needinfo?(mgaudet)
Ok, this looks good.
Flags: needinfo?(mgaudet)
You need to log in before you can comment on or make changes to this bug.