Add MotionMark benchmark results for nightly/inbound/autoland/try

Status: NEW
Product: Testing; Component: Talos
Reporter: sphilp
Assignee: jmaher
Depends on: 2 bugs; Blocks: 1 bug
Whiteboard: [PI:January]
Opened: 2 months ago; Last updated: 3 days ago

(Reporter)

Description

2 months ago
As part of the WebRender project, the WebRender team would like results from the MotionMark benchmark so they can check progress and catch regressions as WebRender moves along in development.

We'd like this on all major platforms if possible. The prefs needed to enable WebRender are:

gfx.webrender.enabled
gfx.webrender.blob-images
image.mem.shared
and, on Linux only, layers.acceleration.force-enabled
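As a reference, here is a minimal sketch of those prefs as a Talos-style Python preferences dict. The dict name and the true values are assumptions (the bug lists the pref names but not their values), and the Linux-only pref is gated on the platform:

```python
import platform

# Prefs listed in this bug, expressed as a Talos-style preferences
# dict. Assumption: each pref is set to true to enable the feature.
preferences = {
    "gfx.webrender.enabled": True,
    "gfx.webrender.blob-images": True,
    "image.mem.shared": True,
}

# Linux only: force-enable layer acceleration.
if platform.system() == "Linux":
    preferences["layers.acceleration.force-enabled"] = True
```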
(Assignee)

Comment 1

2 months ago
We will need a description here:
https://wiki.mozilla.org/Buildbot/Talos/Tests

That description needs:
* a development owner to contact for questions <- this is not something that can be determined from the MotionMark source code
* what we are measuring
* a summary of the calculation and an example of the numbers


Also, does this duplicate any work we have already done or that is in progress? For example, there are some new tests for OMTP coming online in bug 1419306. We also have tcanvasmark and glterrain; I am not sure if any of those tests could be retired :)

Does this test require -qr builds or any other special builds?

Also, are there any issues with adding MotionMark as third_party code to the mozilla-central tree?
Whiteboard: [PI:December]
Comment 2

2 months ago
We will probably want to do the equivalent of http://browserbench.org/MotionMark/developer.html with certain preferences modified from the default. We get more consistent results that way. Glenn can help with some instructions as to what the most optimal setup may be.
Flags: needinfo?(gwatson)

Comment 3

2 months ago
The settings we'd ideally want for an initial implementation would be (based on the options available at the http://browserbench.org/MotionMark/developer.html page mentioned above):

* Run all tests in the Animometer and HTML Suite groups.
* Test length: 15 seconds.
* Complexity: "Keep at a fixed complexity".
* Other settings as default.

We'll need to work out good complexity values for each of those tests. This will depend on the hardware the tests will be run on.
Flags: needinfo?(gwatson)
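Captured as data, the requested settings from comment 3 might look like the following. This is illustrative harness-side bookkeeping, not a format MotionMark itself reads:

```python
# Comment 3's requested configuration, as a plain config dict.
# Not a format MotionMark consumes; just a harness-side record.
MOTIONMARK_CONFIG = {
    "suites": ["Animometer", "HTML suite"],
    "test_length_seconds": 15,
    "complexity_mode": "fixed",  # "Keep at a fixed complexity"
    # Per-test complexity values still need tuning per hardware
    # (see the later comments); left empty until then.
    "per_test_complexity": {},
}
```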
(Assignee)

Comment 4

2 months ago
Right now our hardware is NVIDIA graphics:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

This will change in Q1 for Linux/Windows to new hardware with an Intel chipset. I have not seen us adjust parameters of Talos tests based on hardware, mainly because we compare against the previous revision. Luckily, anyone can edit this, as it will be living in-tree.
(Assignee)

Updated

a month ago
Blocks: 1425845

Updated

18 days ago
Whiteboard: [PI:December] → [PI:January]
(Assignee)

Updated

17 days ago
Assignee: nobody → jmaher
(Assignee)

Updated

16 days ago
Depends on: 1428435
(Assignee)

Comment 5

9 days ago
I see that when changing complexity from "ramp" (the default) to "fixed" (as suggested in this bug), I get no results (a lot of 0.0 or NaN).

:gw, how important is that setting?
Flags: needinfo?(gwatson)

Comment 6

7 days ago
When I use 'ramp' on my local machine, I get very unstable results; the reported value is often different by an order of magnitude between runs, which is of course not ideal for benchmarking. Ramp modifies the complexity of the benchmark dynamically, depending on how it thinks the browser is performing.

I have an idea why you might be seeing invalid results in fixed mode, although it's just a guess. When you use fixed mode in a web browser, you get a series of text input boxes, one for each test. Each box contains the complexity to run that test at, and the number is stored between runs. Perhaps in the CI context in which we're running it, those complexity values are not initialized, and that may be why you're seeing invalid results.

If that's the case, I suspect it's probably possible to specify a complexity value to use for the test via query parameters in the URL.
Flags: needinfo?(gwatson)
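If that guess is right, a harness could pin the values through the URL. A sketch follows, where the parameter names (`controller`, `complexity`, `test-interval`) are assumptions rather than documented MotionMark options:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/MotionMark/developer.html"

def developer_url(complexity, test_interval=15):
    """Build a developer.html URL that (hypothetically) pins complexity
    and test length via query parameters. The parameter names here are
    guesses, not documented MotionMark options."""
    params = {
        "controller": "fixed",       # "Keep at a fixed complexity"
        "complexity": complexity,
        "test-interval": test_interval,
    }
    return BASE_URL + "?" + urlencode(params)

print(developer_url(500))
# http://localhost:8000/MotionMark/developer.html?controller=fixed&complexity=500&test-interval=15
```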
(Assignee)

Comment 7

5 days ago
I am stuck trying to get it to run outside of CI; this is just loading the file locally and running it in Nightly. Using ramp it all works; using fixed it fails. :gw, can you own driving this to make sure that we can run it? I run it via:
file:///C:/Users/elvis/mozilla-inbound/third_party/webkit/PerformanceTests/MotionMark/developer.html

Without proper specs for how to run this I cannot move forward, so this is not a priority for me until I can run it locally first. As a note, I get the same results with both my attempt at CI and the method above.
Flags: needinfo?(gwatson)

Comment 8

5 days ago
What complexity value are you setting in the fixed test mode?

For example, on the page you linked to:
 * Click on "Keep at a fixed complexity"
 * Click on the "Animometer" option to open that test suite.
 * Click on the tickbox for "Multiply" to enable running that test.
 * What is the currently set complexity value for that test (there will be a number entry field next to each of those tests).
 * Do you get valid results if you set that complexity value to ~500 (anywhere from 100 - 5000 might be reasonable for that test, depending on test hardware).
Flags: needinfo?(gwatson)
(Assignee)

Comment 9

5 days ago
The instructions I had were:
* Run all tests in the Animometer and HTML Suite groups.
* Test length: 15 seconds.
* Complexity: "Keep at a fixed complexity".
* Other settings as default.


I clicked the checkbox to run all the tests in Animometer and HTML, then set complexity to 'fixed' (there is one option). If there are other options, please specify them (for example, Multiply defaults to '44'). Going back and forth for each subtest seems like a lot of randomization; can you give the specific requirements needed to run the benchmark to get value for your use case, and I can do that?
Comment 10

5 days ago
Sorry, I should have been a bit clearer above that we need to tweak the complexity values for each test.

I can't give you exact values right now, because the "correct" value to use depends on the hardware being used as the benchmark runner. We should only need to do this once (well, each time we change the underlying hardware that the benchmark will be running on).

As a rough guide, we want to choose complexity numbers for each test such that the test runs at approximately 30 FPS. The reason for this is that we can't typically measure above 60 FPS (due to the way vertical sync works). If we tune each test on the benchmark hardware to run at ~30 FPS, we should be able to clearly see any major regressions or improvements in each benchmark.

Does that help? Feel free to ping me on IRC (gw) or we can set up a video call to discuss further, if that's easier.
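Assuming frame rate falls roughly in inverse proportion to complexity (a first-order approximation, not something the benchmark guarantees), one measured run gives a starting estimate:

```python
def suggest_complexity(current_complexity, measured_fps, target_fps=30.0):
    """First-order estimate: scale complexity by measured/target FPS.
    Assumes FPS is roughly inversely proportional to complexity; treat
    the result as a starting point, then re-measure."""
    return round(current_complexity * measured_fps / target_fps)

# A run at complexity 500 that measures 45 FPS suggests trying ~750.
print(suggest_complexity(500, 45))  # -> 750
```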
(Assignee)

Comment 11

5 days ago
Got it. We are switching hardware for our CI machines, ideally next week for Linux and a few weeks later for Windows. We do have different hardware for OS X, though; I am not sure how to differentiate this.

Is there a preferred method for determining this, maybe a debug mode? I really don't know what the benchmark does or how to determine the values; when I run it locally there is usually a blank white screen.
Comment 12

Is the (new) hardware for each of the platforms reasonably comparable in terms of GPU and CPU? If so, we may be able to find a complexity number that is good enough to share between platforms.

The way to determine it is a bit manual. What I do is:
 * Select one of the tests (e.g. the Multiply test).
 * Set to fixed complexity, and a random number for the complexity value for that test (e.g. 500).
 * Run the test.
 * At the end of the test, the report screen (MotionMark Score) will include a table that lists the average FPS.

For example:

Test Name | Time Complexity | FPS
Multiply  | 100.00 ± 0.00%  | 20.25 ± 18.19%

In this test case, I've set the complexity to 100, and the result was an average FPS of 20.25.

So, I'd then re-run the test, with a lower complexity, in order to find a complexity that gives me ~30 FPS as a result.

It's concerning that you're seeing a white screen, though. I wonder if something is going wrong on the machine you're running it on? For example, in the Multiply test you should see a black screen with a number of rotating alpha-blended border corners. Is that what you see?
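That manual loop could be automated once the harness exposes a way to run a single test at a fixed complexity. A bisection-style sketch, where `run_test(complexity) -> fps` is a hypothetical hook; with the table above, the earlier proportional estimate would seed the search near 100 * 20.25 / 30 ≈ 68:

```python
def find_complexity(run_test, lo=1, hi=5000, target_fps=30.0, tol=2.0):
    """Binary-search for a complexity value that lands near target_fps.
    `run_test(complexity)` is a hypothetical hook that runs one
    MotionMark test at fixed complexity and returns the average FPS.
    Assumes higher complexity gives monotonically lower FPS."""
    while lo < hi:
        mid = (lo + hi) // 2
        fps = run_test(mid)
        if abs(fps - target_fps) <= tol:
            return mid
        if fps > target_fps:
            lo = mid + 1   # still running too fast: raise complexity
        else:
            hi = mid - 1   # too slow: lower complexity
    return lo
```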
(Assignee)

Comment 13

4 days ago
Our new machines are quite different hardware from the existing ones; possibly it is best to wait a bit until they are deployed.

As for the blank screen, this is at the end of a test. There is an error in the console when I don't use 'ramp', and I haven't been able to figure it out. Is there a different way to run this?
(Assignee)

Comment 14

4 days ago
Odd. I have run the Multiply test dozens of times locally, and it ends in a white screen with no console errors :( Possibly this benchmark isn't ready for prime time? I have tried to read a bit more on it to see if there is any setup I need to do to make it work; unfortunately I didn't come up with anything.

I do see the rotating items, but they disappear into a white screen after a short while, and that is all I see.
(Assignee)

Updated

4 days ago
Depends on: 1429597
Comment 15

4 days ago
Waiting until the new machines are available sounds like a plan.

That's really strange; I've been using this test suite on and off for a year or so, and have never seen that problem.

Do you see the same issue running the Apple hosted version at http://browserbench.org/MotionMark/developer.html ?

Could it possibly be an addon related problem or anything like that?
(Assignee)

Comment 16

4 days ago
OK, I get results in the same browser session but from http://browserbench.org. I wonder if this is file:// access vs. http:// access; let me try.
(Assignee)

Comment 17

4 days ago
OK, http:// works where file:// does not.
Comment 18

OK, great. I know nothing about the build/test/benchmark process. Is it easy enough to serve those files locally over HTTP during the benchmarking process?
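At least for local runs, Python's standard http.server can serve the checkout over HTTP and avoid the file:// failure mode; the directory path below is illustrative:

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve the MotionMark checkout over HTTP instead of file://.
# The directory path is illustrative; point it at your checkout.
handler = functools.partial(
    SimpleHTTPRequestHandler,
    directory="third_party/webkit/PerformanceTests",
)
HTTPServer(("localhost", 8000), handler).serve_forever()
# Then load http://localhost:8000/MotionMark/developer.html
```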
(Assignee)

Updated

3 days ago
Depends on: 1431408
(Assignee)

Comment 19

3 days ago
Yeah, we always run via http, so this should work. I will profile on the specific machines and get settings in place; probably we'll land this sooner and then trust the numbers once we get the new hardware. This will be for:
linux64
osx10.10
windows10x64
32-bit Firefox on windows10x64

If there are any of the above platforms we shouldn't be running on, now would be a good time to speak up :)
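Once the numbers are tuned on the new hardware, they could live in-tree as a simple per-platform mapping. The keys mirror the list above; the values are deliberately left as placeholders, since the real numbers depend on the new machines:

```python
# Placeholder per-platform complexity values for fixed-complexity runs.
# None = not yet tuned; fill in after profiling on the new hardware.
PLATFORM_COMPLEXITY = {
    "linux64": None,
    "osx10.10": None,
    "windows10x64": None,
    "windows10x64-32bit-firefox": None,  # 32-bit Firefox on win10 x64
}
```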
Comment 20

3 days ago
I don't know much about OS X versions, but the above sounds good to me. Thanks!