Bug 1493648 Opened 1 year ago Closed 1 year ago

Can we run the Godot Engine wasm benchmark in automation?

Categories

(Testing :: Raptor, enhancement)


Tracking

(firefox65 fixed)

RESOLVED FIXED
mozilla65

People

(Reporter: bbouvier, Assigned: Bebe)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

This page provides a wasm benchmark using the Godot game engine:

http://godot.eska.me/pub/wasm-benchmark/

See the results that appear in the box below the canvas: the wasm team would be interested in getting the following measures on AWFY:

- WebAssembly instantiation
- Engine initialization
- From download finish to first frame
- From download finish to interactive
- CPU time to interactive

That is, everything except the Download time, which is not relevant for us.

Ideally, we'd run this on every machine that's also running the wasm misc benchmark.

Does it look feasible?
:bbouvier, would you want to replace wasm-misc, or run this in addition to wasm-misc?
Flags: needinfo?(bbouvier)
To run it in addition to wasm-misc, if that's possible.
Flags: needinfo?(bbouvier)
:bebe, would you want to try this bug out?  look at bug 1481708 for some prior art
Flags: needinfo?(bebe)
Assignee: nobody → bebe
Flags: needinfo?(bebe)
Test added from code review. https://phabricator.services.mozilla.com/D8957
:bbouvier I used a downloaded version of the benchmark code. Do you have any other suggestions, or maybe the source code?

:all — the tests fail locally after 2-3 iterations. From my debugging it looks like a memory leak. Can you please help identify the cause?
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
Flags: needinfo?(bbouvier)
I am unable to run this locally by applying the patch due to a console error:
JavaScript error: http://localhost:54277/wasm-godot/godot.js line 1 > WebAssembly.instantiate, line 1: CompileError: wasm validation error: at offset 0: failed to match magic number

is it possible we need to do something different for wasm or edit files to run locally?
Flags: needinfo?(jmaher)
I don't believe Phabricator supports adding binary files; I remember an email about that.

diff --git a/third_party/webkit/PerformanceTests/wasm-godot/godot.pck b/third_party/webkit/PerformanceTests/wasm-godot/godot.pck
new file mode 100644
index 0000000000000000000000000000000000000000..0000000000000000000000000000000000000000
GIT binary patch
literal 0
Hc$@<O00001

See https://bugzilla.mozilla.org/show_bug.cgi?id=1486026
Flags: needinfo?(rwood)
Btw, instead of adding the benchmark in tree, we could add the source to our benchmark github repo (https://github.com/mozilla/perf-automation/tree/master/benchmarks) and use a fetch task.
good call :rwood, let's get this into the benchmarks. There are instructions on the perf-automation page for how to upload benchmarks. Keep in mind that, going this route, the test will be restricted to android for now (fetch task limitations right now).

:bbouvier, if this is important to run on windows now, we can do this via the old school splinter patches :)
Attached patch 441443.patch (obsolete) — Splinter Review
Added this as a patch to run in all the envs

Also created https://github.com/mozilla/perf-automation/pull/11
Attachment #9018152 - Flags: review?(rwood)
Attachment #9018152 - Flags: review?(jmaher)
Attachment #9017784 - Attachment is obsolete: true
thanks for the new patch :bebe, I am now running into problems: after the first two page loads, each subsequent load takes exponentially longer. I assume this is wasm related. :bbouvier, is wasm made to be reloaded rapidly? Are there preferences we can set or delays we can add to make this more reliable?
For what it's worth, I don't have access to the source code, but just copying / downloading assets from the web site should be sufficient.

Regarding the page load time issue, is there a way I can run this on my own browser? what are the steps to reproduce the issue? The demo page http://godot.eska.me/pub/wasm-benchmark/ doesn't show any slowdowns on reloads on my local machine (x86 64 bits Ubuntu).
Flags: needinfo?(bbouvier)
:bbouvier, I downloaded the patch from this bug, applied it to my local tree:
./mach build
./mach raptor-test -t raptor-wasm-misc

and watched the first 2 loads work great, and the 3rd was slower, and the 4th hung.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13)
> :bbouvier, I downloaded the patch from this bug, applied it to my local tree:
> ./mach build
> ./mach raptor-test -t raptor-wasm-misc
> 
> and watched the first 2 loads work great, and the 3rd was slower, and the
> 4th hung.

./mach raptor-test -t raptor-wasm-godot
if you open the console while running the test we can see this error:
https://www.screencast.com/t/xDZC4TxR
Flags: needinfo?(bbouvier)
Added a timeout of 10 seconds before opening the tab, and we got the same issue on the third test.
Can't reproduce with an optimized build with debug assertions on Ubuntu x64. There should be a last-chance GC happening in the engine, but since I can't reproduce, we might need to try something else. Is there a way you can force a GC between runs using about:memory and the "minimize memory" button?
Flags: needinfo?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> Can't reproduce with an optimized build with debug assertions on Ubuntu x64.
> There should be a last-chance GC happening in the engine, but since I can't
> reproduce, we might need to try something else. Is there a way you can force
> a GC between runs using about:memory and the "minimize memory" button?

I ran the tests using your suggestions.

>mach raptor-test -t raptor-wasm-godot
> open tab with about:memory 
> after each cycle click 'GC' and 'minimize memory'

It looks like the test passes without any issues.

Any suggestions?
:aswan do you know if a webextension can force a GC?
If yes, do you have an example of how we can do that?
Flags: needinfo?(aswan)
Comment on attachment 9018152 [details] [diff] [review]
441443.patch

code is good here, let's figure out how to force a GC from a web extension so we can see this running. Great work so far :bebe.
Attachment #9018152 - Flags: review?(rwood)
Attachment #9018152 - Flags: review?(jmaher)
(In reply to Florin Strugariu [:Bebe] from comment #19)
> :aswan do you know if a webextension can force a GC?

I don't believe there's a way to do that now.  It sounds like something that would belong in the devtools APIs, redirecting to Luca to see if he has any thoughts.
The other option is to use a WebExtension Experiment to implement a new API for this:
https://firefox-source-docs.mozilla.org/toolkit/components/extensions/webextensions/basics.html#built-in-apis-versus-experiments
Flags: needinfo?(aswan) → needinfo?(lgreco)
Given that the main use case seems to be currently related to benchmark tests, in my opinion an Experiment API would be a better solution for it.

Also, I think that raptor may want to be able to do this in scenarios where no developer toolbox is available (e.g. Fennec and GeckoView), and the devtools API namespace is only available to extension contexts that exist while a developer toolbox is open (e.g. an extension's devtools page is instantiated when a developer toolbox is opened on a tab, and automatically destroyed when the toolbox is closed).

On the contrary, if the extension implements its own Experiment API, the new custom API method can be made available to the extension's background page and be available independently from a developer toolbox or any requirement of user interaction.
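For reference, a WebExtension Experiment exposing a forced GC could look roughly like the sketch below. This is an untested outline based on the Experiment docs linked above, not the patch that landed: the `raptorGC` namespace name and the wiring are assumptions, and the code only runs inside Firefox as a privileged experiment (where `Cu.forceGC()` / `Cu.forceCC()` are available), not in an ordinary content page or in Node.

```js
// schema.json — declares the hypothetical raptorGC namespace
[
  {
    "namespace": "raptorGC",
    "functions": [
      {
        "name": "forceGC",
        "type": "function",
        "async": true,
        "parameters": []
      }
    ]
  }
]
```

```js
// api.js — privileged implementation sketch. Cu.forceGC()/Cu.forceCC()
// exist in Firefox chrome code; exposing them this way is an assumption.
this.raptorGC = class extends ExtensionAPI {
  getAPI(context) {
    return {
      raptorGC: {
        async forceGC() {
          Cu.forceGC();
          Cu.forceCC();
        },
      },
    };
  }
};
```

The extension's background page could then call `browser.raptorGC.forceGC()` between page cycles, with no developer toolbox required.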
Flags: needinfo?(lgreco)
Alright, trying to figure out what's going on here. It could be that the tier2 compilation happens after a reload of the page, so I'd like to have us try a few things (I can't try them myself since I can't reproduce the issue in the first place).

    1. Does the out-of-memory situation happen if we run with the pref javascript.options.wasm_baselinejit set to false, javascript.options.wasm_ionjit set to true?
    2. Conversely, with the pref javascript.options.wasm_baselinejit set to true, javascript.options.wasm_ionjit set to false?
    3. Can we close the tab, wait 1 minute between each run and see if the out-of-memory situation occurs too? (I'm being told closing the tab could trigger a GC, and there's also the GC happening when the user is inactive)

Other than this, is there a virtual memory restriction on the machines running the benchmark? (Hopefully they're raw hardware rather than VMs, but such a restriction could easily be the case on VMs.)

Any chance this API (gc()) could be used? https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Mochitest/SpecialPowers#Garbage_Collection_APIs_2
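Options 1 and 2 above come down to flipping a pair of prefs per test variant. As a sketch, they might be expressed as Raptor manifest preferences like the fragment below — the section names and manifest syntax here are assumptions for illustration; only the pref names and values come from the comment above:

```ini
# Option 1: Ion only (wasm baseline compiler disabled) — hypothetical fragment
[raptor-wasm-godot-ion-firefox]
preferences = {"javascript.options.wasm_baselinejit": false,
               "javascript.options.wasm_ionjit": true}

# Option 2: baseline only (Ion disabled)
[raptor-wasm-godot-baseline-firefox]
preferences = {"javascript.options.wasm_baselinejit": true,
               "javascript.options.wasm_ionjit": false}
```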
got it, #1 is the winning option:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=06dbd8e71072899af7a7dac4ff338679da5dce12&newProject=try&newRevision=06dbd8e71072899af7a7dac4ff338679da5dce12&framework=10&filter=godot

we don't need those options for google chrome, but the good news is the noise is low and this runs everywhere.

here are the changes I have:
https://hg.mozilla.org/try/rev/9c5d400510db54b62f0bcd3ca55317b6fee8f6b9

:bbouvier, any concerns with moving forward?
:bebe, take a look at my changes vs yours- maybe you can ensure we clean this patch up and get it ready for review assuming :bbouvier is ok with it.
Flags: needinfo?(bebe)
Flags: needinfo?(bbouvier)
Attached patch 441577.patch (obsolete) — Splinter Review
Updated patch with changes suggested by :jmaher
Attachment #9018152 - Attachment is obsolete: true
Flags: needinfo?(bebe)
Attachment #9018974 - Flags: review?(rwood)
Attachment #9018974 - Flags: review?(jmaher)
Yes, let's not move forward with this solution; to be clear, I've suggested this to try to understand what was going on with the issue, so we can find a more proper fix. So I now think this is due to background compilations not being over when the next run starts.

In this case, I think a possible solution is to wait for some time between runs. Which amount of time depends on the machine that runs it. Can we try different values for waiting times between two runs (e.g. 1 minute, 2 minutes, up to 5 maybe)? Otherwise, is there a way to get a handle on the CPU activity to wait for it to be below a certain threshold?
Flags: needinfo?(bbouvier)
I don't think adding a bigger timeout would solve the issue; we already tried that.

With a timeout of 2 minutes between cycles, the test still fails somewhere around the third cycle.
Having a test with a timeout this big is not good practice anyway.

Monitoring the CPU load also would not work. Looking into Task Manager, we see something like this:
https://www.screencast.com/t/fD814Gvzqj1K
Comment on attachment 9018974 [details] [diff] [review]
441577.patch

looking at closing the tab for each cycle instead of tweaking preferences.
Attachment #9018974 - Flags: review?(rwood)
Attachment #9018974 - Flags: review?(jmaher)
Attached patch 442304.patch (obsolete) — Splinter Review
Updated patch to open separate tab for each run
Attachment #9018974 - Attachment is obsolete: true
Attachment #9019018 - Flags: review?(jmaher)
Comment on attachment 9019018 [details] [diff] [review]
442304.patch

Review of attachment 9019018 [details] [diff] [review]:
-----------------------------------------------------------------

pretty close, a few nits- I will push to try and see how things look

::: testing/raptor/webext/raptor/runner.js
@@ +90,5 @@
>          results.subtest_lower_is_better = settings.subtest_lower_is_better === true;
>          results.alert_threshold = settings.alert_threshold;
>  
> +        if (settings.newtab_per_cycle !== undefined) {
> +          reuseTab = settings.newtab_per_cycle ;

nit- no space before the ;

@@ +242,5 @@
>        } else if (testType == "benchmark") {
>          isBenchmarkPending = true;
>        }
> +
> +      if (reuseTab != false) {

could we say:
if (reuseTab && testTabID != 0) {

that would save the nested if statements.

@@ +244,5 @@
>        }
> +
> +      if (reuseTab != false) {
> +        if (testTabID != 0) {
> +          // close previous test tab

nit: align the indentation of the comment and code

@@ +250,5 @@
> +            postToControlServer("status", "closing Tab " + testTabID);
> +
> +            // open new tab
> +            ext.tabs.create({url: "about:blank"});
> +            postToControlServer("status", "Open new tab");

is there a risk of the tab closing slower than the new tab opening and getting tab.id confused?

@@ +257,5 @@
> +      setTimeout(function () {
> +        postToControlServer("status", "update tab " + testTabID);
> +        // update the test page - browse to our test URL
> +          ext.tabs.update(testTabID, {url: testURL}, testTabUpdated);
> +          }, newTabDelay);

nit: make sure indentation is the same here as well.

@@ +448,5 @@
> +       setTimeout(function() { nextCycle(); }, postStartupDelay);
> +     } else {
> +       setTimeout(function() {
> +         ext.tabs.create({url: "about:blank"});
> +         nextCycle();}, postStartupDelay);

why are you adding nextCycle(); here?
Attachment #9019018 - Flags: review?(jmaher) → review-
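On the tab-id confusion question raised in the review: with the promise-based WebExtension tabs API, the close/open race can be avoided by awaiting the removal before creating the replacement tab. The sketch below is illustrative, not Raptor code — the `tabs` object is a plain-JavaScript mock standing in for `browser.tabs`, and `recycleTab` is a hypothetical helper:

```javascript
// Mock of a promise-based tabs API; in a real WebExtension,
// browser.tabs.remove() and browser.tabs.create() return Promises.
const tabs = {
  nextId: 1,
  open: new Set(),
  async create({ url }) {
    const id = this.nextId++;
    this.open.add(id);
    return { id, url };
  },
  async remove(id) {
    this.open.delete(id);
  },
};

async function recycleTab(oldTabId) {
  // Wait for the old tab to be fully removed before creating the new
  // one, so the new tab's id can never be confused with the old one.
  await tabs.remove(oldTabId);
  const newTab = await tabs.create({ url: "about:blank" });
  return newTab.id;
}
```

Sequencing the calls this way (rather than firing `create` while `remove` is still in flight) is what rules out the "tab closing slower than the new tab opening" scenario.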
Attached patch 442310.patch (obsolete) — Splinter Review
Fixed the lint issues.
Updated the if statement.

Did we have any issues with the tab close/open on try?
Attachment #9019317 - Flags: review?(jmaher)
Attachment #9019018 - Attachment is obsolete: true
Comment on attachment 9019317 [details] [diff] [review]
442310.patch

Review of attachment 9019317 [details] [diff] [review]:
-----------------------------------------------------------------

there is still a python lint failure:
$ ./mach lint --outgoing
ini-parser v0.0.2 needs to be installed locally.
htmlparser2 v3.9.2 needs to be installed locally.
sax v1.2.4 needs to be installed locally.
Installing eslint for mach using "c:\Users\elvis\.mozbuild\node\npm.cmd install --loglevel=error"...

ESLint and approved plugins installed successfully!

NOTE: Your local eslint binary is at c:/Users/elvis/mozilla-inbound\node_modules\.bin\eslint

the MozReview service has been disabled; stop loading the reviewboard/mozreview extension from your hgrc files to make this warning go away
c:\Users\elvis\mozilla-inbound\testing\raptor\raptor\manifest.py
  107:100  error  line too long (100 > 99 characters)  E501 (flake8)

✖ 1 problem (1 error, 2 warnings)

elvis@JMAHER-WIN10 ~/mozilla-inbound

::: testing/raptor/webext/raptor/runner.js
@@ +177,5 @@
>  
> +function testTabRemoved(tab) {
> +  postToControlServer("status", "Removed tab " + testTabID);
> +  testTabID = 0;
> +}

nit: add a space between functions
Attachment #9019317 - Flags: review?(jmaher) → review-
Attached patch 442311.patch — Splinter Review
Hopefully this is OK
Attachment #9019317 - Attachment is obsolete: true
Attachment #9019684 - Flags: review?(jmaher)
Comment on attachment 9019684 [details] [diff] [review]
442311.patch

Review of attachment 9019684 [details] [diff] [review]:
-----------------------------------------------------------------

looks good locally and on try
Attachment #9019684 - Flags: review?(jmaher) → review+
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b732bad1fa11
Can we run the Godot Engine wasm benchmark in automation? r=jmaher
Keywords: checkin-needed
Attachment #9017784 - Attachment is obsolete: false
Thanks for moving this forward! One uber-nit remark I've had just seeing the code is that the correct casing is Godot, not GoDot (but it's not an issue, as far as I know).

I forgot to ask: can we run this in Firefox with the other configurations too (only wasm baseline, only wasm ion), as we do for the wasm-misc benchmark, please?

I've also opened https://github.com/mozilla-frontend-infra/firefox-performance-dashboard/issues/99 for the dashboard counterpart.
https://hg.mozilla.org/mozilla-central/rev/b732bad1fa11
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
odd, this appears to have resulted in google-slides/sheets improving:
== Change summary for alert #17131 (as of Wed, 24 Oct 2018 20:28:15 GMT) ==

Improvements:

 10%  raptor-google-sheets-firefox osx-10-10 opt       1,819.69 -> 1,630.45
 10%  raptor-google-slides-firefox osx-10-10 opt       4,172.38 -> 3,749.95
  7%  raptor-google-slides-firefox linux64-qr opt      1,517.56 -> 1,407.31

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=17131

maybe we run this with a new tab by accident?
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #39)

> maybe we run this with a new tab on accident?

Hmmm strange, no just the one tab according to the raptor webext logs:

10:17:59     INFO -  127.0.0.1 - - [24/Oct/2018 10:17:59] "POST / HTTP/1.1" 200 -
10:17:59     INFO -  raptor-control-server received webext_status: opened new empty tab 2
10:18:00     INFO -  127.0.0.1 - - [24/Oct/2018 10:18:00] "POST / HTTP/1.1" 200 -
10:18:00     INFO -  raptor-control-server received webext_status: begin pagecycle 1
10:18:01     INFO -  127.0.0.1 - - [24/Oct/2018 10:18:01] "POST / HTTP/1.1" 200 -
10:18:01     INFO -  raptor-control-server received webext_status: update tab 2
10:18:01     INFO -  127.0.0.1 - - [24/Oct/2018 10:18:01] "POST / HTTP/1.1" 200 -
10:18:01     INFO -  raptor-control-server received webext_status: test tab updated 2
so this is odd: running locally I see a "sessionrestore" tab and a test tab - is that what we see here? This makes the numbers better, but does it make the numbers more stable? Might be worth looking into a bit more.
I don't see gdocs for osx for chrome, but I do see that our numbers appear more stable :)