Bug 1506865 (Open) - Opened 6 years ago, Updated 2 years ago

Convert raptor-unity-webgl to a pageload test

Categories

(Testing :: Raptor, task, P3)

Version: 3

Tracking

(Not tracked)

People

(Reporter: luke, Unassigned)


It's great to have a Unity WebGL workload on Raptor!  Poking around at it recently, however, I noticed that it's the asm.js version.  Unity recently switched to outputting wasm by default, so it'd be great to update to that.  I think the most recent version can be taken from this zip:
  https://files.unity3d.com/marcot/benchmarks2018.zip
which is the version described in this blog post:
  https://blogs.unity3d.com/2018/09/17/webassembly-load-times-and-performance/

Thanks!
:luke, this is primarily running to support Android. Will wasm be supported on Android, and do you have concerns with this running on GeckoView/Android?

Otherwise, we could keep the old version for Android and the new version for desktop, or use "ugl-w" for distinction.
Flags: needinfo?(luke)
wasm is supported on Android (ARM32 and ARM64) so I think you can simply run it there too.

Just in case it's relevant, some Android devices apparently have broken signal handling, which disables wasm:
  https://hg.mozilla.org/mozilla-central/file/tip/js/src/wasm/WasmSignalHandlers.cpp#l746
However I'd be surprised (and quite eager to learn) if this was the case for our automation devices.
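
(A tangent, but for checking a given device by hand: a minimal sketch, not from the bug, that asks the engine to validate the empty wasm module; if wasm is disabled, this should report it.)

  // Minimal sketch: validate the 8-byte empty wasm module ("\0asm" + version 1)
  // to confirm wasm is actually usable on the device under test.
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  if (typeof WebAssembly === 'object' && WebAssembly.validate(emptyModule)) {
    console.log('wasm is available');
  } else {
    console.log('wasm is unavailable, possibly due to broken signal handling');
  }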
Flags: needinfo?(luke)
Awesome, I've put it in our next batch of work.
Blocks: 1503315
Priority: -- → P2
See Also: → 1517830
See Also: → 1499253
See Also: → 1518882

So Raptor gets the current unity-webgl benchmark source from a fetch task that pulls it from our perf-automation repo [0]. This would involve replacing that source with the source in the Description instead, and doing a GitHub benchmark source release. Then the fetch task itself needs to be updated to use that new release (see the perf-automation GitHub repo readme [1] for guidance; a rough sketch follows the links below), and then pushing that to try to see how the new benchmark runs on all the platforms.

[0] https://github.com/mozilla/perf-automation/tree/master/benchmarks/unity-webgl

[1] https://github.com/mozilla/perf-automation/tree/master/benchmarks
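
For reference, the fetch task definitions live in taskcluster in mozilla-central; a rough sketch of what the updated entry could look like (the release URL, hash, and size here are placeholders, not real values):

  unity-webgl:
      description: Unity WebGL benchmark (wasm version)
      fetch:
          type: static-url
          # placeholder: point at the new perf-automation GitHub release
          url: https://github.com/mozilla/perf-automation/releases/download/<tag>/unity-webgl-wasm.zip
          sha256: <sha256 of the zip>
          size: <size in bytes>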

Blocks: 1531441

Alex, can you start on this please? Ping me on IRC if you have questions, thanks!

Flags: needinfo?(alexandru.ionescu)
Assignee: nobody → alexandru.ionescu
Status: NEW → ASSIGNED
Priority: P2 → P1

(In reply to Robert Wood [:rwood] from comment #6)

> Alex, can you start on this please? Ping me on IRC if you have questions, thanks!

On it!

Flags: needinfo?(alexandru.ionescu)
Blocks: 1524545
Depends on: 1466648

Thanks Alex for your work on this so far. Setting this to P2 because of other priorities - when I'm back from PTO we will pick this up again and I'll help you get this landed.

Priority: P1 → P2
Flags: needinfo?(rwood)

(In reply to Cristina Coroiu [:ccoroiu] from comment #9)

> Robert, please take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1524545#c37

Thanks - yes this failure will continue until we upgrade this benchmark source.

:davehunt, do we want to make this a priority? If we can't right now, then I suggest we demote this to tier 3 to save the sheriffs' time.

Flags: needinfo?(rwood) → needinfo?(dave.hunt)

Rob, I can do the updates if needed.

(In reply to Robert Wood [:rwood] from comment #10)

> :davehunt, do we want to make this a priority? If we can't right now, then I suggest we demote this to tier 3 to save the sheriffs' time.

Yes, I think we have the bandwidth to pick this up.

Flags: needinfo?(dave.hunt)

I will land the Fennec/Fenix stuff ASAP and start working on this.

Priority: P2 → P1

FYI, as an example of adding a benchmark, have a look at :marauder's perf-automation pull request for adding JetStream2:

https://github.com/mozilla/perf-automation/pull/20

More specifically, where he added the logic so that the benchmark auto-runs when the 'raptor' URL parameter is included, and how the results are posted to Raptor (a sketch of that pattern follows the link below):

https://github.com/mozilla/perf-automation/pull/20/commits/5d75a80e2df3b0e97cd5a6afb6e99ec7821fa631
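
The general shape of that change, as a rough sketch (the function names and the message payload here are placeholders; the real ones are in the PR above):

  // Sketch only: auto-start when the page is loaded with '?raptor', then post
  // the scores back to the Raptor runner. startBenchmark/collectScores and the
  // message shape are placeholders, not the actual benchmark code.
  function startBenchmark() {
    // placeholder: kick off the benchmark's own run loop, then call
    // reportToRaptor(collectScores()) when it finishes
  }

  function reportToRaptor(scores) {
    // Raptor listens for a message posted from the benchmark page
    window.postMessage({ type: 'benchmark-results', results: scores }, '*');
  }

  window.addEventListener('load', () => {
    if (window.location.search.indexOf('raptor') !== -1) {
      startBenchmark();
    }
  });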

Flags: needinfo?(alexandru.ionescu)

Having a quick look through the new unity-webgl-wasm source (benchmarks2018.zip in Description), I'm guessing we would need to modify 'UnityLoader.js'.

(In reply to Robert Wood [:rwood] from comment #15)

> Having a quick look through the new unity-webgl-wasm source (benchmarks2018.zip in Description), I'm guessing we would need to modify 'UnityLoader.js'.

Or we may need to do something like the current unity-webgl benchmark does: it has a custom 'mozbench.js' [0] which is loaded from the benchmark's main 'index.html' [1]:

[0] https://github.com/mozilla/perf-automation/blob/master/benchmarks/unity-webgl/Data/mozbench.js

[1] https://github.com/mozilla/perf-automation/blob/master/benchmarks/unity-webgl/index.html

Note: When adding the new benchmark source to the perf-automation repo, we should probably call it /benchmarks/unity-webgl-wasm or maybe /benchmarks/unity-webgl-2018, so we don't touch the existing benchmark in case we ever want to revert.

Thanks rwood!

Flags: needinfo?(alexandru.ionescu)

(In reply to Robert Wood [:rwood] from comment #16)

> Or we may need to do something like the current unity-webgl benchmark does: it has a custom 'mozbench.js' [0] which is loaded from the benchmark's main 'index.html' [1]:
>
> [0] https://github.com/mozilla/perf-automation/blob/master/benchmarks/unity-webgl/Data/mozbench.js
>
> [1] https://github.com/mozilla/perf-automation/blob/master/benchmarks/unity-webgl/index.html
>
> Note: When adding the new benchmark source to the perf-automation repo, we should probably call it /benchmarks/unity-webgl-wasm or maybe /benchmarks/unity-webgl-2018, so we don't touch the existing benchmark in case we ever want to revert.

Let me put it another way: there are two main differences between unity-webgl and unity-webgl-wasm: they report their results differently and they start differently. unity-webgl starts its tests automatically, without human intervention; unity-webgl-wasm has a START button on the index page. I asked Marian to help me find it (I thought there was something I was missing, since he had already gone through this with his benchmark), but he couldn't find it either. So, as far as I could research, that start button is defined inside one of the .unityweb files in the Build directory.
Here is a video capture, maybe it helps you picture better what I'm trying to say: [I will post it as soon as it's ready]

(In reply to Alexandru Ionescu :alexandrui from comment #18)

> Let me put it another way: there are two main differences between unity-webgl and unity-webgl-wasm: they report their results differently and they start differently. unity-webgl starts its tests automatically, without human intervention; unity-webgl-wasm has a START button on the index page. I asked Marian to help me find it (I thought there was something I was missing, since he had already gone through this with his benchmark), but he couldn't find it either. So, as far as I could research, that start button is defined inside one of the .unityweb files in the Build directory.
> Here is a video capture, maybe it helps you picture better what I'm trying to say: [I will post it as soon as it's ready]

Hey Alex, yes I understand - I ran the new source locally and was also trying to find the source of the start button. As I mentioned in comment 16 above, you may need to hack 'UnityLoader.js' or add our own content along the lines of 'mozbench.js'.

Hi :luke, as the requester of this test in automation, are you able to help Alex? As noted above, we need to be able to hack the updated benchmark source to accept a '?raptor' URL parameter that will a) autostart the benchmark, and b) post the results to the Raptor runner when finished. Would you be able to help Alex do this with the updated source in 'benchmarks2018.zip'? I also had a look, and it's not a straightforward task - your help would be greatly appreciated :)

Flags: needinfo?(luke)
Flags: needinfo?(alexandru.ionescu)

Perhaps Benjamin could help here?

Flags: needinfo?(luke) → needinfo?(bbouvier)

Duh, the start button is created inside the canvas where everything is rendered, so it's not easily actionable. I tried to see if the JS code mentioned "window.location" or "location.hash" or anything like that (to extract query parameters), but I didn't find anything.

I bet that when the button is clicked, a wasm function gets called; if we could identify which one, we could probably just call it from the outside, and that would be enough. Unfortunately Unity doesn't help, since they use a proprietary format to compress their assets, so we first need to decompress those. (Alternatively, we could add hooks into the JS engine that record the wasm content on disk, but that's more work.)

What's the priority of this / what is the expected value? In particular, how much time would we like to spend on this before calling it too complicated? Is there a shortcut we can take by leveraging our contacts at Unity, to see if they can provide a benchmark that's simpler to interact with, using query parameters? (I guess the page has been made deliberately hard to interact with, because of the "trial" text at the bottom right of the rendered canvas.)

Flags: needinfo?(bbouvier) → needinfo?(luke)

It's hard to say the priority, but we currently have a Unity asm.js workload in our test harness, which seems a bit silly.

Instead of programmatically finding the button, can we just synthesize a click event at a particular X,Y coordinate? I expect the button's position is stable.
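
(Something along these lines, presumably; a minimal sketch where the coordinates are assumptions, with the caveat that synthesized events carry isTrusted === false and may be ignored by content that checks for trusted input:)

  // Sketch: synthesize a mousedown/mouseup/click sequence at fixed coordinates
  // within the canvas. The center-of-canvas position is an assumption.
  const canvas = document.querySelector('canvas');
  const rect = canvas.getBoundingClientRect();
  const x = rect.left + rect.width / 2;
  const y = rect.top + rect.height / 2;
  for (const type of ['mousedown', 'mouseup', 'click']) {
    canvas.dispatchEvent(new MouseEvent(type, {
      bubbles: true,
      cancelable: true,
      view: window,
      clientX: x,
      clientY: y,
    }));
  }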

Flags: needinfo?(luke)
Flags: needinfo?(alexandru.ionescu)

(In reply to Luke Wagner [:luke] from comment #22)

> Instead of programmatically finding the button, can we just synthesize a click event at a particular X,Y coordinate? I expect the button's position is stable.

Simulating the click is probably possible. I've got WIP code that tries to do this here, but I've reached the limits of my DOM knowledge and the code doesn't work. The click events I'm creating don't seem to reach the Unity handler. Debugging tools tell me there's a mouseover listener on the canvas, but trying many event types doesn't seem to do anything (even when I click on every 10x10px square within the canvas, using the TEST_MODE). Our devtools don't let me see the handlers in the JS code eval'd from the blob decompressed from the .unityweb files; Chromium shows them but doesn't let me set breakpoints. So there's no way to tell whether Unity catches the event.

Thinking a bit more about it: the scores themselves are rendered within the canvas, and I'm unsure we can catch them directly in JS (the Unity JS code doesn't mention "score" or "result" anywhere in relation to the benchmark scores). Unless we use an OCR library, which would be kinda insane, I'm not sure this is the best way forward (for one thing, there's no way to know when the benchmark is done running, so we'd need to take canvas snapshots at regular intervals, which would itself affect the benchmark score). The scores are probably somewhere in the wasm memory too, but it seems really hard to find where.

Considering all of this, I think we should either ask the Unity people for a benchmark that's easier to manipulate (start the benchmark automatically, enable/disable some subbenchmarks if needed, get individual and overall scores), or we should remove it from Raptor.

Flags: needinfo?(luke)

(Opened bug 1563512 for the ability to see JS code from revoked blobs in devtools)

I would suggest posting in https://forum.unity.com/forums/webgl.84/ to ask for advice on automatically running the benchmark and extracting results.

Actually, this benchmark had paged out of my memory enough that I had forgotten that the button and the canvas benchmark numbers are irrelevant. The whole point of this benchmark (as blogged) is to measure load/init time; that is measured just by loading the benchmark, and the numbers are printed to the DOM in the window at the bottom left (all before the big button is pressed).

So all we need to do to run the benchmark is navigate to the page, and all we need to do to harvest the numbers is scrape them from the DOM of the results window at the bottom left.
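
A rough sketch of what that scraping could look like (the '#results' selector and the completion marker are assumptions; the real element would need to be taken from the benchmark's index.html):

  // Sketch: poll the results window after load and read out the load/init
  // times once they appear. The selector and 'Total' marker are assumptions.
  window.addEventListener('load', () => {
    const poll = setInterval(() => {
      const box = document.querySelector('#results');
      if (box && box.textContent.indexOf('Total') !== -1) {
        clearInterval(poll);
        console.log('load/init times:', box.textContent.trim());
      }
    }, 500);
  });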

Btw, it looks like the benchmark has been updated; the current best link/.zip pointed to by the blog post is:

Sound actionable, Alexandru?

Flags: needinfo?(luke) → needinfo?(alexandru.ionescu)

I'm not sure I'm following. The current webgl benchmark measures the load time and the scores of each subtest (which are obtained by pressing that start button). If we don't get past that button, then compared with the previous version of our benchmark we'll only have the load time. Also, IIRC we don't use metrics like TTFI on benchmarks. Am I missing something?

Flags: needinfo?(rwood)
Flags: needinfo?(luke)
Flags: needinfo?(alexandru.ionescu)

We currently don't have support in Raptor for scraping results from pages. Typically, for benchmark tests, we actually run the benchmark source and have the results/scores reported from the benchmark to Raptor. If this just needs to be loaded, maybe it can be a pageload test instead of a benchmark? I.e. like our existing TP6 suites: just load the page (which happens to be a benchmark that isn't actually run) and measure fcp, non-blank-paint, loadtime, etc.
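
For context, Raptor pageload tests are declared in INI manifests; a rough sketch of what an entry could look like (the section name, URL, and values are illustrative assumptions, not a working configuration):

  # illustrative sketch only; the names and values here are assumptions
  [raptor-unity-webgl-pageload-firefox]
  apps = firefox
  type = pageload
  test_url = http://localhost/unity-webgl-wasm/index.html?raptor
  page_cycles = 25
  measure = fnbpaint, fcp, dcf, loadtime
  unit = ms
  lower_is_better = true
  alert_threshold = 2.0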

Flags: needinfo?(rwood)

(In reply to Robert Wood [:rwood] from comment #28)

> We currently don't have support in Raptor for scraping results from pages. Typically, for benchmark tests, we actually run the benchmark source and have the results/scores reported from the benchmark to Raptor. If this just needs to be loaded, maybe it can be a pageload test instead of a benchmark? I.e. like our existing TP6 suites: just load the page (which happens to be a benchmark that isn't actually run) and measure fcp, non-blank-paint, loadtime, etc.

Bebe, didn't you say something about this not being doable when I told you about this scenario?

Flags: needinfo?(fstrugariu)

(In reply to Alexandru Ionescu :alexandrui from comment #27)
That's right; I'm proposing that, for the new wasm-ified test, we ignore the subtest results and focus on the load-time data (which actually stresses execution, not just compile time, because the init workload is significant). The benchmark subtests (the ones that run when you click the button) mostly test WebGL perf, not asm.js/wasm, anyway. If we want to keep testing WebGL perf, we could of course keep the current version. But if we're measuring load time, it's not valuable to measure asm.js, only wasm.

Flags: needinfo?(luke)

I'm not sure these times are the same, but we can test this.

Let's run a test with the benchmark as the source and measure the TP6 metrics. We can compare those results with the benchmark's own scores and see if they match.

Flags: needinfo?(fstrugariu)

What's the status of this? :luke, is this something you'd have some time to work on if we provided you with support? If not, could you say whether you think we should continue to consider this a P1?

Flags: needinfo?(luke)

I think the load-time tests (described in comment 30) would be quite valuable, much more so than the current asm.js version. The wasm team is pretty heads-down on Cranelift atm, so I don't think we have time. I don't know exactly what a P1 means (vs. P2, etc.), but I think this task shouldn't be too hard (with the clarifications in comment 30) for someone familiar with the perf testing infrastructure.

Separately, I think we should just remove the current asm.js raptor-unity-webgl test since, as mentioned above, what it's testing is not very relevant to wasm.

Flags: needinfo?(luke)

Alex, can you take a new look at this?

Flags: needinfo?(alexandru.ionescu)

Following what rwood said, measuring the pageload actually loses the scope of the test, and we have TP6 tests that are more relevant for this. As we don't currently support harvesting results from the DOM (and the main focus for Raptor right now is migrating to Browsertime), I would close this as WONTFIX, but we can come back at a more suitable moment and resume the work.
But I won't close this; I'll let rwood decide what we should do with the current webgl tests.

Flags: needinfo?(alexandru.ionescu) → needinfo?(rwood)

Based on the comments from :luke, let's convert this test from a benchmark to a pageload test, or at least modify the benchmark to use a value available in the DOM after pageload rather than executing the benchmark. I'm setting this to P3 as we're not actively working on it, but feel free to increase the priority if this becomes more valuable. We're happy to provide support for anyone who might be available to work on this.

Status: ASSIGNED → NEW
Type: defect → task
Flags: needinfo?(rwood)
Priority: P1 → P3
Summary: Could raptor-unity-webgl be updated to wasm? → Convert raptor-unity-webgl to a pageload test
Assignee: aionescu → nobody


Severity: normal → S3