Closed Bug 1559592 Opened 1 year ago Closed 5 months ago

Replace drawWindow with new Fission compatible drawSnapshot API

Categories

(Testing :: Marionette, defect, P1)

Version 3
defect

Tracking

(Fission Milestone:M4, firefox70 fixed)

RESOLVED FIXED
mozilla70
Fission Milestone M4
Tracking Status
firefox70 --- fixed

People

(Reporter: ato, Assigned: whimboo)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(7 files)

https://bugzilla.mozilla.org/show_bug.cgi?id=1499845 replaces
drawWindow, which is not compatible with Fission, with a new API
for taking screenshots (yet to be defined). Marionette needs to be
updated to use this.

Priority: -- → P3
See Also: → 1434313
Attached patch marionette.diffSplinter Review

This is an example patch that somewhat works with Jonathan Watt's web crawler.

The new API has landed as a member of WindowGlobalParent, and supports passing a rectangle, or null to capture the whole viewport.

You should be able to get a WindowGlobalParent for the chrome context, as well as for each tab content context, so it should be possible to have the same code for both.

I think you could use the listener to convert element references into bounding rectangles (if it's a content context), and then pass those plus the relevant window global into capture.js (always in the parent).

Depends on: 1499845
See Also: → 1563746

:whimboo can you see about getting this part finished as a priority to make sure we don't block any fission work

Assignee: nobody → hskupin
Flags: needinfo?(hskupin)

Sure, I will get started soon.

Status: NEW → ASSIGNED
Flags: needinfo?(hskupin)
Priority: P3 → P1
Attached file marionette_testcase.py

Attached you can find a testcase for Marionette which takes a screenshot in chrome or content context and Fission enabled (requires a restart after setting the pref).

Note that we also already fail - that without Fission - to capture the tabs content when taking a screenshot in chrome context.

Depends on: 1569930
Depends on: 1569974
Fission Milestone: --- → M4
Depends on: 1570147

Matt, I have a question in regards the flags for drawWindow() which are in use right now, but won't be supported in drawSnapshot(). Here an example of the wpt-reftests:

        ctxInterface.DRAWWINDOW_DRAW_CARET |
        ctxInterface.DRAWWINDOW_DRAW_VIEW |
        ctxInterface.DRAWWINDOW_USE_WIDGET_LAYERS;

Can I basically completely get rid of those or would we have to add a flags argument to drawSnapshot to cover those customizations?

Flags: needinfo?(matt.woodrow)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #5)

Matt, I have a question in regards the flags for drawWindow() which are in use right now, but won't be supported in drawSnapshot(). Here an example of the wpt-reftests:

        ctxInterface.DRAWWINDOW_DRAW_CARET |
        ctxInterface.DRAWWINDOW_DRAW_VIEW |
        ctxInterface.DRAWWINDOW_USE_WIDGET_LAYERS;

Can I basically completely get rid of those or would we have to add a flags argument to drawSnapshot to cover those customizations?

So, drawWindow+DRAWWINDOW_USE_WIDGET_LAYERS is an entirely different code path to the standard drawWindow. That's requesting a readback of the pixels currently in the compositor/window, and is already fission compatible (since the compositor can draw remote iframes).

drawSnapshot is a replacement for the other drawWindow cases where you're drawing things that aren't exactly what's on screen. The drawWindow code path for that is broken with fission (only captures current process), whereas drawSnapshot can collect content across process boundaries.

So you'd want to leave those callers as is for now!

In the future I'd like to add a new API that's explicitly just for the compositor readback case (readbackSnapshot?), and deprecate drawWindow. No rush on that though.

Flags: needinfo?(matt.woodrow)

So this actually makes it a bit more complicated to get our code updated. But thankfully the only consumer of the DRAWWINDOW_USE_WIDGET_LAYERS flag is the reftest.js file in Marionette:

https://searchfox.org/mozilla-central/rev/38c88cbf4be87dfa0636d15a5d3599a8ea0d1a72/testing/marionette/reftest.js#514-518

James, given that you implemented this patch you could shed some light on it, especially why it's needed? And also if we still need all of those flags set?

If we have to keep it I would suggest the following:

  1. Move any canvas and drawing related code out of framescript (listener.js) into the parent process (driver.js), and only keep code which determines the region to capture, and to collect all the rectangles for elements to highlight. Those values we can easily transfer back. As Matt mentioned drawWindow() will only work on the parent process if the DRAWWINDOW_USE_WIDGET_LAYERS flag is set. So I would temporarily patch that in a commit to ensure that capturing the document works in all the cases.

This also helps us with future work for Fission, because we have lesser code in the framescript, and let the parent do most of it.

  1. Refactor the canvas.js code to make use of drawScreenshot() but only in those cases where DRAWWINDOW_USE_WIDGET_LAYERS hasn't been specified. If that flag is set, we have to keep the call to drawWindow().

I think that with those changes it should be possible to make it work. I'm only a bit anxious in using this new API because not a single test exists yet. So we clearly have to ensure that all the code paths we hit with Marionette have proper wdspec tests.

Flags: needinfo?(james)

The short answer is "we use DRAWWINDOW_USE_WIDGET_LAYERS because that's what the reftest harness does; c.f. https://searchfox.org/mozilla-central/source/layout/tools/reftest/reftest.jsm#840 ". But the longer answer is that requesting a readback of the pixels in the compositor makes sense for testing use cases since you want the result to be exactly what's shown on screen and not involve re-rendering to a different buffer which may change the results of the test.

Flags: needinfo?(james)

Ok, so I would say that for reftests we do not depend on the canvas.js implementation but instead have those parts duplicated which are really necessary. As such it will be independent from the rest of Marionette's capturing feature, and can be updated once Matt got around to implement the readbackSnapshot() or such API.

This moves all the screenshot related code from the framescript to
the parent process, so that canvas.js is no longer called from
within a content process.

The remaining code in the framescript is only needed to compute
the dimensions of the screenshot, and from all the to highlight
elements.

This move is necessary to allow switching to the new drawSnapshot
API which only works from within the parent process.

No longer blocks: 1377335
Depends on: 1377335
Depends on: 1571341

Matt, as discovered when trying to create a snapshot of a huge document, the drawSnapshot API has a maximum allowed width and height of 32767 pixels. In Marionette we already reduce the specified dimensions if those are larger than this, and it was working fine for drawWindow. But with drawSnapshot I get an exception now. So I wonder if that is expected, and the dimension should not exceed the maximal skia dimension, or if the new API should not throw? Maybe it internally uses a buffer or canvas like element which is based on the same size restrictions?

Flags: needinfo?(matt.woodrow)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #11)

Matt, as discovered when trying to create a snapshot of a huge document, the drawSnapshot API has a maximum allowed width and height of 32767 pixels. In Marionette we already reduce the specified dimensions if those are larger than this, and it was working fine for drawWindow. But with drawSnapshot I get an exception now. So I wonder if that is expected, and the dimension should not exceed the maximal skia dimension, or if the new API should not throw? Maybe it internally uses a buffer or canvas like element which is based on the same size restrictions?

Yeah we're still limited by the max size skia can draw, which is 32767, so passing a rect bigger than that will result in the promise being rejected.

Flags: needinfo?(matt.woodrow)
Depends on: 1571769
Depends on: 1572448

I would like to verify those changes also with Wd jobs, but currently --setpref fission.autostart=true is not forwarded to the pytestrunner, and as such the jobs always run without enabling fission! See bug 1572448.

Now with all important dependencies fixed, I should have a working patch including more wdspec tests today. Means that we should be able to switch to the new API soon.

Blocks: 1377335
No longer depends on: 1377335
No longer blocks: 1574155
Depends on: 1574155

I'm currently waiting for some feedback from Matt in how to use the new optional flags option for drawSnapshot(). I will continue on this bug once I got the details.

First full try build for Marionette, wdspec, and wpt-reftests with non-fission and fission jobs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e793f25d3dbc21e12d70fa7aa26147ae540fa6e

Note that I expect failures for fission until bug 1559841 has been fixed.

As expected there are a lot of failures with Fission enabled. Most of them for Mn jobs are known, and are actually bugs in Firefox. The Android tests are broken because I missed to skip the changed Mn screenshot test for Android. With the next try build it should be green. Also the permafailing screenshot test TestScreenCaptureContent.test_capture_flags I skipped over on bug 1330560 but only on Windows. Other platforms are fine. I will investigate once this patch has been landed.

Wr (reftests) looks more broken, but that also happens without my patches. I filed bug 1574508.

Here a new try build for Mn jobs on Windows and Android:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3813dd87ba4933bf333ae33d0dc5fe0ab2e2aa43

So we have unit tests for Marionette which check for some capture flags specifically for the caret:

https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/testing/marionette/harness/marionette_harness/tests/unit/test_screenshot.py#309

Given that we do not support this flag in the new API, the tests at least started permafailing on Android. I don't know why this doesn't is the case for MacOS and Linux, and causes intermittent failures on Windows. It looks kinda inconsistent:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3813dd87ba4933bf333ae33d0dc5fe0ab2e2aa43&selectedJob=262078661

Matt, what would prevent us to add this flag? And also why do we see those different behaviors when using the API across platforms?

Flags: needinfo?(matt.woodrow)
Blocks: 1487124

The problem with test_capture_flags before the change should be that it fails because the caret is not always present, and as such causes an identical image to the reference image. Maybe we should simply remove this test?

Also there is another test failure for TestScreenCaptureContent.test_capture_vertical_bounds on Windows, where canvas.toDataURL() fails with a generic exception: UnknownException: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE).

https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/testing/marionette/harness/marionette_harness/tests/unit/test_screenshot.py#285

Here the code:
https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/dom/html/HTMLCanvasElement.cpp#725-770

I don't know what could be wrong here given that the call to drawSnapshot didn't raise any error. So maybe there is some kind of invalid data? But given the documentation on MDN only a security error is allowed to be raised.

Olli, any idea what this could be related to? Thanks

Flags: needinfo?(bugs)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #22)

So we have unit tests for Marionette which check for some capture flags specifically for the caret:

https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/testing/marionette/harness/marionette_harness/tests/unit/test_screenshot.py#309

Given that we do not support this flag in the new API, the tests at least started permafailing on Android. I don't know why this doesn't is the case for MacOS and Linux, and causes intermittent failures on Windows. It looks kinda inconsistent:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3813dd87ba4933bf333ae33d0dc5fe0ab2e2aa43&selectedJob=262078661

Matt, what would prevent us to add this flag? And also why do we see those different behaviors when using the API across platforms?

Given that the test is just checking that the two aren't equal, I think we're usually passing because we have a focus ring drawn (which happens even without the caret flag). We don't draw focus rings on android, so that needs the caret to be different.

Is showing the caret behaviour you really want, or just something that this unit test happens to sometimes rely on?

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #25)

Given that the test is just checking that the two aren't equal, I think we're usually passing because we have a focus ring drawn (which happens even without the caret flag). We don't draw focus rings on android, so that needs the caret to be different.

Gotcha. That would explain the behavior on Android. Thanks!

Is showing the caret behaviour you really want, or just something that this unit test happens to sometimes rely on?

Nothing in the WebDriver spec particularly refers to the caret when doing screenshots. As such I would say we simply remove this test.

On Android there is no focus ring drawn, and in general it depends
on the visibility of the caret, which constantly blinks. As such
intermittent and perma failures can always occur. Best is to just
remove this flaky test.

Depends on D42372

Depends on: 1574935

(In reply to Olli Pettay [:smaug] from comment #28)

Where is the code calling toDataURL?

We will continue over on bug 1574935. For now we will temporarily skip the test.

Flags: needinfo?(bugs)

Matt, there is one question still unanswered, which is about the flag for DRAW_WINDOW. How needs this to be set? When I call drawSnapshot the following way on mozilla.org, it doesn't matter which value I pass, a full screenshot is always created based on the top-left coordinates of the document, but not the current viewport (after scrolling).

var rect = new window.DOMRect(0, 300, 1280, 2062);
var flag = 0;  // 2, 4 make no difference
let snapshot = await windowGlobal.drawSnapshot(rect, 2.0, "white", flag);

As I thought the flag should indicate the reference of the top-left corner for the screenshot?

Flags: needinfo?(matt.woodrow)

I didn't end up adding the flag, since all callers of drawWindow that used the DRAW_VIEW flag also used USE_WIDGET_LAYERS (which isn't applicable here).

The flag would also enable drawing of scrollbars, and clips what you get drawn to just the viewport area. Is that something you want/need, that isn't covered by using drawWindow(DRAW_VIEW|USE_WIDGET_LAYERS)?

If there is a hybrid case where you want the scroll position applied, but not the rest of the USE_WIDGET_LAYERS readback, then you could just manually apply scrollTop to your rect. I'd be interested in what that use-case is though.

My hope is that we can have snapSnapshot be exclusively for drawing documents (where the concept of a viewport and scroll position doesn't exist), and drawWindow(DRAW_VIEW|USE_WIDGET_LAYERS) (to be replaced/renamed readbackWindow) be exclusively for capturing the current viewport as seen by the user (with scroll position and scrollbars included).

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #31)

I didn't end up adding the flag, since all callers of drawWindow that used the DRAW_VIEW flag also used USE_WIDGET_LAYERS (which isn't applicable here).

I'm confused now. So what has been added with https://hg.mozilla.org/mozilla-central/rev/5b8583d3aa37? Isn't it exactly that flag?

The flag would also enable drawing of scrollbars, and clips what you get drawn to just the viewport area. Is that something you want/need, that isn't covered by using drawWindow(DRAW_VIEW|USE_WIDGET_LAYERS)?

Given that this bug is about moving to drawSnapshot (lets make it clear by updating the summary), and to have a Fission compatible API. Marionette itself will only use drawWindow for reftests where it is important to have everything drawn as the real output. Otherwise drawSnapshot() would be used. And for that I don't get the flag working.

Take the mozilla.org page as example. When the page has been scrolled down (maybe half) there are two options for us:

  1. Get a screen capture of the current viewport only (which is the default for WebDriver)
  2. Request a full screen capture (whole rect of documentElement)

Whatever I specify as flag here, the screen capture I get is always the full document which is clipped by the top/left values as specified by the rect input. What would have to be specified to ONLY get the screen capture of the current viewport?

If there is a hybrid case where you want the scroll position applied, but not the rest of the USE_WIDGET_LAYERS readback, then you could just manually apply scrollTop to your rect. I'd be interested in what that use-case is though.

I feel that we talk about different ways taking a screenshot - drawWindow vs. drawSnapshot. Maybe I'm wrong, or still fail to understand the flag as being added in the above mentioned commit.

My hope is that we can have snapSnapshot be exclusively for drawing documents (where the concept of a viewport and scroll position doesn't exist), and drawWindow(DRAW_VIEW|USE_WIDGET_LAYERS) (to be replaced/renamed readbackWindow) be exclusively for capturing the current viewport as seen by the user (with scroll position and scrollbars included).

Hm, so it means I should change Marionette to still use drawWindow(), especially when a capture of the viewport is requested? Would that mean the additional flag as been added above was not needed?

Flags: needinfo?(matt.woodrow)
Summary: Replace drawWindow with new Fission compatible screen capture API → Replace drawWindow with new Fission compatible drawSnapshot API

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #32)

I'm confused now. So what has been added with https://hg.mozilla.org/mozilla-central/rev/5b8583d3aa37? Isn't it exactly that flag?

There are no changes to the .webidl file there, so there's no new flags parameter added to the JS-visible API. There are some internal changes to interpret coordinates relative to the document by default (using internal flags).

Take the mozilla.org page as example. When the page has been scrolled down (maybe half) there are two options for us:

  1. Get a screen capture of the current viewport only (which is the default for WebDriver)
  2. Request a full screen capture (whole rect of documentElement)

Whatever I specify as flag here, the screen capture I get is always the full document which is clipped by the top/left values as specified by the rect input. What would have to be specified to ONLY get the screen capture of the current viewport?

What is the existing code for doing this doing? Is this capture.viewport?

Looks like that code is doing the equivalent of DOMRect(win.pageXOffset, win.pageYOffset, win.innerWidth, win.innerHeight), and that should continue to work with drawSnapshot.

Hm, so it means I should change Marionette to still use drawWindow(), especially when a capture of the viewport is requested? Would that mean the additional flag as been added above was not needed?

Any code currently using drawWindow(USE_WIDGET_LAYERS) should remain unchanged, the rest should be able to switch to drawSnapshot (without flags, and using the same rects as they currently are).

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #33)

There are no changes to the .webidl file there, so there's no new flags parameter added to the JS-visible API. There are some internal changes to interpret coordinates relative to the document by default (using internal flags).

Ok, so that explains why I do NOT see any differences when adding such a flag to the call for drawSnapshot(). :)

Take the mozilla.org page as example. When the page has been scrolled down (maybe half) there are two options for us:

  1. Get a screen capture of the current viewport only (which is the default for WebDriver)
  2. Request a full screen capture (whole rect of documentElement)

Whatever I specify as flag here, the screen capture I get is always the full document which is clipped by the top/left values as specified by the rect input. What would have to be specified to ONLY get the screen capture of the current viewport?

What is the existing code for doing this doing? Is this capture.viewport?

The current capture code is here:
https://searchfox.org/mozilla-central/rev/03853a6e87c4a9405fce1de49e5d03b9e7a7a274/testing/marionette/capture.js#150-164

And indeed when wanting to capture the viewport we take the current scroll position into account:
https://searchfox.org/mozilla-central/source/testing/marionette/capture.js#74

Looks like that code is doing the equivalent of DOMRect(win.pageXOffset, win.pageYOffset, win.innerWidth, win.innerHeight), and that should continue to work with drawSnapshot.

But that's what my patch is also doing. Hm, I will have to take another look.

Any code currently using drawWindow(USE_WIDGET_LAYERS) should remain unchanged, the rest should be able to switch to drawSnapshot (without flags, and using the same rects as they currently are).

Ok, I will keep you posted once I finished my observation from above. Thanks.

To give an update for bug 1574935 (exception thrown when calling toBase64()... This failure happens on Windows 7 32bit only, and is actually an OOM because the huge bitmap has two copies in memory (one for drawSnapshot, and another one for the canvas). By making sure to get rid of the snapshot after drawing it into the canvas the OOM condition is no longer triggered. I will get this added as a workaround until there is a real solution for bug 1574935.

Here the try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7032df66872996fcb04928d0d70b3102fbd8c570

Depends on: 1575511

Ok, I found the remaining case. Given that the DRAW_WINDOW flag is not present - as identified above - every position needs to be absolute. That wasn't the case for taking a screenshot from elements before, where it was enough to just use the getBoundingClientRect() value. Now we also have to take the current scroll position into account. The updated patches should fix those remaining issues, and hopefully should also produce a green try build.

Landing the patch has to wait for bug 1575511.

Depends on: 1559841

The landed patch on bug 1559841 didn't fix the remaining issue with the cross-origin iframe screenshot tests. But that shouldn't stop us from finally landing this patch series.

I just pushed a final try build to ensure that it is still fine before landing the patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=157c7c28423e5d258277194a20cf3d02dff5f2b8

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dc7d1cfdaf58
[marionette] Change test_capture_full_area to capture an opened dialog. r=webdriver-reviewers,automatedtester
https://hg.mozilla.org/integration/autoland/rev/2ce935953dd0
[marionette] Remove unit tests for screenshot flags. r=webdriver-reviewers,ato
https://hg.mozilla.org/integration/autoland/rev/5fee1e8e06c6
[marionette] Take screenshots for content in parent process. r=webdriver-reviewers,ato,automatedtester
https://hg.mozilla.org/integration/autoland/rev/c4ce2527e763
[marionette] Make use of the new Fission compatible screen capture API. r=webdriver-reviewers,ato,automatedtester
https://hg.mozilla.org/integration/autoland/rev/e1415988eb6f
[wdspec] Add same and cross origin iframe tests for "Take (Element) Screenshot". r=webdriver-reviewers,automatedtester,ato
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/18700 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR merged by moz-wptsync-bot

== Change summary for alert #22875 (as of Wed, 28 Aug 2019 11:52:00 GMT) ==

Improvements:

0.43% Base Content JS windows10-64-shippable-qr opt 4,190,749.33 -> 4,172,914.67
0.39% Base Content JS windows10-64-shippable opt 4,189,242.67 -> 4,172,754.67
0.39% Base Content JS windows10-64-shippable-qr opt 4,189,429.33 -> 4,172,914.67

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=22875

I cannot see why this should give us any performance wins given that the changed code is Marionette only, and most likely isn't exercised by the code which runs the performance tests.

Flags: needinfo?(fstrugariu)

:whimboo not sure why this change. But the graph looks quite clear... I will do some retriggers to make sure we did not missed anything

As the results shown this bug has improved the performance of the Base Content JS tests. Not sure how they are connected though...

Flags: needinfo?(fstrugariu)

Which test suite is that exactly? Could you point me to that?

Flags: needinfo?(fstrugariu)
Flags: needinfo?(fstrugariu)

Ok, so that this a test for measuring the base memory consumption of a content process specifically for Fission. Given that AWSY is using Marionette which implies its framescript called listener.js is loaded in each content process, the changes from [1] might have resulted in this memory improvement.

Maybe this is the removal of the dependency to the capture.js module. Eric, would you agree?

[1] https://hg.mozilla.org/integration/autoland/rev/5fee1e8e06c6

Flags: needinfo?(erahm)

That sounds plausible, modifications to marionette have caused changes in the past. As long as it's stable (it looks like it is) this is fine. Reducing the overhead of the harness in general is a good thing as it gets closer to a realistic view of end-user memory usage. Comparing the memory reports it looks like script-sources went down along classes, shapes, strings, etc. Basically we're loading less JS.

Flags: needinfo?(erahm)

Great. So when we continue with Fission work, we will then see a couple of those improvements over time. Thanks Eric!

Depends on: 1595395
Depends on: 1600269
See Also: → 1571419
Regressions: 1606794
You need to log in before you can comment on or make changes to this bug.