Open Bug 1499845 Opened 11 months ago Updated Last month

Fission: Implement frontend API(s) for taking a screenshot of OOP iframes

Categories

(Core :: Graphics, enhancement, P3)

enhancement

Tracking

()

Fission Milestone M4

People

(Reporter: rhunt, Assigned: mattwoodrow, NeedInfo)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [gfx-noted], m4-browsing)

With Fission we will have remote iframes for cross-origin sites. This means that a single content process will not be painting all of the web content inside of a tab.

We'll be able to support this using our compositor infrastructure for normal rendering, as the compositor will receive layers from all frames in a tab and composite them into one image.

This will not work for CanvasRenderingContext2D::DrawWindow, which is used in lots of places for taking a screenshot of web content. There are two different problems that make me believe a new API is needed.

The first is that we shouldn't allow a content process to request a screenshot of cross-origin content. If a content process is compromised this could allow it to access sensitive information. So this new API should be limited to the browser process if possible, or privileged content processes if needed.

The second is that painting a cross process document tree is an async operation. It will be slower than traditional drawWindow, and the browser process cannot block on content processes. So this new API should be async.

I implemented internals to support this with an unused API in bug 1475139.

I've been now looking at all the different use cases an API should support. To my knowledge these are the use cases in mozilla-central.

1. Draw the whole browser window
2. Draw the full tab including non-visible areas
3. Draw the visible rect of a tab
4. Draw a specified rect of a tab
5. Draw a specific DOM node in a tab
6. Draw the contents of a <iframe mozbrowser>

These all have different requirements and we have multiple re-implementations of all these using drawWindow in the code base. It'd be nice for the screenshot API to support all these use cases.

I think something like this could work:

```
// Takes a screenshot of the clientBoundingRect of element found
// by querySelector(selector). If no selector is given, then takes
// a screenshot of the current viewport.
Promise<ImageBitmap> captureSelector(DOMString selector, CaptureElementOptions options)

dictionary CaptureElementOptions {
    // Whether to scroll to the selected element before capturing it
    // Useful as it will layout fixed and sticky position items correctly
    // Will need to restore scroll offsets after the capture
    bool scrollIntoView = true,
    // An offset to be applied to the selected client bounding rect
    float top = 0,
    // An offset to be applied to the selected client bounding rect
    float left = 0,
    // An offset to be applied to the selected client bounding rect
    float width = undefined,
    // A height to override of the selected client bounding rect
    float height = undefined,
    // A resolution to render the content at, defaults to devicePixelRatio
    float resolution = undefined,
}
```

I think that we could provide a WebExtensions API for this similar to captureVisibleTab. [1] Internally, I think we could expose this off of <browser> as well.

The biggest question I have, is where does this API need to be exposed and what processes does it need to be available in?

Is only allowing the API in the parent process viable, or do we need to allow it in certain processes?

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/captureVisibleTab
Mike, Brian, would you have any insights into the requirements for this API? Or know who else to talk to.

Here are some of the uses of drawWindow I was mentioning.

Page thumbnail [1], Devtools screen capture [2], Devtools eye dropper [3].

[1] https://searchfox.org/mozilla-central/rev/eef79962ba73f7759fd74da658f6e5ceae0fc730/toolkit/components/thumbnails/PageThumbUtils.jsm#227
[2] https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/devtools/shared/screenshot/capture.js#94
[3] https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/devtools/server/actors/highlighters/eye-dropper.js#490
Flags: needinfo?(mconley)
Flags: needinfo?(bgrinstead)
Making this async makes perfect sense, and the idea that content processes shouldn't be able to capture pixels from other processes makes perfect sense.

We should get some of the browser screenshots devs to look this proposal over - I've needinfo'd Jared Hirsch.
Flags: needinfo?(mconley) → needinfo?(jhirsch)
(In reply to Ryan Hunt [:rhunt] from comment #1)
> Mike, Brian, would you have any insights into the requirements for this API?
> Or know who else to talk to.
> 
> Here are some of the uses of drawWindow I was mentioning.
> 
> Page thumbnail [1], Devtools screen capture [2], Devtools eye dropper [3].
> 
> [1]
> https://searchfox.org/mozilla-central/rev/
> eef79962ba73f7759fd74da658f6e5ceae0fc730/toolkit/components/thumbnails/
> PageThumbUtils.jsm#227
> [2]
> https://searchfox.org/mozilla-central/rev/
> 3d989d097fa35afe19e814c6d5bc2f2bf10867e1/devtools/shared/screenshot/capture.
> js#94
> [3]
> https://searchfox.org/mozilla-central/rev/
> 3d989d097fa35afe19e814c6d5bc2f2bf10867e1/devtools/server/actors/highlighters/
> eye-dropper.js#490

I think your (1), (2), (3), and (5) in Comment 0 cover the DevTools usecases (using --chrome, no args, --fullpage, and --selector, respectively). Yulia was the last person to work with screenshots in DevTools (during GCLI removal), so she may be able to give more feedback or be able to redirect appropriately.
Flags: needinfo?(bgrinstead) → needinfo?(ystartsev)
Hi Ryan,

We already have a cross process issue with screenshots on devtools, this is the bug for it https://bugzilla.mozilla.org/show_bug.cgi?id=1474006

It would be really great to have a general api so we don't keep reimplementing the same functionality! At the moment, screenshots are represented by a target scoped actor. (https://searchfox.org/mozilla-central/source/devtools/server/actors/screenshot.js, uses the capture code that you mentioned)

Since it is done this way, we have the capability to take a screenshot in every process that we need to. The hard part is merging the image. In order to make it fission ready, what we would need to do is traverse the targets in a given tab, take a screenshot in each one, determine the offset of the screenshot, then combine the images into a final product. 

I am not sure how the browser folks are doing it, so I am looking forward to their ideas as well.
Flags: needinfo?(ystartsev)
(In reply to Yulia Startsev [:yulia] from comment #4)
> Hi Ryan,
> 
> We already have a cross process issue with screenshots on devtools, this is
> the bug for it https://bugzilla.mozilla.org/show_bug.cgi?id=1474006
> 
> It would be really great to have a general api so we don't keep
> reimplementing the same functionality! At the moment, screenshots are
> represented by a target scoped actor.
> (https://searchfox.org/mozilla-central/source/devtools/server/actors/
> screenshot.js, uses the capture code that you mentioned)
> 
> Since it is done this way, we have the capability to take a screenshot in
> every process that we need to. The hard part is merging the image. In order
> to make it fission ready, what we would need to do is traverse the targets
> in a given tab, take a screenshot in each one, determine the offset of the
> screenshot, then combine the images into a final product. 
> 
> I am not sure how the browser folks are doing it, so I am looking forward to
> their ideas as well.

Unfortunately it's not always possible to correctly merge the contents of a tab
in that manner. For example, content in one frame can overlap and underlap the
contents of a sub frame [1]. In addition, with transforms [2] and filters [3] it's
not a simple copy and paste operation.

That's why we've been doing some work to provide a platform API which will take
care of all of this.

We'd also like to not expose this API inside processes containing web content. Does
the devtools panel run in the browser process or a content process?

I'm not super familiar with devtools internals so I'm not sure where it'd be convenient
for this API to be exposed.

[1] https://eqrion.github.io/web-tests/embed/same-origin-overlap.html
[2] https://eqrion.github.io/web-tests/embed/same-origin-3d-transform.html
[3] https://eqrion.github.io/web-tests/embed/same-origin-filter.html
Sorry for the long delay. The Screenshots use cases are:

2. Draw the full tab including non-visible areas
3. Draw the visible rect of a tab
4. Draw a specified rect of a tab

Screenshots is implemented as a webextension, so you'll need to expose the API to the webextension process (Screenshots is a specially-privileged webextension with the 'mozillaAddons' permission; not sure if this affects anything at the process level).

Screenshots captures page content via a content script iframe that calls chrome.drawWindow, if that API is available. If not, it falls back to calling captureVisibleTab from the background page. We could just call the new API from the background page instead, I reckon.

An async API works fine for us.
Flags: needinfo?(jhirsch)

Hi Ryan,

I just saw that you worked on this concurrently to this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1475139
To answer your question - the devtools panel runs in the parent process.

This api that you worked on looks really promising: would we be able to use it?

Flags: needinfo?(rhunt)

Hi Yulia, I haven't gotten any time to work on it since this proposal. It should be fairly simple to implement though.

I'll try and take a look at it in the next week.

Flags: needinfo?(rhunt)
Duplicate of this bug: 1523668
Fission Milestone: --- → ?
Summary: Implement frontend API(s) for taking a screenshot of web content with fission → Fission: Implement frontend API(s) for taking a screenshot of OOP iframes
Blocks: rendering-fission
No longer blocks: graphics-fission
Fission Milestone: ? → M3
Fission Milestone: M3 → M4

Ryan and I chatted about this today, and we're concerned that it will be hard for devtools to come up with a unique selector for an arbitrary DOM element that you want to snapshot.

We discussed the possibility of chaining nth-child selectors to identify an element w.r.t the nearest uniquely identifiable ancestor, but it seems like that could change in undesired ways between when it's computed in devtools code, to when it's asynchronously processed after a roundtrip through the parent process.

Are there other ways to identify an element that devtools could pass to the parent process, and then the snapshot API can pass down to the child?

We discussed another alternative, where there would be a captureSnapshot API in the content process that returns an opaque identifier, and then a separate API in the parent which retrieves the ImageBitmap result for that identifier asynchronously. As mentioned earlier, we can't return the ImageBitmap to the content process, since it may include pixels from a different origin.

Yulia, do any of those sound workable from a devtools perspective?

Flags: needinfo?(ystartsev)

Are there other ways to identify an element that devtools could pass to the parent process, and then the snapshot API can pass down to the child?

I think I am out of my depth to answer that. Patrick, I think you have more knowledge about how the inspector handles selections and what we could do. Could you chime in?

Flags: needinfo?(ystartsev) → needinfo?(pbrosset)

Right now, DevTools captures screenshots of DOM nodes in the content process where those DOM nodes live using drawWindow.

When you right-click on a node in the inspector and select the "screenshot node" option, then that sends a request to the DevTools server running in the content process (which has access to chrome privileges) with some DevTools unique ID for this node.
Upon receiving that request, the DevTools server part locates the DOM node, grabs the screenshot with drawWindow and then sends back a data-uri of it to the DevTools front-end so it can take it from here and save the file for the user.

With Fission, DevTools will have a server running in the content page process, as well as one server per remote iframe.
So, if DevTools sends the request to the right server, in the right process (the one where the DOM node lives), then I assume it can still continue using drawWindow. Except the content of nested remote iframes would appear blank.

So, if we could replace drawWindow with the proposed captureSelector API and still pass it a DOM node reference, then I guess it would work fine for DevTools.

We could however change this and always call captureSelector from the browser parent process instead, but in that case we'd have to pass a unique selector that identifies the element.
We do this in a number of places already in DevTools. So yes, we could chain these selectors as required to traverse all iframes (i.e. an array of selectors where each item targets a DOM node in a given document).

But you are right in saying that the time between calculating this selector and taking the screenshot will be enough for the DOM to mutate.
This is a problem we already are facing when doing, e.g., right-click on a page and selecting "inspect element".

DevTools uses unique "actor IDs" for each DOM nodes that are displayed in the inspector.
Given this ID and code running in the process where the corresponding DOM node exists, it's possible to retrieve the reference to that node directly.

Flags: needinfo?(pbrosset)
Depends on: 1556555

Hi Andreas,

It looks like GeckoDriver/marionette also currently uses drawWindow (from the content process) so will also suffer from the issue of not being able to capture content within OOP iframes.

Could you please have a look at the proposals above and see if they seem feasible to support within marionette?

Thanks!

Flags: needinfo?(ato)

To clarify, there are two possible APIs that we're considering at this point.

  • The parent process-only captureSelector detailed in comment 0.

This requires a unique selector for a content process Element within the parent process (though there is prior art for doing this in toolkit code), and is potentially racy (though devtools has this same issue for inspect element, and it appears to be workable in practice).

  • A two part API that captures in content, but returns the rasterized snapshot to the parent process.

In the content process (including OOP <iframe> processes):

SnapshotIdentifier captureSnapshot(Rect)

In the parent process

Promise<ImageBitmap> retrieveSnapshot(SnapshotIdentifier)

The calling code would be responsible for getting the SnapshotIdentifier (an opaque integer ident) to the parent process. This avoids most of the races, since you'd be able to get the bounding rect of an element and have it drawn synchronously. There is still a potential race if the requested area includes OOP content (which is why we can't return the rasterized result), which may still change during the snapshot process.

For WebDriver, even the API for taking an element screenshot specifically says it is to "draw a bounding box from the framebuffer". That links into the Screen capture section that makes clear that there is only one frame buffer, the "initial viewport’s framebuffer" (i.e. the framebuffer of the top-level window).

Looking at Google's Puppet API - which is generally more flexible I believe - taking a screenshot is only available for Page and Element. "Page" is the top level webpage, which is separate from Frame (which does not provide access to a child "Page", or screenshot API of its own). The Element.screenshot API says that it "uses page.screenshot to take a screenshot" (after scrolling the element into view).

So a parent process based API would seem to be adequate both for WebDriver and for Puppeteer.

Flags: needinfo?(matt.woodrow)

To be clear, Puppeteer is a "competitor" to WebDriver, not an implementation of it. Puppeteer I believe is basically aimed at exposing Chrome's devtools API to give a lot more flexibility that WebDriver.

Mike mentioned this in the fission meeting today, but there's a JS module for passing cross-process friendly references to DOM elements [1] that we could use here.

[1] https://searchfox.org/mozilla-central/rev/227f5329f75bd8b16c6b146a7414598a420260cb/toolkit/modules/ContentDOMReference.jsm

Flags: needinfo?(matt.woodrow)

Sorry for the delay to my reply, but I’ve been on holiday.

jwatt summarises the needs of WebDriver well, but it bears mentioning
that WebDriver’s presumed “inflexibility” here is dictated by what
can be achieved cross-browser. For Gecko this means a simulated
limitation where we don’t include the rendered content that is
out-of-view for “full document screenshots”. Other browsers have
different limitations, but it is possible that the disappearance
of Edge from the browser landscape means these restrictions can be
lifted in the future.

The relevance of Puppeteer in this context is that we have an ongoing
project to support it for Firefox (tracked in puppeteer and
puppeteer-mvp), so we should make certain a new platform screen
capture API also can support its requirements.

To my understanding, a parent process API like the one outlined in
comment #0 is sufficient for Marionette (and the CDP remote
agent/Puppeteer) as well, as long as it composes in the painted
content from any nested or child browsing contexts with it, i.e.
produces a final screenshot that includes the content of any OOP
<iframe>s.

Flags: needinfo?(ato)
Assignee: rhunt → matt.woodrow
Whiteboard: [gfx-noted] → [gfx-noted], m4-browsing
Blocks: 1568831
No longer blocks: marionette-fission
Blocks: 1571419

Matt, what's left for this bug given that the new drawSnapshot Gecko API exists now?

Flags: needinfo?(matt.woodrow)
You need to log in before you can comment on or make changes to this bug.