Closed Bug 1512337 Opened 6 years ago Closed 5 years ago

Support Chrome Puppeteer API

Categories

(DevTools :: General, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: aslushnikov, Assigned: aslushnikov)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36




Expected results:

Let's make https://github.com/GoogleChrome/puppeteer/ work with Firefox!
This patch:
- introduces a new event "onFrameLocationChange" that is needed
  to track same-document navigations inside iframes.
- adds a new "//testing/juggler" folder that implements a custom
  automation protocol to back Puppeteer API.

With this patch, running Firefox with "-juggler 0" flag will expose
a Juggler debugging server.
Component: Untriaged → General
Product: Firefox → DevTools
Assignee: nobody → aslushnikov
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P2
Comment on attachment 9029755 [details]
Introduce "juggler" remote debugging protocol

Flagging Alex and Brian for a first round of feedback.
Attachment #9029755 - Flags: feedback?(poirot.alex)
Attachment #9029755 - Flags: feedback?(bgrinstead)
Thanks, I've left a couple of initial comments in Phabricator.

Andrey, would you be willing to take a patch into the main Puppeteer library that enables switching the browser that gets used? Possibly something like `puppeteer.launch({ browser: 'firefox' })` or similar.

I think the steps that would be needed for this on the Firefox side on top of the current implementation of Juggler would be:

1) Use CDP for transport 
2) Implement the current CDP methods that Puppeteer uses for chrome (as opposed to the sometimes-modified ones that Juggler currently uses). Or alternatively, adapt Chrome to use the modified methods.

We've started to look into doing this a bit, and it seems viable (even while sharing much of the rest of the Juggler implementation).
Flags: needinfo?(aslushnikov)
Hi Brian, thanks for looking through this!

> Andrey, would you be willing to take a patch into the main Puppeteer library that enables switching the browser that gets used? Possibly something like `puppeteer.launch({ browser: 'firefox' })` or similar.

How do you see this working? Puppeteer npm package only downloads Chromium and doesn't download Firefox. Requiring "puppeteer" and specifying a "browser" in the launch options won't work.

We currently ship a separate NPM package - `puppeteer-firefox`. This package downloads Firefox and provides an API to drive it. 
Puppeteer users can pick a driver by just requiring the right package, or even require both and run the same tests against two different browsers - that's what we currently do in our own tests: https://github.com/GoogleChrome/puppeteer/blob/0cccc586cc3e7140fb9c51d859943cfb39d83b02/experimental/puppeteer-firefox/test/test.js#L63-L72

And as a bonus, TypeScript types work out-of-the-box :)

> 1) Use CDP for transport 

Not using CDP is a conscious decision. CDP is a poor place to align upon: CDP exposes a lot of chrome-specific aspects. As Chromium evolves over time, CDP changes with it. Puppeteer became *the* place where we deal with CDP complexities to provide a neutral, semver-compliant automation API. From engineering point of view, it's just way cheaper for us to do this in Puppeteer than in Chromium.

Many Chrome-specific things that leak big times through CDP (e.g. out-of-process-iframes and cross-process navigations) are foreign concepts to Gecko. It's an endless uphill battle to try to mock how Chrome works, and benefits are unclear to me. I do see a lot of shortcomings though.

IMO it's crucial for the success of puppeteer-firefox to use a protocol that's natural to Firefox. This not only keeps things simple on the implementation side, but gives all the leverage to address user needs.
(In reply to aslushnikov from comment #4)
> > Andrey, would you be willing to take a patch into the main Puppeteer library that enables switching the browser that gets used? Possibly something like `puppeteer.launch({ browser: 'firefox' })` or similar.
> 
> How do you see this working? Puppeteer npm package only downloads Chromium
> and doesn't download Firefox. Requiring "puppeteer" and specifying a
> "browser" in the launch options won't work.
> 
> We currently ship a separate NPM package - `puppeteer-firefox`. This package
> downloads Firefox and provides an API to drive it. 
> Puppeteer users can pick a driver by just requiring the right package, or
> even require both and run the same tests against two different browsers -
> that's what we currently do in our own tests:
> https://github.com/GoogleChrome/puppeteer/blob/
> 0cccc586cc3e7140fb9c51d859943cfb39d83b02/experimental/puppeteer-firefox/test/
> test.js#L63-L72

I see the concern with shipping both binaries in the main package. I'm also now reading about puppeteer-core at https://pptr.dev/#?product=Puppeteer&version=v1.11.0&show=api-puppeteer-vs-puppeteer-core, and assuming that puppeteer-firefox is also using this shared core, then adding new package(s) for individual browsers does make sense.

My main concern when asking this is around making sure that the API is complete and well-tested in Firefox so that puppeteer-firefox doesn't fall out of date as things on either the Pupeteer side or the Firefox side change over time. The fact that the tests are running against Firefox in the main repo is helpful. A few follow up questions:

1) Does that run all of the Puppeteer tests or just a subset?
2) Have you had a chance to run this against any third party application Puppeteer test suites to see if they are passing? I'm sure our testing team will want to look into this, so just curious if you've already looked into that or have pointers to projects using a lot of these tests.
3) How do you ensure that chromium commits don't break Puppeteer? Are the Puppeteer tests vendored-in to chromium?

> > 1) Use CDP for transport 
> 
> Not using CDP is a conscious decision. CDP is a poor place to align upon:
> CDP exposes a lot of chrome-specific aspects. As Chromium evolves over time,
> CDP changes with it. Puppeteer became *the* place where we deal with CDP
> complexities to provide a neutral, semver-compliant automation API. From
> engineering point of view, it's just way cheaper for us to do this in
> Puppeteer than in Chromium.
> 
> Many Chrome-specific things that leak big times through CDP (e.g.
> out-of-process-iframes and cross-process navigations) are foreign concepts
> to Gecko. It's an endless uphill battle to try to mock how Chrome works, and
> benefits are unclear to me. I do see a lot of shortcomings though.
> 
> IMO it's crucial for the success of puppeteer-firefox to use a protocol
> that's natural to Firefox. This not only keeps things simple on the
> implementation side, but gives all the leverage to address user needs.

I haven't yet had a chance to dig into each method in Juggler and compare to the corresponding CDP method, but I believe there are some changes that look superficial enough that if we stuck with CDP naming it could perhaps let puppeteer share more with firefox-puppeteer and cause less divergence between the browers? For example: https://phabricator.services.mozilla.com/D13889#inline-74729.

I think it's important for us to decide on our side if we want to use an RDP-like thing for transport or a CDP-like thing. If we want to use something like RDP, I believe there's some code reuse (from either DevTools or Marionette) that can be done inside of Juggler to avoid using a copy of it.

I think Alex and Andreas have both looked into this in more detail, so I'll ask them to add any more details that I'm missing or getting wrong. Relatedly: I'll be mostly offline this coming week, so it'd be great to get their input.
Comment on attachment 9029755 [details]
Introduce "juggler" remote debugging protocol

(In reply to Brian Grinstead [:bgrins] from comment #5)
> I think Alex and Andreas have both looked into this in more detail, so I'll
> ask them to add any more details that I'm missing or getting wrong.
> Relatedly: I'll be mostly offline this coming week, so it'd be great to get
> their input.
Attachment #9029755 - Flags: feedback?(ato)
> 1) Does that run all of the Puppeteer tests or just a subset?

ATM puppeteer-firefox passes 285 tests out of 519 puppeteer tests. The goal is to have all of them once puppeteer-firefox supports all the API surface.

> 2) Have you had a chance to run this against any third party application Puppeteer test suites to see if they are passing?

Not really. In many cases, puppeteer test suites rely on request interception that is currently not supported in puppeteer-firefox. However, a few of my personal Puppeteer automation scripts run just fine with Puppeteer-Firefox.

> 3) How do you ensure that chromium commits don't break Puppeteer? Are the Puppeteer tests vendored-in to chromium?

Oftentimes Chromium commits do break Puppeteer for legitimate reasons - we update Puppeteer code once we roll a new chromium revision into it. For example: https://github.com/GoogleChrome/puppeteer/pull/2989 

Occasional regressions do happen as well since there are no Puppeteer tests in Chromium repository. This, however, doesn't bother us much - both CDP and headless have a pretty good test coverage. I can recall only a few regressions in Chromium during the 1.5 years of Puppeteer existence.


> but I believe there are some changes that look superficial enough that if we stuck with CDP naming it could perhaps let puppeteer share more with firefox-puppeteer and cause less divergence between the browers?

I don't think code reuse is necessary and that much of a goal. From code complexity standpoint, the whole Puppeteer is less than 8KLOC - and there's *a lot* of sugar APIs in there. I expect puppeteer-firefox to be even less since there's a possibility of a more high-level protocol on the Firefox side. IMO both Puppeteer and Puppeteer-Firefox are quite small and low-effort projects.

When it comes to API compatibility, we have to rely on compatibility test suite rather then on code reuse. This is the main reason we want puppeteer-firefox to be part of Puppeteer repo - this way we can effectively reuse all our testing infrastructure for both Puppeteer and Puppeteer-Firefox.
Flags: needinfo?(aslushnikov)
Comment on attachment 9029755 [details]
Introduce "juggler" remote debugging protocol

I’ve studied the Juggler code in detail and I think it holds great
potential!

The overarching concern I have is similar to bgrins’.  This looks
(as you would expect!) a lot like an implementation of CDP and it
is unclear to me if the minor deviations from that are necessary
to serve the goal of supporting Puppeteer.  Reusing the CDP APIs
directly rather than making minor changes, would have a benefit for
long-term maintainability.

The way Juggler uses RDP is incompatible with how the devtools actor
hierarchy works, which I assume is why it spins up its own TCP port.
It is simultaneously incompatible with the Marionette protocol that
we use for WebDriver and various automation such as mochitests,
reftests, and WPT.  The only thing these three protocols have in
common is that they share the same transport layer (JSON over TCP)
and the packet sizing and framing approach provided by
DebuggerTransport.jsm.

Adding another remote protocol to support a single client application
seems like a bad engineering choice, as it is unlikely to gain any
wide-spread adoption or give us compatibility with other CDP-based
tooling, such as VSCode.  For similar reasons I am also concerned
that having a specific client application for Firefox increases
maintenance costs (any API change will need to be made in two places)
and is therefore likely to lead to deviations in functionality.

Drawing from personal experience as one of the primary authors of
the WebDriver standard, this approach seems analogous to technical
choices made early in the history of Selenium that it took years
to recover from.  Rather than browser-specific backends, we today
have a standard that provides a contract and a transparent protocol
more in the spirit of the open web.

Another technical observation is that Juggler stores a lot of state
in content frame scripts.  This is risky since content frame scripts
get reloaded when content browsers change processes.  As remoteness
changes are bound to get more frequent in the short-to-medium term
with Fission (https://wiki.mozilla.org/Project_Fission), we should
make every precaution that any new code that we land doesn’t add
to the laundry list of existing code we need to fix.

Please note that I will be away for the rest of the year, but before
I take a much-deserved holiday, I want to stress that I’m overall
impressed with the work done here. (-:
Attachment #9029755 - Flags: feedback?(ato)
> The way Juggler uses RDP is incompatible with how the devtools actor hierarchy works, which I assume is why it spins up its own TCP port.

That's the main concern I have with the current patch.
It isn't quite CDP, nor is it RDP. So that it makes the whole testing/juggler code really specific to puppeteer and only puppeteer.
It sounds unlikely that any of this is going to be reused by any other project.

I do understand that integrating with existing chrome Domains (Page, Target, Browser, ...) is a much larger work and as you said, could force us to expose chrome specifics. It would be easier to expose brand new APIs/Domains, specific to Puppeteer. That's exactly what you did here, but with a custom protocol and with some confusion by reusing existing CDP domain names and methods.

> IMO it's crucial for the success of puppeteer-firefox to use a protocol that's natural to Firefox. This not only keeps things simple on the implementation side, but gives all the leverage to address user needs.

As of today, this protocol is RDP. But I'm not sure that's the key here.

> I expect puppeteer-firefox to be even less since there's a possibility of a more high-level protocol on the Firefox side.

The main benefit you got from the current patch is about writing code in Firefox that matches exactly puppeteers APIs. That's what you call "high-level protocol". That's exactly what makes puppeteer-firefox simple, simplier than the chrome equivalent. It isn't so much about using RDP versus CDP. If you start using RDP existing actors, you are most likely going to leak firefox specifics into puppeteer-firefox and complexify it.

> Many Chrome-specific things that leak big times through CDP (e.g. out-of-process-iframes...

A quick note about out of process iframes. We are actively working on that. This is the "Fission" project already mentioned in the previous comment. This is going to require some changes to FrameTree component as the way to iterate over DocShells and connecting to them is going to change. FrameScripts are going to be replaced by some other API.


To me, the best option from Firefox engineering standpoint would be to have puppeteer-firefox to reuse existing RDP actors and contribute to them any missing feature. We would not have to maintain: another protocol, new code to iterate over tabs and frames, as well as code to inspect them by piercing process boundaries. It would be an equivalent of what is being done for chrome in puppeteer, but using RDP actors instead of CDP domains.

Now, that's a suggestion I made by ignoring the intent of Mozilla [1] to start exposing a CDP server.
It may be a good time to actually start having a base CDP server and have puppeteer be the first usecase for it.
Andreas recently demonstrated that we can reasonably have such base CDP server in-tree.
May be we should land such base server and rebase your patch on top of it?

[1] https://groups.google.com/forum/#!msg/mozilla.dev.platform/4-4A8W-nP5g/Y9C9UkWTAAAJ
Attachment #9029755 - Flags: feedback?(poirot.alex)
Attachment #9029755 - Flags: feedback?(bgrinstead)

I’m tying up a few loose ends today by making sure Bugzilla reflects
reality.

As a result of the hard work by aslushnikov et al. on this patch and
in fact since discussions going as far back as to TPAC two years ago,
we have had multiple Mozilla-internal discussions debating how best to
tackle the challenge of gaining Puppeteer support in Firefox. It
should be abundantly clear to all parties involved that Puppeteer is
increasingly making its way into web authors’ toolboxes as a preferred
browser automation tool, and that its bidirectional and asynchronous
API is appealing to many.

Not supporting Puppeteer also poses significant web interoperability
risk to Mozilla. As websites migrate automation to Puppeteer, the
lack of cross-browser support in many cases leaves Firefox untested,
forcing Firefox to play a catch-up game addressing web compatibility
issues.

Simultenously Mozilla’s ambitions stretch beyond Puppeteer’s feature
set. Puppeteer is backed by the Chrome DevTools Protocol (CDP), which
also drives an armada of other tools. The simlarities between the
Juggler protocol presented in this patch and CDP are uncanny, and we
believe that it is in the best interest of the open web for us to
instead implement the subsets of CDP that allows us to support the
Puppeteer API surface.

To this effect, we have filed
https://bugzilla.mozilla.org/show_bug.cgi?id=puppeteer as a tracking
bug for gaining Puppeteer support in Firefox. The scope of this work
is limited to the subset of CDP that is necessary to support
Puppeteer’s API.

Due to the astonishing progress made on
https://github.com/aslushnikov/gecko-dev/tree/master/testing/juggler,
we would like to uplift and port large parts of the existing Juggler
implementation into our CDP implementation in Firefox.

I see great potential for collaboration in this area.

Have a great weekend, and I’ll see you all in the meeting on Tuesday!

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: