headless screenshot is too quick: the page hasn't finished loading

Status: UNCONFIRMED
Product: Firefox
Component: Headless
Importance: -- major
Version: 57 Branch
Platform: x86_64 Linux
Reporter: Richard Neill
Assignee: Unassigned
Reported 29 days ago, modified 8 days ago
Firefox Tracking Flags: Not tracked

Description (Reporter, 29 days ago)

The new firefox --headless screenshot is wonderfully snappy. In fact, it's so fast that it captures the page before it is ready.

An example of where this fails is:
  firefox --screenshot map.png "https://www.openstreetmap.org/#map=19/52.20505/0.10824"

In my own codebase I have the same problem: my page draws its skeleton and then, in response to $(document).ready, makes an AJAX request back to the server to populate the page with data (this usually takes < 0.1 s extra).

Firefox 57's --screenshot jumps the gun, however, and records a snapshot of the page before it has made the AJAX request, so the result is incorrect.

One solution would be similar to wkhtmltopdf's "--javascript-delay 200", while another would be the way phantomjs/slimerjs do it, such as:

page.open(address, function (status) {
 ...
 window.setTimeout(function () {
   page.render(output);
   phantom.exit();
 }, 200);
});

For now, I have worked around it (in a very ugly way) by appending this to the end of my document:
 <iframe src='sleep.php?ms=1200' width=0 height=0></iframe>
where the server takes 1200ms to respond to the iframe request, allowing the document to finish drawing while delaying the screenshot.
Experimentally, the AJAX request takes ~ 100ms, but if I reduce this sleep below 1000ms, the workaround fails.

So, my suggestion would be that Firefox adds an optional parameter for how much extra time to wait, with a 200ms default. Alternatively, it could wait until all AJAX requests complete (with a 10 second limit, to handle the case of a site like Facebook, which does endless loading and would never terminate).
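
The two policies suggested above can be compared with a small sketch. This is only an illustration, not anything Firefox implements: the function name `chooseCaptureTime`, its parameters, and the `mode` values are all hypothetical, and times are assumed to be milliseconds since the start of navigation.

```javascript
// Hypothetical helper: none of these names exist in Firefox.
// All times are milliseconds measured from the start of navigation.
function chooseCaptureTime(loadTimeMs, requestEndTimesMs, options = {}) {
  const { mode = "delay", extraDelayMs = 200, maxWaitMs = 10000 } = options;

  if (mode === "delay") {
    // Policy 1: wait a fixed extra delay after the load event.
    return loadTimeMs + extraDelayMs;
  }

  // Policy 2: wait for the last AJAX request to finish, but never longer
  // than maxWaitMs after load, so endlessly loading pages still terminate.
  const lastRequestEnd = Math.max(loadTimeMs, ...requestEndTimesMs);
  return Math.min(lastRequestEnd, loadTimeMs + maxWaitMs);
}
```

A request that never completes can be modelled as `Infinity` in `requestEndTimesMs`; in that case the second policy falls back to the 10 second cap.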

I'm marking this as major because it is broken for a lot of sites, not just my own: it also fails on OpenStreetMap and Google Maps.
(In reply to Richard Neill from comment #0)
> The new firefox --headless screenshot is wonderfully snappy. In fact, it's
> so fast that it captures the page before it is ready.

Indeed, this seems likely to be a common problem, given how many websites do additional work that affects their appearance after their DOMContentLoaded event is dispatched.


> One solution would be similar to wkhtmltopdf's "--javascript-delay 200",
> while another would be the way phantomjs/slimerjs do it, such as:
> 
> page.open(address, function (status) {
>  ...
>  window.setTimeout(function () {
>    page.render(output);
>    phantom.exit();
>  }, 200);
> });

For the --screenshot feature, which is intended to be a simple way to accomplish a specific goal (like wkhtmltopdf) as opposed to an API for accomplishing general goals (like PhantomJS/SlimerJS), a solution like wkhtmltopdf's is preferable.  Ideally, however, we would be able to detect page completion without requiring the user to specify a heuristic like a time delay.


> So, my suggestion would be that firefox adds an optional parameter of how
> long extra to wait, with a 200ms default. Or alternatively, it could wait
> till all AJAX requests complete, (with a 10 second limit, to handle the case
> of a site like Facebook, which does endless loading, and would never
> terminate).

Harald, you've been involved in metrics for Firefox Quantum, including measurements of page load performance. Is there a way to tell that all AJAX requests have completed; or, perhaps even better, that the page's appearance has "settled" (isn't being updated)?  If so, that might be an ideal time to take the screenshot.
Flags: needinfo?(hkirschner)
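
One way to make the "settled" idea concrete is to sample the page's appearance at regular intervals and wait for several identical samples in a row. The sketch below assumes such samples are available as fingerprints (for example, checksums of the rendered pixels); the function name `findSettledFrame` and the `stableFrames` parameter are hypothetical, and nothing here reflects how Firefox actually works.

```javascript
// Hypothetical sketch: frameHashes[i] is a fingerprint (for example a
// checksum of the pixels) of the page sampled at regular intervals.
// Returns the index of the first frame of a run of `stableFrames`
// identical fingerprints, or -1 if no such run occurs in the samples.
function findSettledFrame(frameHashes, stableFrames = 3) {
  let runStart = 0;
  for (let i = 0; i < frameHashes.length; i++) {
    if (frameHashes[i] !== frameHashes[runStart]) {
      runStart = i; // appearance changed, so start a new run here
    }
    if (i - runStart + 1 >= stableFrames) {
      return runStart;
    }
  }
  return -1;
}
```

A page that keeps animating would never produce a stable run, so a real implementation would still need an overall time limit.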
WebPageTest and Lighthouse both combine timeouts after onload with network idle detection (max 2 in-flight requests). Lighthouse adds CPU idle time.

Lighthouse's implementation can be found at https://github.com/GoogleChrome/lighthouse/blob/master/lighthouse-core/gather/driver.js#L519

Marionette might also have prior art for this in its wait implementation, which might be reusable for our case.
Flags: needinfo?(hkirschner)
Puppeteer's page.goto method <https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options> (and related methods) has a *waitUntil* option that can be set to either `load` (the default), `domcontentloaded`, `networkidle0` ("consider navigation to be finished when there are no more than 0 network connections for at least 500 ms"), or `networkidle2` ("consider navigation to be finished when there are no more than 2 network connections for at least 500 ms").

Getting Started with Headless Chrome <https://developers.google.com/web/updates/2017/04/headless-chrome#puppeteer> suggests there's also a `networkidle` value, although that documentation may be out-of-date.
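
The `networkidle0` and `networkidle2` definitions quoted above amount to finding the first 500 ms window in which the number of in-flight requests stays at or below a threshold. The following is a rough sketch of that calculation over a recorded list of request events. It is not Puppeteer's implementation: the function name `findNetworkIdleTime`, the event shape, and the 10 second `timeoutMs` default (borrowed from the limit suggested in the description) are assumptions made for illustration.

```javascript
// Sketch only: this is not Puppeteer's implementation.
// `events` is a list of { timeMs, type } with type "start" or "end",
// sorted by time. Returns the time at which the network has had at most
// `maxInflight` requests in flight for `quietMs`, or null if that does
// not happen by `timeoutMs`.
function findNetworkIdleTime(events, maxInflight = 0, quietMs = 500, timeoutMs = 10000) {
  let inflight = 0;
  let quietSince = 0; // start of the current quiet period, or null if busy

  for (const { timeMs, type } of events) {
    // Stop if the quiet period had already lasted long enough
    // by the time this event arrived.
    if (quietSince !== null && timeMs - quietSince >= quietMs) {
      break;
    }
    inflight += type === "start" ? 1 : -1;
    if (inflight > maxInflight) {
      quietSince = null; // too many requests in flight
    } else if (quietSince === null) {
      quietSince = timeMs; // just dropped back under the threshold
    }
  }

  if (quietSince === null) return null;
  const idleAt = quietSince + quietMs;
  return idleAt <= timeoutMs ? idleAt : null;
}
```

Passing `maxInflight = 2` models the `networkidle2` wording and `maxInflight = 0` models `networkidle0`; a request that never ends yields `null`, which corresponds to giving up at the timeout.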