Open Bug 1412061 Opened 7 years ago Updated 1 year ago

headless screenshot is too quick: the page hasn't finished loading

Categories: Firefox :: Headless (defect, P3)
Version: 57 Branch
Reporter: rn214 (unassigned)

The new firefox --headless screenshot is wonderfully snappy. In fact, it's so fast that it captures the page before it is ready.

An example of where this fails is:
  firefox --screenshot map.png "https://www.openstreetmap.org/#map=19/52.20505/0.10824"

In my own codebase, I have the same problem: the structure of my page is such that it draws the skeleton first, and then, in response to $(document).ready, it makes an AJAX request back to the server to populate the page with data (this usually takes < 0.1 s extra).

Firefox 57's --screenshot jumps the gun, however, and records a snapshot of the page before it has made the AJAX request; the result is therefore not correct.
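
To illustrate the pattern, here is a minimal sketch (the endpoint and render helper are placeholders, not my real code):

  // Skeleton is in the served HTML; the data arrives via AJAX after DOM ready.
  $(document).ready(function () {
    $.get('/api/page-data', function (data) {  // placeholder endpoint; normally responds in < 0.1 s
      renderPage(data);                        // placeholder helper that fills in the skeleton
    });
  });

The screenshot fires before the $.get callback runs, so only the empty skeleton is captured.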

One solution would be similar to wkhtmltopdf's "--javascript-delay 200", while another would be the way phantomjs/slimerjs do it, such as:

page.open(address, function (status) {
  ...
  window.setTimeout(function () {
    page.render(output);
    phantom.exit();
  }, 200);
});

For myself, I've currently worked around it (in a very ugly way) by appending this to the end of my document:
 <iframe src='sleep.php?ms=1200' width=0 height=0></iframe>
where the server takes 1200 ms to respond to this iframe, thereby allowing the document to finish drawing while delaying the screenshot.
Experimentally, the AJAX request takes ~100 ms, but if I reduce this sleep below 1000 ms, the workaround fails.

So, my suggestion would be that Firefox add an optional parameter for how much extra time to wait, with a 200 ms default. Alternatively, it could wait until all AJAX requests complete (with a 10-second limit, to handle the case of a site like Facebook, which loads endlessly and would never terminate).

I'm marking this as major, because it is broken for a lot of sites, not just my own. It also fails on OpenStreetMap and Google Maps.
(In reply to Richard Neill from comment #0)
> The new firefox --headless screenshot is wonderfully snappy. In fact, it's
> so fast that it captures the page before it is ready.

Indeed, this seems likely to be a common problem, given how many websites do additional work that affects their appearance after their DOMContentLoaded event is dispatched.


> One solution would be similar to wkhtmltopdf's "--javascript-delay 200",
> while another would be the way phantomjs/slimerjs do it, such as:
> 
> page.open(address, function (status) {
>   ...
>   window.setTimeout(function () {
>     page.render(output);
>     phantom.exit();
>   }, 200);
> });

For the --screenshot feature, which is intended to be a simple way to accomplish a specific goal (like wkhtmltopdf) as opposed to an API for accomplishing general goals (like PhantomJS/SlimerJS), a solution like wkhtmltopdf's is preferable.  Ideally, however, we would be able to detect page completion without requiring the user to specify a heuristic like a time delay.


> So, my suggestion would be that Firefox add an optional parameter for how
> much extra time to wait, with a 200 ms default. Alternatively, it could wait
> until all AJAX requests complete (with a 10-second limit, to handle the case
> of a site like Facebook, which loads endlessly and would never terminate).

Harald, you've been involved in metrics for Firefox Quantum, including measurements of page load performance. Is there a way to tell that all AJAX requests have completed; or, perhaps even better, that the page's appearance has "settled" (isn't being updated)?  If so, that might be an ideal time to take the screenshot.
Flags: needinfo?(hkirschner)
WebPageTest and Lighthouse both combine a timeout after onload with network-idle detection (at most 2 in-flight requests). Lighthouse additionally waits for CPU idle.
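
To make that heuristic concrete, here is a rough in-page sketch (illustrative only: it wraps window.fetch and ignores XHR; the thresholds mirror the "at most 2 in-flight requests for 500 ms" convention):

  var MAX_INFLIGHT = 2;  // "idle" tolerates up to 2 in-flight requests...
  var IDLE_MS = 500;     // ...sustained for 500 ms
  var inflight = 0;

  function whenNetworkIdle(callback) {
    var timer = null;
    function arm() {
      clearTimeout(timer);
      if (inflight <= MAX_INFLIGHT) {
        timer = setTimeout(callback, IDLE_MS);
      }
    }
    var origFetch = window.fetch;
    window.fetch = function () {
      inflight++;
      arm();  // a new request resets the idle clock
      return origFetch.apply(this, arguments).finally(function () {
        inflight--;
        arm();
      });
    };
    window.addEventListener('load', arm);  // start counting after onload
  }

  whenNetworkIdle(function () {
    console.log('page looks settled; a screenshot now should be complete');
  });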

Lighthouse's implementation can be found at https://github.com/GoogleChrome/lighthouse/blob/master/lighthouse-core/gather/driver.js#L519

Marionette might also have prior art for this with its wait implementation, which might be reusable for our case.
Flags: needinfo?(hkirschner)
Puppeteer's page.goto method <https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options> (and related methods) has a *waitUntil* option that can be set to either `load` (the default), `domcontentloaded`, `networkidle0` ("consider navigation to be finished when there are no more than 0 network connections for at least 500 ms"), or `networkidle2` ("consider navigation to be finished when there are no more than 2 network connections for at least 500 ms").

Getting Started with Headless Chrome <https://developers.google.com/web/updates/2017/04/headless-chrome#puppeteer> suggests there's also a `networkidle` value, although that documentation may be out-of-date.
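
For reference, a minimal Puppeteer sketch using that option (the URL is reused from comment 0; page.goto and page.screenshot are from the API docs linked above):

  const puppeteer = require('puppeteer');

  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Navigation counts as finished once there have been no more than
    // 2 network connections for at least 500 ms.
    await page.goto('https://www.openstreetmap.org/#map=19/52.20505/0.10824', {
      waitUntil: 'networkidle2',
    });
    await page.screenshot({ path: 'map.png' });
    await browser.close();
  })();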
Priority: -- → P3
Another option is something similar to what rendertron (https://github.com/GoogleChrome/rendertron#explicit-rendering-flag) does.

It lets you control when the page is "visually complete" by setting a JavaScript variable; this of course only works for sites you control.
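
Roughly like this, assuming the flag name from the rendertron README linked above (the fetch URL and helper are placeholders):

  window.renderComplete = false;  // tell the renderer the page is not ready yet

  fetch('/api/data')              // placeholder for the page's real AJAX call
    .then(function (response) { return response.json(); })
    .then(function (data) {
      populatePage(data);            // placeholder rendering helper
      window.renderComplete = true;  // now the snapshot may be taken
    });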
CutyCapt is a minimal command line tool to capture screenshots.
http://cutycapt.sourceforge.net/

It has a delay option which solves this problem and seems like it would be easy to implement.
 --delay=<ms>                   After successful load, wait (default: 0)
See Also: → 1546062
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All

It's still the case in 71.0

(Btw, for Ubuntu users, this bug became more urgent: with the recent decision to use "snap" packaging for Chromium, the Chromium screenshot feature is broken for all automated testing.)

Please also consider the other use case where --timeout is needed: self-tests.
We need the browser to run complex UI JavaScript, but we might not actually care about the output screenshot itself, only the fact that, when the page loads, the JS has side effects we want to trigger on the server.

(In reply to Richard Neill from comment #7)

> It's still the case in 71.0
>
> (Btw, for Ubuntu users, this bug became more urgent: with the recent decision to use "snap" packaging for Chromium, the Chromium screenshot feature is broken for all automated testing.)

snap Chromium also does not support speechSynthesis.getVoices() or speechSynthesis.speak().

Download Chromium from https://download-chromium.appspot.com/. One caveat is that MP4 encoding and decoding are not supported in the default Chromium build https://bugs.chromium.org/p/chromium/issues/detail?id=601636#c29.

In this case :screenshot at the Web Console https://developer.mozilla.org/en-US/docs/Tools/Web_Console and Firefox Screenshots https://screenshots.firefox.com/ work, while --headless --screenshot consistently hangs and consumes computing power while doing so.

(In reply to guest271314 from comment #8)

> In this case :screenshot at the Web Console https://developer.mozilla.org/en-US/docs/Tools/Web_Console and Firefox Screenshots https://screenshots.firefox.com/ work, while --headless --screenshot consistently hangs and consumes computing power while doing so.

If you don't specify a URL to screenshot you will indeed hang at the moment. See also the observation on bug 1651542 comment 92. The patch on bug 1588152 should hopefully fix this shutdown hang soon.

> If you don't specify a URL to screenshot you will indeed hang at the moment.

A URL is provided. I have been testing with Nightly 83 and MDN pages.

~$ firefox/firefox --screenshot https://developer.mozilla.org/en-US/docs/Tools/Taking_screenshots
*** You are running in headless mode.

until CTRL+C

^CExiting due to channel error.
Exiting due to channel error.

Oh, then it's most likely bug 1563725. Specifying a temporary profile location with -profile should give you a workaround. But that's actually all a bit off-topic.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #11)

> Oh, then it's most likely bug 1563725. Specifying a temporary profile location with -profile should give you a workaround. But that's actually all a bit off-topic.

FWIW, I installed Firefox as a .deb to test what was different from using Nightly, as I had previously successfully captured screenshots with a default Firefox install on *nix.

Isolated the issue to

"Always ask you where to save files" 

being set in Preferences instead of

"Save files to" </path/to/directory>

This should be conspicuously documented somewhere in Mozilla publications.
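
For anyone scripting this, the preference behind that checkbox is presumably browser.download.useDownloadDir (an assumption, not confirmed in this bug; verify in about:config). A user.js for the screenshot profile might then contain:

  // Assumed prefs, not confirmed here -- verify in about:config.
  user_pref("browser.download.useDownloadDir", true);  // "Save files to", not "Always ask"
  user_pref("browser.download.folderList", 2);         // 2 = use browser.download.dir
  user_pref("browser.download.dir", "/tmp/shots");     // hypothetical target directory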

This bug has been open for 4 years.

How about adding a "--delay=<ms>" command-line flag?

Could this be an example of the saying "the perfect is the enemy of the good"? A long time ago, someone wrote "Ideally, however, we would be able to detect page completion without requiring the user to specify a heuristic like a time delay." That sounds complicated. How about a simple flag that causes a simple delay, so that headless screenshots work?

If you control the webserver, you can bodge a timeout by embedding a 1-pixel iframe with display:none, making the iframe contain an image, and having the server delay that image for a couple of seconds, e.g.
<iframe src='http://localhost/delay.php?sleep=5000ms' style='display:none' width=1 height=1></iframe>
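
The comments above use PHP for the delaying endpoint; purely as an illustration of the same idea, a Node/Express equivalent might look like this (port and path are arbitrary):

  const express = require('express');
  const app = express();

  app.get('/delay.php', (req, res) => {
    // parseInt ignores the trailing "ms" in e.g. sleep=5000ms
    const ms = parseInt(req.query.sleep, 10) || 1000;
    // Hold the iframe's request open, postponing the parent document's
    // load event -- and therefore the screenshot -- by `ms` milliseconds.
    setTimeout(() => res.type('html').send('<!-- done -->'), ms);
  });

  app.listen(8080);  // hypothetical port; the iframe above assumes port 80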

Can confirm this is still reproducible on the latest Firefox Nightly 97, tested on macOS 10.15.
The screenshot is taken before the page has properly loaded. Updating to the current severity rankings.

Severity: major → S3

I can still confirm this for Mozilla Firefox 91.10.0esr on Oracle Linux 8.6.
Without a delay command-line option, the screenshot feature seems pretty useless.

Can we please get something done about this? Even if it's as clunky as hard-coding a one-second delay, that would be greatly appreciated. As it is now, there are certain webpages that one simply cannot capture, because the capture occurs before the page content has fully loaded. It captures the text and framework of the page, but all responsive content is blank: the event that triggers such content to start loading is the same one that triggers the screenshot to be taken, on the assumption that the page has fully loaded.

If it helps, a somewhat clunky, but reliable workaround can be done by using xwd (x-window-dump).

Here is a way I found of doing it which is reasonably reliable. I use palemoon rather than firefox below, partly because it is not snap-packaged (which means it can run under an account whose homedir is outside /home, e.g. when invoked via Apache), and partly because of the need to killall to make it exit, which means we don't want to kill the desktop firefox. It should work similarly with firefox. All the wrappers are there to make sure that multiple instances run consecutively, never concurrently. There should be no line breaks in this command; I've added them for legibility...

mkdir /tmp/fake_home_for_this_timeout/ ;
export HOME=/tmp/fake_home_for_this_timeout/ ;
timeout 60 xvfb-run -a -s '-screen 0 1600x1200x24'
flock -w 50 /tmp/fake_home_for_this_timeout_flockfile
sh -c "palemoon --new-instance --no-remote --setDefaultBrowser --width 1600 --height 1200 $url &
sleep 6; xwd -root | convert xwd:- output_file.pdf;
killall palemoon;" 2>&1 ;
rm -rf /tmp/fake_home_for_this_timeout/

The key pieces are:

  1. You need a dummy home, so that firefox/palemoon/chromium/midori don't clutter it up with prefs and old data.
  2. flock, so that only one instance runs at a time.
  3. xvfb-run lets it run in a dummy X session that isn't your own (and sets the virtual screen size).
  4. Make the browser launch at the same window size as the screen size.
  5. xwd | convert, in order to screengrab and convert (ImageMagick can produce pdf, jpg, png, etc. according to the extension).
  6. sleep is a guess: enough time to start the browser and load the content.
  7. Ensure the $url is quoted to protect it from the shell.

It's brutally ugly, but it does work reliably. HTH!
