Open Bug 1407238 Opened 7 years ago Updated 23 days ago

Cli argument for print page or generate pdf command in Firefox headless mode

Categories: Firefox :: Headless, defect, P3 (57 Branch)
Status: UNCONFIRMED
People: Reporter: murznn; Assignee: Unassigned
Attachments (1 file): 1.31 KB, application/x-shellscript
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170828095647

Steps to reproduce:

Please add a command-line argument to print a page (or export it to PDF) in Firefox headless mode, like the 'screenshot' function. Printing and generating a PDF are more useful than a pixelated screenshot.

Google Chrome already has a command for generating a PDF: https://developers.google.com/web/updates/2017/04/headless-chrome

chrome --headless --disable-gpu --print-to-pdf https://bugzilla.mozilla.org

There is an add-on for printing from the command line in Firefox (https://addons.mozilla.org/en-US/firefox/addon/cmdlnprint/), but it doesn't work in recent Firefox versions.
Component: Untriaged → Headless
Priority: -- → P3
Suggestion: this should also apply the "@media print" CSS rules. (That's what all the other tools do: Chromium, wkhtmltopdf, PhantomJS, SlimerJS.)
Oooh, this would definitely be nice to have. The problem is that, of IE, Firefox and Chrome, Chrome was definitely in last place when it came to PDF generation. (IE seemed to do a bit better than Firefox on layout, at least on the Windows machines we had to support, but Firefox was a close second.) Chrome's main failure is multi-page tables with page-break-before rules inside the table to keep sections on the same page, or just sane rendering of thead, period. It would overlap content and produce other absurdities. We ended up hacking in fragile multi-table fragmentation of long tables just to get something Chrome could manage to print. Firefox's long-standing print failure has been iframes, but that's rarely an issue with the kind of documents I'd need to export to PDF, and it is much more easily worked around with a JS iframe remover. So, yes, a headless Firefox option would be a lovely replacement for all the other tools mentioned here, which rely on WebKit.

This would be really useful. I've built a little app to help me test CSS fragmentation properties, which I have up and running with Puppeteer and also wkhtmltopdf. I had hoped I'd be able to take a similar approach to that of Puppeteer with headless firefox.

Is this going to be added at some point in the future? In the meantime I am having to do some pretty hacky things to get a PDF.

My current workaround is as follows:

firefox --screenshot page.png --window-size=1500 https://mysite.com/page.html
convert page.png -crop 1500x1766 page.tiff
convert page.tiff page.pdf

This, of course, is less-than-ideal since it is a flat image which has been split into pages and it has no real text in it.

(In reply to sputnick1124 from comment #6)

Is this going to be added at some point in the future? In the meantime I am having to do some pretty hacky things to get a PDF.

It has been assigned a priority of P3 (Backlog). This means developers want to integrate the feature, but there's no ETA. In other words, it's in the "would be nice to have when we have the time" bucket.

This, of course, is less-than-ideal since it is a flat image which has been split into pages and it has no real text in it.

Currently, the only way to get a PDF with real text is to print it via File > Print (Ctrl+P) > General tab > Print to file.

I hope that's answered your question :).

Meanwhile, we can use "xdotool" (available on Linux) to automate Firefox, letting a scripted "monkey" do the job for us.

An example script to start off: https://askubuntu.com/a/612510

It's hacky, but it may solve an immediate problem.

(In reply to jman from comment #8)

Meanwhile, we can automate Firefox using "xdotool" (available on Linux) to automate a monkey to do the job for us.

Oh I completely forgot about xdotool. Thanks for the suggestion :D.

Yeah, xdotool + xvfb has been the way to automate Firefox for ages, but proper printing support would make it much easier to plug Firefox into existing PDF-generating servers, and in a cross-platform fashion.
And, yeah, would love to have that for the far-improved firefox table support. Chrome continues to mangle splitting table content across multiple print pages.

Good tips on using xdotool! Unfortunately my use case is part of an automated CI pipeline with no X, but I'll definitely file this away for the future.

If your automated CI is running on Linux, you can run it under xvfb headlessly, no problem.
For better performance you might have to set gfx.xrender.enabled to true; I know I've had to do that for things like ssh -YC or xrdp in the past. I haven't tested that recently with xvfb, however.

Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 11 votes.
:Amir, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahabibi)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(ahabibi)

(In reply to Release mgmt bot (nomail) [:suhaib / :marco/ :calixte] from comment #14)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

I really believe it is very relevant!
I've been looking for this for a very long time and am relying on wkhtmltopdf for now, but I'd love to use Firefox's rendering capabilities (what I need is to 'print' or render it on the user side, so maybe a JavaScript option could solve my problem too).

Yes, it would definitely be good to get this fixed - because most of the other tools have now become unmaintained.

  • SlimerJS, PhantomJS and CasperJS are all unmaintained.
  • wkhtmltopdf is very outdated, advises "Do not use wkhtmltopdf with any untrusted HTML", and chokes on many common JS functions that are part of ES6 (ECMAScript 2015).

Thanks :-)

Using a headless browser to generate PDFs has been common practice for over a decade, since browsers solve the hard part of generating PDFs: declarative layout. Firefox has better paged media support than Chrome, where improvements to CSS for paged media have been blocked on switching to a new layout engine, which only landed in recent weeks (for print, at least). Firefox could be a more attractive option for this use case, but the lack of a simple CLI option makes this needlessly hard.

This would be a very useful addition. At the moment, to programmatically generate PDFs from HTML/CSS/JS with support for paged media CSS rules, the headless Chrome CLI is basically the only option, as others have noted. A simple CLI option in Firefox would open up a lot of use cases and provide a much-needed (and possibly already better) alternative for a common use case.

With the Firefox CLI, it would be amazing if PDF printing could always run cleanly in its own process (i.e. not share, modify, or lock the user's Firefox profile, even if invoked again before the previous job has finished). Please also ensure that it works even when Firefox is packaged as a snap (unfortunately, snap packaging breaks everything in the situation, common for automation, where the user is a daemon without a home directory under /home). Finally, please make sure we can define the screen size/page size. Thanks!

Please also consider one other use-case: we may want to trigger PDF generation, not to get the PDF, but because it makes Firefox download the URL and execute the JS within it. This is useful for testing some applications.

Btw, as a workaround at present, I have to run this, which works acceptably well (or at least did, until snap broke everything).

mkdir -p /tmp/firefox_tmp_home
export HOME=/tmp/firefox_tmp_home
timeout 60 xvfb-run -a -s "-screen 0 1280x1024x24" \
  flock -w 50 /tmp/firefox_tmp_home \
  firefox --headless --no-remote --window-size 1448,1024 --screenshot PDF_NAME THE_URL

The temporary home is so that it doesn't interfere with my desktop browser.
flock is so that multiple instances of the screenshotting script don't fight, but are forced to serialise.
xvfb-run is needed when your script is not running under X (and doesn't somehow inherit an X environment).
Neither the screen size nor --window-size works well enough to correctly define the output PDF's dimensions.
timeout forces this to exit if it gets stuck.

HTH - Thanks.

(In reply to Richard Neill from comment #19)

mkdir -p /tmp/firefox_tmp_home
export HOME=/tmp/firefox_tmp_home
timeout 60 xvfb-run -a -s "-screen 0 1280x1024x24" flock -w 50 /tmp/firefox_tmp_home firefox --headless --no-remote --window-size 1448,1024 --screenshot PDF_NAME THE_URL

Would it be slightly cleaner to use a temporary Firefox profile directory? Example:

PROFILEDIR=$(mktemp -d "${TMPDIR:-/tmp}/tmp-fx-profile.XXXXXX")
timeout 60 xvfb-run firefox --profile "$PROFILEDIR" ... (other params)
rm -rf "$PROFILEDIR"

Screenshotting is not a real solution, the resulting PDF won't have selectable text or clickable links.

Screenshotting is not a solution, however you can use xvfb to puppet the firefox pdf print dialog.
It just makes the process a little more complicated. A dedicated way without fake X sessions would undoubtedly be easier/more reliable/faster.

Attached file FIREFOX-HTML-TO-PDF.sh

Here is a simplified version of the (X)HTML->PDF conversion from my BASH format-conversion script, which uses FIREFOX via XDOTOOL. The script takes a directory as its first CLI argument and recursively converts all (X)HTML files in it to PDF files. Note that the PDF files are not saved next to the (X)HTML files, but to the last-used save directory, depending on your FIREFOX configuration. Using XDOTOOL to edit the save location is probably possible, but I preferred to keep the script simple. The script uses EXO-OPEN to start FIREFOX, but you can easily adapt the launch command if you do not use XFCE.

If FIREFOX implements a CLI option for printing, I would appreciate it, as it would greatly simplify the (X)HTML->PDF conversion.

I used Puppeteer to make it work. I'd recommend the following code; note that with the firefox product in Puppeteer, you need to use page.setContent rather than page.goto to make it work.

const puppeteer = require("puppeteer");
const path = require("path");
const fs = require("fs");

const filePath = path.resolve(__dirname, "index.html");

(async () => {
  let browser;
  try {
    const puppeteerVersion = require("puppeteer/package.json").version;
    console.log(`Using Puppeteer version ${puppeteerVersion}`);
    console.log(`This is the path: ${filePath}`);

    browser = await puppeteer.launch({
      // Older Puppeteer releases select Firefox via `product`;
      // newer ones may expect `browser: "firefox"` instead (check your version's docs).
      product: "firefox",
      headless: true, // Use true to run headless, not 'new'
      dumpio: false, // Set to true to pipe the browser's stdout/stderr here (for debugging)
    });
    const page = await browser.newPage();
    const contentHtml = fs.readFileSync(filePath, "utf8");
    await page.setContent(contentHtml);
    // await page.goto(`file:${filePath}`); // did not work with the Firefox product in my setup
    await page.pdf({
      path: "output.pdf",
      format: "A4",
      displayHeaderFooter: false,
      margin: { top: 0, right: 0, bottom: 0, left: 0 },
      // preferCSSPageSize: true,
      // printBackground: false, // Change to true if you want to include background
    });
    console.log("✅ PDF built");
  } catch (error) {
    console.error("❌ Error building PDF:", error);
    process.exitCode = 1; // unlike process.exit(), lets the finally block run
  } finally {
    // Always close the browser, even if PDF generation failed.
    if (browser) await browser.close();
  }
})();
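One caveat with `page.setContent`: the HTML is loaded without a file URL of its own, so relative links to stylesheets or images in `index.html` may not resolve. If you want to retry `page.goto`, building the URL with Node's `url.pathToFileURL` (rather than string concatenation) gives a properly escaped `file://` URL. A small sketch; `fileUrlFor` is a hypothetical helper, and whether `page.goto` then works depends on your Puppeteer and Firefox versions.

```javascript
// Sketch: turn a local file path into an escaped file:// URL for page.goto().
const path = require("node:path");
const { pathToFileURL } = require("node:url");

function fileUrlFor(fileName, baseDir = __dirname) {
  // path.resolve makes the path absolute; pathToFileURL percent-encodes
  // characters such as spaces and handles platform-specific separators.
  return pathToFileURL(path.resolve(baseDir, fileName)).href;
}

// Hypothetical use inside the async function above:
//   await page.goto(fileUrlFor("index.html"), { waitUntil: "load" });
```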

I'm using the latest Ubuntu version on WSL2:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

To reproduce my exact setup:

  1. Installed WSL with wsl --install to get the latest stable Ubuntu version
  2. Installed nvm with curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
  3. Ran nvm install --lts to install the Node.js LTS release (which includes npm)
  4. Installed resume-cli with npm install -g resume-cli
  5. Cloned my resume repo with git
  6. Installed Puppeteer with PUPPETEER_PRODUCT=firefox npm install puppeteer

Another reason this would be useful (besides Firefox sometimes generating better output: today we hit yet another case of Chrome's print-to-PDF producing tables that were unselectable and unusable in PDF editors, which reminded me of this bug) is that Chrome also has a data URI length limit that Firefox does not.
If Firefox had a print-to-PDF option on the command line like Chrome does, it would allow headless Firefox PDF generation from data URIs (no temp files), which can be convenient and cleaner if you have the RAM (and Linux 2.6.23 or later, which derives the maximum argument length from the stack size limit, settable with ulimit -s ☺).

https://stackoverflow.com/questions/74218933/chromium-headless-pdf-generation-in-java-using-string-instead-of-temp-file-pag

