Open Bug 1824470 Opened 2 years ago Updated 2 years ago

pages with extremely long titles or URIs (e.g. data URIs) fail to print with some printers/drivers, presumably due to absurdly long job-title as shown in printer queue

Categories

(Core :: Printing: Output, defect)

x86_64
Windows 10
defect

Tracking

()

REOPENED

People

(Reporter: mt, Unassigned, NeedInfo)

References

Details

Attachments

(8 files)

I attempted to print a PDF that was served as a data: URI and while the preview worked perfectly, nothing was ever sent to the printer. Saving the PDF to a file and printing from there worked fine.

Background: The Australian government provides internationally recognized COVID vaccine certificates on its website in PDF form. When you "download" the certificate, they serve a page in which the PDF is rendered in a frame that takes up much of the page (so that it almost looks like only the PDF was downloaded), except that the iframe src is set to a ludicrously large data: URI.

The PDF renders perfectly and, when I hit the print button, the print dialog is shown, with a functional preview. Choosing to print using either the Firefox print button or the Windows dialog ("print using the system dialog") seems to work. However, nothing was ever sent to the print queue (which I watched closely in a separate window).

Hmm, that's really odd, is there something interesting in the browser console (Ctrl+Shift+J)? If you could attach (as a private attachment) a repro it'd be amazing, but I understand if that was not possible.

Flags: needinfo?(mt)

(I'm also curious if Firefox's "Save to PDF" works properly for this STR -- or if it fails, how/where it fails. But: as Emilio noted, hopefully there might be some relevant info about a particular error in the browser console.)

The browser console wasn't very helpful here.

I was able to use "Save to PDF" successfully, as well as "Microsoft Print to PDF". The latter is a printer driver, so I think that this is down to a quirk of the specific print driver that was involved (I have a Brother Laser printer). I can share the file privately if you want to investigate further, but unless this strikes again, I'm going to suggest that this isn't worth spending a ton of time on.

Flags: needinfo?(mt)

Can you check if the generated PDF has proper text selection etc? I wonder if we're generating a too big rasterized image or something that the printer driver fails to handle somehow.

The severity field is not set for this bug.
:dshin, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dshin)

Whoops - no ni? was set there.
:mt, could you check as per bug 1824470 comment 4?

Flags: needinfo?(dshin) → needinfo?(mt)

The PDFs generated through different techniques are different sizes, but they seem perfectly functional and I can't detect any loss of fidelity in either.

The Save to PDF option is a tiny bit smaller than the original (it's a 145k vs 147k, so it might be hidden metadata or padding being removed or other small efficiencies); using the Microsoft PDF print driver inflates it by 3x (446k), but I can't see any difference in the content.

Flags: needinfo?(mt)

Martin privately shared a data URI that reproduces the issue for him; I tested with Firefox Nightly on Linux, macOS, and Win11 (load data URI, print to my physical Brother HL-2280DW). I can't reproduce so far; it prints just fine.

Given that, I'll classify this as S3, since it's not generally reproducible (it requires a specific, not-entirely-understood configuration/hardware and a specific testcase). Hopefully not too many users are affected. It would definitely be great to fix/understand this if we can get a better idea of what's going on, though.

The only notable differences we've discovered between my/Martin's setups are:

  • the exact printer (though both are Brother printers)
  • the driver (he's got an IPP driver, and mine is "Brother Laser Type1 Class Driver", which was installed automatically for me by Windows when I configured the printer for this machine's Win11 installation for the first time today).
Severity: -- → S3

After Daniel and I spent a bit of time on this, we got nowhere.

I tried alternative drivers (which ended up completely ruining my print queue to the extent that even a reboot wouldn't fix the problem). I then removed and re-added the printer (do not try IPv6, folks; IPv4 continues to work just fine) and the original large data: URI printed just fine. We had hoped that it was down to the file name being absurdly long, but now it is going to be hard to tell.

So let's resolve this for now in case we see other people with similar issues. It's terrible that a variation on "have you tried turning it off and on again" worked here, but that's how these things roll.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Just in case this crops up again, I'll preserve one possible insight:

Our working theory (before the bug stopped reproducing for Martin) was that it might have something to do with the fact that the print-job-name shown in the print queue is extremely long -- it seemed to include the full data URI, which was 200,084 characters long. (The UI shows it with an ellipsis, but it was at the window boundary which suggests that it's the print-queue UI that's doing that ellipsizing and that the internal representation of that print-job-title might really have the full data URI.)

When printing just the PDF file directly (which always worked for Martin), the print queue showed the PDF filename instead (as you'd expect).
And when printing from Edge (which also worked for Martin), the print queue showed a truncated version of the data URI.

So it's conceivable that a flaky print driver, or some other intermediate print-related mechanism, was getting confused by the extremely-long "job title" (or whatever the appropriate name would be). If that were the case, then we could take some actions on our side to truncate that title at some reasonable upper limit, to avoid tripping this driver bug (or whatever it is). But without the ability to reproduce the issue anymore, it's not worth pursuing that at this point.

Another addendum. This problem reproduced on two machines, and one of them was updated with new printer drivers and whatnot. That one no longer tickles the bug, but the other one does. On that machine, a very long data URI jams the print queue pretty badly. The URI Daniel provided was:

data:text/html,Before html comment <!-- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...aaaa-->After HTML comment

With 200,000 repetitions of 'a' added to the comment. These end up in the title that is used to present the print job, which might be the source of the original problem.
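For anyone who wants to regenerate a testcase of this shape rather than download the attachment, a minimal Python sketch follows. The function name `make_comment_testcase` is made up for illustration; only the URI layout and the count of 200,000 come from the description above.

```python
def make_comment_testcase(repeats: int = 200_000) -> str:
    """Build a data: URI whose HTML comment holds `repeats` copies of 'a'.

    The layout mirrors the URI quoted above; the helper itself is
    hypothetical and not part of any Firefox tooling.
    """
    comment = "a" * repeats
    return (
        "data:text/html,Before html comment <!-- "
        + comment
        + "-->After HTML comment"
    )


uri = make_comment_testcase()
# The document has no <title>, so the whole URI is what would end up
# being used as the print job title.
print(len(uri))
```

Pasting the resulting string into the address bar and printing should exercise the same path as testcase 1, assuming the attachment has this same layout.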

Perhaps there is something to be done here after all.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

Here's a text file with the data URI with 200,000 'a' characters that Martin mentioned still reproduces the issue on one of his machines.

Attachment #9328079 - Attachment description: testacse 1: text file with simple data URI that reproduces the bug on martin's printer (with extremely long html comment) → testcase 1: text file with simple data URI that reproduces the bug on martin's printer (with extremely long html comment)

If I'm counting correctly in the comment 13 screenshot, it looks like Chromium is truncating the job name at 80 characters (and inserting a center ellipsis as the 40th character).
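To make that observation concrete, here is a rough Python model of "truncate to 80 characters with a center ellipsis as the 40th character". It is only a sketch of the behavior counted from the screenshot, not Chromium's actual code; the name `middle_ellipsize` and the exact head/tail split are assumptions.

```python
def middle_ellipsize(title: str, limit: int = 80) -> str:
    """Shorten `title` to `limit` characters, eliding the middle.

    Hypothetical model of the observed behavior: with limit=80 the
    ellipsis lands at the 40th character (index 39).
    """
    if limit < 2:
        raise ValueError("limit must leave room for the ellipsis")
    if len(title) <= limit:
        return title
    head = limit // 2 - 1        # characters kept before the ellipsis
    tail = limit - head - 1      # characters kept after the ellipsis
    return title[:head] + "\u2026" + title[-tail:]


job_name = middle_ellipsize("data:text/html," + "a" * 200_000)
print(len(job_name))
```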

Fun fact: it seems that if there's a <title> element, then that is what gets shown in the print-queue UI, and neither Firefox nor Chromium-based browsers show the URI there at all.

So I've added a <title> in this version of the testcase. Martin, could you see if this version reproduces the issue? (on your machine that's still able to hit this) I'm guessing this one works just-fine.

If this one doesn't repro but testcase 1 continues to do so, then I think we can be pretty confident that the long job title (as shown in print queue) is indeed the source of the problem.

Flags: needinfo?(mt)

Just for completeness, here's another testcase where there's a title but the title itself is extremely long (~200,000 characters). So we'll still end up showing something extremely-long as the job title and presumably trigger the same issue.

Interestingly, for testcase 3 (with an extremely long <title>), Chromium does the center-ellipsis thing as shown in the earlier screenshot, but for some reason, they still don't make it all the way to the end of the <title> element's text. (They still show "aaa" at the end, when really the title ends with ThatWasTheTitle</title>.)

I wonder if that's due to a bug, or to another layer of truncation that happens earlier, or something else. shrug

It prints! That suggests that we just need to be more careful with how we populate the title of the print job.

Flags: needinfo?(mt)

Also, the very long title does not print. I do have another bug to file though as a consequence, but it is only tangentially related.

Great, thanks for confirming! I'm glad we got to the bottom of this.

(In reply to Martin Thomson [:mt:] from comment #20)

I do have another bug to file though as a consequence, but it is only tangentially related.

For other curious folks: the other bug was bug 1827778.

[clarifying bug summary now that we understand what's going on a bit better]

Summary: Printing PDFs from a data: URI → pages with extremely long titles or URIs (e.g. data URIs) fail to print with some printers/drivers, presumably due to absurdly long job-title as shown in printer queue

I recorded a pernosco trace of me printing the testcase 1 data URI on Linux:
https://pernos.co/debug/h5YJvi5IWhNniwt4nAaozw/index.html#f{m[BF0u,40w_,t[KQ,Au4H_,f{e[BF0u,4yw_,s{afzB7YAAA,bCfs,uESWv3g,oES1kEw___/

It gives us an idea of what happens on Windows, though things aren't exactly the same.

On Linux, in my pernosco trace at least:

  1. We start printing in nsPrintJob::Print
  2. Several layers down, we hit nsPrintJob::SetupToPrintContent, which calls GetDisplayTitleAndURL, which is what gets the title and URL (also used in several other places, e.g. page headers). We pass eDocURLElseFallback, which is what says "use the URL as the title, if there's no title". That is why testcase 1 shows the URL and testcases 2 and 3 show the document title in the print queue.
  3. This calls nsDeviceContext::BeginDocument, passing in the title.
  4. That gets proxied over to the parent process via its mDeviceContextSpec which is a nsDeviceContextSpecProxy
  5. So then the parent process ends up calling nsDeviceContext::BeginDocument
  6. That calls two important things, passing the title to both of them:
    A. mPrintTarget->BeginPrinting()
    B. mDeviceContextSpec->BeginDocument()

On Linux, the mPrintTarget->BeginPrinting() call is a no-op, and the mDeviceContextSpec->BeginDocument() call makes a note of the title and truncates it for old GTK versions.

On Windows, I think it's the opposite (just based on code inspection). The mPrintTarget->BeginPrinting() call makes a note of the title and supposedly truncates it to a length of MAX_PATH - 1, and the mDeviceContextSpec->BeginDocument() call is a no-op.

So it looks like PrintTargetWindows::BeginPrinting is where this is supposed to be handled, and it's already got some code there that seems to be intended to handle this. Needs further investigation to see what MAX_PATH is, why the truncation isn't sufficient, and to confirm that the title we set there is actually what shows up in the print queue, to confirm my analysis here.

Looks like https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry says "In the Windows API [...] MAX_PATH [...] is defined as 260 characters."

And that's believable -- in my screenshot, we are actually only showing about 220 characters (the text area is 803px wide; each "a" character is 7.4px wide based on measuring the width of 5 of them; and so 803/7.4 = ~109 'a' characters could fit on a line).

So my Windows screenshot probably shows ~220 characters total, which is probably an ellipsized-by-Windows version of our own ellipsized 259-character-long string (256 characters plus 3 dots that we added).
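The clamp described here (MAX_PATH - 1 characters in total, of which the last 3 are dots) can be modelled in a few lines of Python. This is a sketch of the behavior as described in this bug, not the actual PrintTargetWindows::BeginPrinting code; `clamp_title` is a made-up name.

```python
MAX_PATH = 260  # value quoted from the Microsoft documentation above


def clamp_title(title: str, max_len: int = MAX_PATH - 1) -> str:
    """Truncate `title` to `max_len` characters, ending in "...".

    Hypothetical model only. Note that Python counts code points here,
    whereas a Windows wide string counts UTF-16 code units; the two
    agree for the ASCII testcases in this bug.
    """
    if len(title) <= max_len:
        return title
    return title[: max_len - 3] + "..."


clamped = clamp_title("a" * 200_000)
print(len(clamped), clamped.count("a"))
```

With the default limit this yields the 259-character string mentioned above: 256 kept characters plus 3 dots.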

It's suspicious that this is just over 255. I wonder if the print driver allocates space for 255 or 256 characters and we're overshooting that and it's getting upset with us.

Looks like our usage of MAX_PATH dates back to bug 531606 comment 16:

The maximum length of print job title is MAX_PATH (260) per MSKB.
http://support.microsoft.com/kb/281128

That knowledge-base link is 404 but fortunately we have wayback machine cache:
https://web.archive.org/web/20080416134556/http://support.microsoft.com/kb/281128

It does indeed say:
"The StartDoc function validates its parameters by checking that the length of the lpszDocName and the lpszOutput members of the DOCINFO structure are less than MAX_PATH. "

So what we're doing does seem to be kosher according to that documentation, at first glance. It's still possible there's a bug in the print driver where it expects the title to be below some lower threshold like 255, though.

Depends on: 531606

Martin, could you try these testcases? If your troublesome print-driver has an automagic threshold in the neighborhood of 255 or 260, these would be on either side of it.

(A) This has a 261-character-long title [maybe "bad"?]
data:text/html,<title>LongTitleaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</title>

(B) This has a 254-character-long title [maybe "good"?]
data:text/html,<title>LongTitleaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</title>

I predict that (A) might be "bad" and (B) might be "good". If so, perhaps you could try adding characters to the "good" one 1 at a time and see how many you have to add for it to trip over a limit into badness?

(Note: presumably we truncate the title from the first data URI from 261 to 259 (MAX_PATH-1) chars before the driver gets to see it. I just picked 261 as a simple "clearly larger than MAX_PATH" example, making it hypothetically equivalent to the troublesome testcases that we've been using so far.)
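Adding characters one at a time by hand is error-prone, so a small generator may help with the bisection. The following Python sketch is not part of the bug's attachments; `title_testcase` is a hypothetical helper that follows the "LongTitle" + 'a' padding pattern of testcases (A) and (B).

```python
def title_testcase(title_len: int, prefix: str = "LongTitle") -> str:
    """Return a data: URI whose <title> is exactly `title_len` characters.

    The title starts with `prefix` and is padded with 'a', matching the
    shape of testcases (A) and (B). Hypothetical helper for illustration.
    """
    if title_len < len(prefix):
        raise ValueError("title_len is shorter than the prefix")
    title = prefix + "a" * (title_len - len(prefix))
    return f"data:text/html,<title>{title}</title>"


# Emit one URI per candidate length around the suspected thresholds.
for n in range(254, 262):
    print(n, title_testcase(n))
```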

Flags: needinfo?(mt)

A 255 character title prints. A 256 character title doesn't. That seems to me like the problem is not MAX_PATH, but an 8-bit field length overflowing.
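To illustrate why the 255/256 boundary points at an 8-bit length field: 255 is the largest value one unsigned byte can hold, and 256 wraps around to 0. The Python sketch below only demonstrates that arithmetic; how the driver actually stores the length is unknown, and both function names are made up.

```python
def fits_in_u8(length: int) -> bool:
    """True if `length` can be stored in one unsigned byte."""
    return 0 <= length <= 0xFF


def as_u8(length: int) -> int:
    """What an unchecked 8-bit length field would hold (wraps modulo 256)."""
    return length & 0xFF


for n in (254, 255, 256, 259):
    # Assumption: a driver with such a field misbehaves once the value wraps.
    print(n, fits_in_u8(n), as_u8(n))
```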

Flags: needinfo?(mt)

Great. Let's just reduce the clamp a little to 255, then, to mitigate this issue with drivers that can't handle the full MAX_PATH length.

Flags: needinfo?(dholbert)

One more thing that'd be worth testing: a title with length < 255 but with non-ASCII characters.

Here's a data URI with 254 characters, but each character is encoded in multiple (2 I think?) bytes via UTF:
data:text/html,<meta charset='utf-8'><title>問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問問</title>

I wonder if the troublesome driver is happy with that or not... Depends on whether it's checking the wide string's length vs. its storage requirements.

Before this patch, we used MAX_PATH (260) as our upper limit, based on an old
Microsoft knowledge base article[1]. But it seems some print drivers fail to
print if we print with a title longer than 255 characters. So let's use that
as the upper limit.

[1] see https://bugzilla.mozilla.org/show_bug.cgi?id=1824470#c25

Patch posted to just take the straightforward approach; but I'm curious what Martin sees when testing the URI in comment 29. (If Martin's printer rejects that value, due to e.g. flattening to UTF-8 or ASCII internally, we may need a slightly more nuanced fix.) Hence, patch posted as "wip" for now.

Flags: needinfo?(dholbert) → needinfo?(mt)

No surprise here: the printer chokes on the title at 254 characters. Those encode to three bytes each, so I also tested 85 of those plus an "a" (256 bytes in UTF-8), which also failed. If I drop the "a", that printed, which would seem to confirm the "bytes-of-UTF-8" hypothesis.
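The byte counts above can be checked quickly, and the same idea gives a clamp that works on UTF-8 bytes rather than characters. This is a Python sketch under the "bytes-of-UTF-8" hypothesis, not the posted patch; `utf8_len` and `clamp_utf8` are hypothetical names, and the 255-byte default is the threshold observed on this one printer.

```python
def utf8_len(s: str) -> int:
    """Number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def clamp_utf8(title: str, max_bytes: int = 255) -> str:
    """Truncate `title` so its UTF-8 encoding is at most `max_bytes` bytes.

    Cutting the byte string can split a multi-byte character; decoding
    with errors="ignore" drops that trailing partial sequence.
    """
    encoded = title.encode("utf-8")
    if len(encoded) <= max_bytes:
        return title
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(utf8_len("問" * 254))        # the 254-character testcase
print(utf8_len("問" * 85 + "a"))   # 85 of those plus an "a"
print(utf8_len("問" * 85))         # 85 of those alone
```

A clamp along these lines would keep a title within 255 bytes regardless of script, at the cost of allowing far fewer characters for non-ASCII titles.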

Flags: needinfo?(mt)
Attached image saveas.png

There might be other benefits to cutting the length to 255 bytes.

Flags: needinfo?(dholbert)