Closed Bug 1661157 Opened 4 years ago Closed 4 years ago

Multiple seconds of delay (with little CPU activity) before print dialog appears

Categories

(Core :: Printing: Setup, defect, P1)

defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox80 --- disabled
firefox81 --- disabled
firefox82 --- fixed

People

(Reporter: dholbert, Assigned: alaskanemily)

References

Details

(Whiteboard: [print2020_v82][old-ui-])

Attachments

(6 files)

STR:

  1. Ensure you have print.tab_modal.enabled = true in about:config
  2. View some simple page, e.g. https://www.example.org/
  3. Ctrl+P (or Cmd+P on mac)

EXPECTED RESULTS:
Print dialog (or some visual feedback) should appear with no delay or minimal delay.

ACTUAL RESULTS:
Multiple seconds go by before Firefox shows any sort of UI / any sort of visual change to register the fact that the user pressed a key command.
[EDIT/UPDATE: After bug 1661361's improvement landed, we now fortunately do show the print UI quickly; but it's not usable for a little while. It has a loading indicator for a suspiciously long period of time.]

I'm hitting this on latest Linux Nightly (and have noticed it over the past few days at least).
Hiro says he can't reproduce, also on Linux Nightly, so there may be some sort of environmental requirements -- e.g. in my case, my system has two network printers configured, one of which is not currently present on my network.

If I turn off the print.tab_modal.enabled pref, I can no longer reproduce the issue - the system dialog appears immediately with Ctrl+P, and Print Preview appears immediately with "Alt+F, V".

So I believe this is a case where the new print UI is introducing a regression in behavior.

Profile: https://share.firefox.dev/3jeXtPz
I do Ctrl+P at around the 3.2s mark, and nothing seems to happen until 5.6s when we finally show the print dialog.

Whiteboard: [print2020][old-ui-]

Here's an even-worse profile: https://share.firefox.dev/3gx3sgK

I hit Ctrl+P around the 1.3s mark, and the dialog doesn't appear until ~7.6s (over six seconds later).

(And during the wait time, there's no visual indication that anything is going on.)

That's not ideal…
Tagging Emma, who's done a bunch of perf work, and JWatt cause it might be something on the platform side…

Severity: -- → S1
Priority: -- → P1
Whiteboard: [print2020][old-ui-] → [print2020_v81][old-ui-]

This is particularly bad the first time I print in a given browsing session. After that, there's still a delay but it's less than a second long.

FWIW, here are the printers that Ubuntu knows about.
As noted in the screenshot:

  • the top printer and the bottom printer are actually the same printer (but the bottom is a version that I configured, whereas the top is some auto-discovery magic).
  • the middle printer ("Rumpus-Printer" is not currently present on my network)

Also: Rumpus-Printer was my default, and I thought maybe that was causing the problem - but I suspect that's not the whole story, because I still see multi-second delays after changing my default to "SoquelPrinter" (which is present on my network), including in a brand-new fresh profile that I created after changing my default.

(Note that the profile shows a suspiciously long "firstcontentfulpaint" bar for the content process -- this bar seems to be bogus, and I filed bug 1661166 on cleaning that up.)

I suspect that this is probably due to the frontend waiting until the Promise returned by nsIPrinterList.printers resolves before showing the UI. Bob is working on adding some telemetry for the various Promise based APIs we're using in bug 1660686. Once that lands hopefully about:telemetry will give you an indication of what's slow for you.

I built locally with https://phabricator.services.mozilla.com/D88302 applied and then with NS_DISPATCH_EVENT_MAY_BLOCK manually added as an arg as shown in https://phabricator.services.mozilla.com/D88327 (I didn't directly apply that patch, because it collides with the other one, but I'm pretty sure it's just adding , NS_DISPATCH_EVENT_MAY_BLOCK so I did that locally).

I then used the resulting opt build to view https://example.org/ and did Ctrl+P, and 5-10 seconds went by before the dialog appeared. Here's a screenshot of the resulting about:telemetry#search=print_ metrics for that slow operation, in case it helps.

If I use the "copy" button that appears on hover, I see that precise value for the 1 sample here is 7,401 (i.e. there was a 7.4 second delay, I think), for "Printers". (It's 7,401 for the top-right histogram as well as the bottom-right histogram.)

See Also: → 1661361

Here's another screenshot after I've done Ctrl+P a bunch of times on a bunch of domains (canceling the dialog each time), using the same build as before (after quitting & restarting).

The initial print was the longest one here (it's responsible for the 3744 result here), and then most of the other operations were short, with a few ~2s-long operations sprinkled in.

The overall pattern I've observed here is that the initial print operation is pretty much always the longest, and after that things are typically faster (though occasionally there are a few slow operations).

(In reply to Daniel Holbert [:dholbert] from comment #7)

I'm pretty sure it's just adding , NS_DISPATCH_EVENT_MAY_BLOCK

Correct.

(In reply to Daniel Holbert [:dholbert] from comment #1)

Here's an even-worse profile: https://share.firefox.dev/3gx3sgK

Can you add the IO threads manually in about:profiling and redo the profile? The CUPS task that is blocking is run on that thread pool but by default the profiler doesn't include it.

Flags: needinfo?(dholbert)

Sure, here you go -- this profile was captured using latest nightly, with IO threads profiled:
https://share.firefox.dev/3hHkNFq

I did "Ctrl+P" just after the 7s timestamp, and the UI didn't appear until just before the 19s timestamp.

Flags: needinfo?(dholbert)

Thanks. Do you have the NS_DISPATCH_EVENT_MAY_BLOCK change to make us use the IO thread pool in your tree currently? The profile isn't showing anything much in the call tree for the "Gecko_IOThread" thread.

(In reply to Jonathan Watt [:jwatt] from comment #12)

Thanks. Do you have the NS_DISPATCH_EVENT_MAY_BLOCK change to make us use the IO thread pool in your tree currently?

I did have that change, yeah. (Not applied myself, but it was included via having landed, in time to make it into my Nightly.)

In particular: I was using Nightly, built from https://hg.mozilla.org/mozilla-central/rev/44ee384376ce8b20ef64db6e4fef50af983c2088
...and as of that revision, we do have the NS_DISPATCH_EVENT_MAY_BLOCK line:
https://hg.mozilla.org/mozilla-central/annotate/44ee384376ce8b20ef64db6e4fef50af983c2088/widget/PrintBackgroundTask.h#l64
(note the revision ID in this ^ URL is the revision that my Nightly was built from)

Anyway - as an additional data point / in case it helps, here's another profile that I just captured from the slightly-newer now-latest Nightly (from the "later-in-the-day" nightly release); this one was built from https://hg.mozilla.org/mozilla-central/rev/109f3a4de56716d3e1162c2368fe37b8a377f72f
Here's that new profile: https://share.firefox.dev/2FZO145

For this new profile, I also checked three non-default IO-related boxes in the profiler settings, in case they help:
Main Thread File IO
Profiled Threads File IO
All Threads File IO

One other thing I'm noticing: in all of the profiles here, for the whole time between Ctrl + P and the print dialog appearing (when nothing is visually happening), there's oddly quite a lot of activity on the Compositor thread.

This is especially noticeable in the latest profile from comment 13, where the activity is high enough to be flagged with red jank bars. Here's a view of that profile, with the Compositor thread selected, zoomed to the relevant time range: https://share.firefox.dev/31BCfoQ

I wonder if we're rendering the "print preview is loading" animated throbber graphic in a hidden document or something? That's my best explanation. Anyway, this is kind of a tangent, but it stands out as another interesting thing in these profiles / during this time range when we're spinning our wheels.

Another profile (using same latest nightly build as in the "new profile" in Comment 13): https://share.firefox.dev/3b5hPaW

(Here I did Ctrl+P at around the 5.6-second timestamp; the dialog appears at 13.7 seconds, i.e. there's an ~8 second delay.)

I added some logging to narrow in on the slow part here, and it looks like it's this line in particular (which, under the hood, is required to resolve the printerList.printers promise for our JS code):

    cups_dinfo_t* ownedInfo =
        sCupsShim.cupsCopyDestInfo(CUPS_HTTP_DEFAULT, ownedDest);

https://searchfox.org/mozilla-central/rev/0c682c4f01442c3de0fa6cd286e9cadc8276b45f/widget/nsPrinterListCUPS.cpp#53-54

(CC'ing emilio who has hg-blame on that ^ function-call, in case he can weigh in with ideas here).

Also: interestingly, the above-quoted function-call returns null after it finishes this slow operation (i.e. ownedInfo is null at the end of the slow operation). For other loop iterations, it does return non-null values -- but for the slow one, it returns null.

Inspecting ownedDest tells me that this is happening for my network printer named Brother_MFC_J6535DW which might be a weird faux-printer... There is a real printer of that type that does exist on my network, but I have that printer manually configured under a different name. So I think this version might be an auto-discovered duplicate version and could conceivably be semi-broken via Ubuntu misconfiguring it on my behalf, perhaps.

If I look at this printer (the one with name Brother_MFC_J6535DW) in my Ubuntu printer settings, I see several interesting things:

  • Settings-Gear-Icon | "Printing Options" produces an empty dialog.
  • Settings-Gear-Icon | "Printing Details" tells me Address: Brother MFC-J6535DW._ipp._tcp.local:631 and Driver: Brother MFC-J6535DW

The more old-school / poweruser "Additional Printer Settings" does not list this version of the printer at all, nor does the CUPS web interface ( http://localhost:631/printers/? ). So there's clearly something slightly-broken/misconfigured about this printer entry.

Here's a screenshot showing my various print settings dialogs discussed in previous comment.

As you can see, Brother_MFC_J6535DW is only listed in the first one (the main user-facing Ubuntu print settings dialog).

(In reply to Daniel Holbert [:dholbert] from comment #16)

(CC'ing emilio who has hg-blame on that ^ function-call, in case he can weigh in with ideas here).

I just moved that function from the nsPrinterCupsConstructor on the main thread to somewhere else. The call was introduced in bug 1654678.

Emily, any ideas? Seems like this printer is not reachable and we probably have a pretty high timeout to confirm it.

Err, see above.

Flags: needinfo?(emcdonough)

I suspect this is because of the delay on getting the "readable" or "localized" names of printers in nsIPrinter::DefaultSettings. Doing so requires actually contacting the printer (see https://searchfox.org/mozilla-central/source/widget/nsPrinterCUPS.cpp#60 )

We do this on both Linux and OS X, although it's only needed on OS X and isn't actually necessary on Linux. I think it's possible we could get around this by using Cocoa to fetch the readable name, which I believe is cached at the Cocoa level, and just not bothering to do this at all on Linux since we use the Unix/CUPS name there. It might be possible to get this information in another way, since the readable name is definitely known to the CUPS server, but I do not know where it would be stored (perhaps another un(der)documented CUPS dest option).

I would personally be happier if we just never used the readable name at all, but it seems this is the normal thing to show on OS X at least.

Flags: needinfo?(emcdonough)

(In reply to Emily McDonough [:alaskanemily] from comment #21)

I suspect this is because of the delay on getting the "readable" or "localized" names of printers in nsIPrinter::DefaultSettings. Doing so requires actually contacting the printer (see https://searchfox.org/mozilla-central/source/widget/nsPrinterCUPS.cpp#60 )

I don't think so. From watching my logging & from single-stepping in GDB, it seems that it's specifically the call to cupsCopyDestInfo that's slow for me (see comment 16).

I added logging around the cupsConnectDest call that you linked to, and it doesn't seem to ever be called for me.

Ah, OK. It should be possible to make the info be lazily retrieved into the nsPrinterCUPS when we need it, although pretty much any property except the name will need it. But that would at least get it out of the call to get all printers.

I had thought that would be a simple getter, but it does accept an http_t and so was clearly marked as being able to do network IO.

See Also: → 1662360
Component: Printing → Printing: Setup
Product: Toolkit → Core

(In reply to Daniel Holbert [:dholbert] from comment #16)

Also: interestingly, the above-quoted function-call [cupsCopyDestInfo] returns null after it finishes this slow operation (i.e. ownedInfo is null at the end of the slow operation).

FWIW, I'm trying this again today, and today I'm actually seeing this function call (cupsCopyDestInfo) return something non-null -- though it still takes several seconds to return (as observed via gdb single-stepping, as well as via printf logging before/after). I don't know what's responsible for the non-null vs. null return value, or how much to worry about that; there are several possible explanations. (The printer in question has since been rebooted, and I also had to reinstall my OS after it became corrupted yesterday afternoon.) In any case, I'm glad that I can still reproduce the slowness, at least.

Also: bug 1661361 has made this bug somewhat less painful/confusing, because now I at least get some visual feedback that my Ctrl+P action was accepted. (This still feels like it's probably a P1 bug, though, I think.)

Assignee: nobody → emcdonough
Status: NEW → ASSIGNED

I can confirm that the issue is not present on macOS 10.14 with the latest 81.0b5. However I am still able to reproduce it on macOS 10.15 on the same device (new profiles)

Just wondering whether there is any suspicious error messages in /var/log/cups/error.log (on Ubuntu)?

(In reply to Hiroyuki Ikezoe (:hiro) from comment #27)

Just wondering whether there is any suspicious error messages in /var/log/cups/error.log (on Ubuntu)?

Nothing especially suspicious, no.

(I only see one error there, which is related to my recent OS-reinstall that I mentioned in comment 25 & hence probably wasn't an error in my logs back when I filed this. The error is saying that I'm now missing a print driver for an old not-actually-on-my-network printer that I have configured in my [restored] cups settings. But that printer isn't the one that was experiencing the slow cupsCopyDestInfo calls that I noted above.)

I did push a try build with a speculative change which hopefully improves this issue. What I did is what Chrome does. Chrome ignores scanners, faxes and CUPS_PRINTER_DISCOVERED. To be honest I don't quite understand what CUPS_PRINTER_DISCOVERED means, but according to the comment in the Chrome code, it sounds an important flag. (Unfortunately I can't see the Chromium bug in the comment)

Here is a mac build, and here is a linux64 build.

Vlad, can you please test this binary on your Mac?

Also Lee if you are interested in, please test the linux binary on your environment (I am hoping this fixes the fetchPaperMargins error on Linux).

Flags: needinfo?(vlad.lucaci)
Flags: needinfo?(6dnail)
Whiteboard: [print2020_v81][old-ui-] → [print2020_v82][old-ui-]

This may also help with the crashes in bug 1663410 and bug 1663412

FWIW: I did a quick run of hiro's linux try-build from comment 29, and it does indeed seem to be free of any serious delays. The print dialog comes up consistently in less than a second for me, using that build.

(I haven't built/tested Emily's patches, but I suspect they would help for me as well, given that they also skip [or rather, defer] the cupsCopyDestInfo call, which is the slow part for me for at least one printer, per comment 16.)

I suspect that if ignoring those types of devices as in comment 29 changes anything it's not because those types of devices are slow but because of how matching devices are specifically configured on your machine.

Pushed by emcdonough@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0060f1eee073
part 1 - Wrap nsPrinterCUPS::mPrinterInfo in a data mutex r=emilio
https://hg.mozilla.org/integration/autoland/rev/72e213184129
part 2 - Fetch CUPS printer info as it is needed by nsPrinterCUPS r=emilio

(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)

I did push a try build with a speculative change which hopefully improves this issue. What I did is what Chrome does. Chrome ignores scanners, faxes and CUPS_PRINTER_DISCOVERED. To be honest I don't quite understand what CUPS_PRINTER_DISCOVERED means, but according to the comment in the Chrome code, it sounds an important flag. (Unfortunately I can't see the Chromium bug in the comment)

Here is a mac build, and here is a linux64 build.

Vlad, can you please test this binary on your Mac?

Also Lee if you are interested in, please test the linux binary on your environment (I am hoping this fixes the fetchPaperMargins error on Linux).

If the build you mentioned is 20200909080727, I see no difference - print still hangs every time.

Flags: needinfo?(6dnail)

(In reply to Lee McFarland from comment #36)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)

I did push a try build with a speculative change which hopefully improves this issue. What I did is what Chrome does. Chrome ignores scanners, faxes and CUPS_PRINTER_DISCOVERED. To be honest I don't quite understand what CUPS_PRINTER_DISCOVERED means, but according to the comment in the Chrome code, it sounds an important flag. (Unfortunately I can't see the Chromium bug in the comment)

Here is a mac build, and here is a linux64 build.

Vlad, can you please test this binary on your Mac?

Also Lee if you are interested in, please test the linux binary on your environment (I am hoping this fixes the fetchPaperMargins error on Linux).

If the build you mentioned is 20200909080727, I see no difference - print still hangs every time.

No, I meant the binary link in the comment.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #37)

(In reply to Lee McFarland from comment #36)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)

I did push a try build with a speculative change which hopefully improves this issue. What I did is what Chrome does. Chrome ignores scanners, faxes and CUPS_PRINTER_DISCOVERED. To be honest I don't quite understand what CUPS_PRINTER_DISCOVERED means, but according to the comment in the Chrome code, it sounds an important flag. (Unfortunately I can't see the Chromium bug in the comment)

Here is a mac build, and here is a linux64 build.

Vlad, can you please test this binary on your Mac?

Also Lee if you are interested in, please test the linux binary on your environment (I am hoping this fixes the fetchPaperMargins error on Linux).

If the build you mentioned is 20200909080727, I see no difference - print still hangs every time.

No, I meant the binary link in the comment.

The link in comment 37 results in build 20200909080727, which spins forever if I attempt to print

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
Blocks: 1664009
Flags: needinfo?(vlad.lucaci)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: