Open Bug 1875547 Opened 2 years ago Updated 1 year ago

Slow (single-threaded) PDF rendering for documents with lots of vector shapes

Categories

(Firefox :: PDF Viewer, defect, P3)

Firefox 121
All
Unspecified
defect

Tracking

()

Performance Impact low

People

(Reporter: nekohayo, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [pdfjs-performance])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0

Steps to reproduce:

Go to the STM transport network maps page and click the PDF of the whole "System map"

(or you can download that PDF and open it locally with Firefox, to rule out the download times from your measurement)

Actual results:

After the download is complete and rendering begins, it takes 9 to 10 seconds to render the map on a reasonably fast CPU with 8 threads (in my case an Intel Xeon W3520, but it would also happen on more modern multi-core CPUs like an 8th-gen i5 or i7).

I tested this on Fedora Linux 39, on the Wayland version of GNOME 45.3 (but I think it would also affect Xorg) with open source Radeon R9 270 graphics.

During that time, if you observe the CPU usage, you will see that only 1 of your threads/cores is being used (at 100%), while the rest mostly sit idle.

Expected results:

Firefox should use all my CPU threads/cores for rendering a PDF page's contents; doing so would presumably allow rendering such complex documents in 1 to 2 seconds instead of 9-10.

Component: Untriaged → PDF Viewer
Keywords: perf
Hardware: Unspecified → All

Could you share a profile: https://profiler.firefox.com/ ?

Unfortunately I cannot; whenever I try to record a profile (with the default "Web Developer" preset), (re)loading that PDF freezes Firefox entirely, and the process then needs to be killed.

:julienw, do you know how Jeff could get a profile?

Flags: needinfo?(felash)

Is it the Firefox package coming with Fedora, or a Firefox coming from Mozilla directly?
Can you please share with us this information:

  • Troubleshooting information: Go to about:support, click "Copy raw data to clipboard", paste it into a file, save it, and attach the file here.

Thanks

Flags: needinfo?(felash) → needinfo?(nekohayo)
Attached file about_support.txt

It's the package coming from the Fedora 39 repositories. Attached is the about:support output for my regular non-flatpak version from the Fedora repositories.

You made me realize that I could also try on a Debian 12 machine that has the Firefox flatpak from Flathub. That machine has the same CPU and a similar GPU (still radeonsi), so I logged into it over RDP and, sure enough, it renders that PDF just as slowly. This time, however, I was able to record a Firefox performance profile from the flatpak version without crashing, so here it is:
https://share.firefox.dev/4927dXo

I recorded the profile by loading the locally downloaded PDF file, so this eliminates the download time from the website.

Flags: needinfo?(nekohayo)

Thanks for the profile!

I don't see anything obvious in the profile, apart from maybe some GC happening in the main process. It could be good to profile again with the "native allocations" feature, to see where the memory is allocated and if we can avoid some of it.

Locally, I tested the same PDF with Firefox and Evince, and both took approximately the same amount of time to display it. Only MuPDF is significantly faster, but its rendering is also not perfect.
Do you also see this on your computer, that Firefox isn't significantly slower than Evince?

> It could be good to profile again with the "native allocations" feature, to see where the memory is allocated and if we can avoid some of it.

For some reason, on that Debian + Flatpak version, "Native allocations" is in the "Disabled" features section of the profiler, so I am not able to activate that checkbox...

> Do you also see this on your computer, that Firefox isn't significantly slower than Evince?

Indeed, but Evince (poppler) is also known to be pretty slow I think, and there are a bunch of issues in the poppler issue tracker about slow performance. All in all, my hope is that Firefox can be faster than Evince/Poppler anyway :)

In both cases, my hunch is that the apps are trying to render such PDFs on a single thread. My theoretical idea here is that maybe there isn't something inherently wrong with the rendering process itself, but that it should be made multi-threaded, because multi-core CPUs are pretty much the norm nowadays, which wasn't the case way back when. Being able to use my 8 logical CPUs, for example, would make any such PDF render in less than two seconds, I'd presume.
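As a rough check on that estimate, Amdahl's law bounds the gain from N threads when only a fraction p of the work can run in parallel. The sketch below is illustrative only: the 9.5 s baseline is the midpoint of the 9 to 10 seconds reported above, and the parallel fractions are assumptions, not values measured from the profile. The function name `amdahl_time` is made up for this example.

```python
def amdahl_time(baseline_s: float, parallel_fraction: float, threads: int) -> float:
    """Estimated wall time when `parallel_fraction` of the work scales across `threads`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be within [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = baseline_s * (1.0 - parallel_fraction)        # part that stays on one thread
    parallel = baseline_s * parallel_fraction / threads    # part split evenly across threads
    return serial + parallel


if __name__ == "__main__":
    baseline = 9.5  # assumed midpoint of the reported 9 to 10 seconds
    for p in (1.0, 0.9, 0.5):  # assumed parallel fractions
        print(f"p={p:.1f}: about {amdahl_time(baseline, p, 8):.1f} s on 8 threads")
```

With these assumed inputs the estimate is about 1.2 s for p = 1.0, about 2.0 s for p = 0.9, and about 5.3 s for p = 0.5, so getting under two seconds on 8 threads would need roughly 90% or more of the render time to be parallelizable.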

Status: UNCONFIRMED → NEW
Performance Impact: --- → ?
Ever confirmed: true

(In reply to Jeff Fortin from comment #7)

> > It could be good to profile again with the "native allocations" feature, to see where the memory is allocated and if we can avoid some of it.
>
> For some reason, on that Debian + Flatpak version, "Native allocations" is in the "Disabled" features section of the profiler, so I am not able to activate that checkbox...

Ah yeah, I believe you need to use a Nightly version of Firefox for this to work.

It looks like the team managed to do some fixes on their end; let's see how it goes :-)

(In reply to Jeff Fortin from comment #7)

> In both cases, my hunch is that the apps are trying to render such PDFs on a single thread. My theoretical idea here is that maybe there isn't something inherently wrong with the rendering process itself, but that it should be made multi-threaded, because multi-core CPUs are pretty much the norm nowadays, which wasn't the case way back when. Being able to use my 8 logical CPUs, for example, would make any such PDF render in less than two seconds, I'd presume.

My understanding is that pdf.js uses 2 threads already: the decoding happens in a worker, while the rendering itself is in the main process. Spawning more workers is an interesting idea, but I'd be curious to see a wasm version of the worker (a big task though!)

Before tackling the hard task of porting this to Wasm, would it be reasonably straightforward to start with the "spawn more workers depending on the number of logical CPUs available" approach?
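A minimal sketch of the sizing logic behind that approach, written in Python purely for illustration (pdf.js itself is JavaScript, where browsers expose the logical CPU count as `navigator.hardwareConcurrency`). The function name, the `reserve` parameter, and the idea of leaving one core free for the main thread are assumptions for this example, not something pdf.js does.

```python
import os
from typing import Optional


def pick_worker_count(pending_chunks: int,
                      logical_cpus: Optional[int] = None,
                      reserve: int = 1) -> int:
    """Return how many workers to spawn for `pending_chunks` independent pieces of work.

    `reserve` cores are left free (hypothetically for the main thread), and the
    result is always at least 1 so that the work can still make progress.
    """
    if logical_cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    usable = max(1, logical_cpus - reserve)
    # Never spawn more workers than there are chunks to hand out.
    return max(1, min(usable, pending_chunks))
```

On the reporter's 8-thread machine this would give 7 workers for a large page and fewer when there are only a handful of chunks. It says nothing about how to split the work safely, which is the harder part.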

Jeff, how are things looking in the latest Firefox Nightly?

Flags: needinfo?(nekohayo)
Performance Impact: ? → pending-needinfo

The PDF is parsed and the images are generated in a worker.
In general, I'm not sure that it's possible to execute the drawing instructions in different threads without knowing whether the results of those instructions overlap. We could compute the bbox of each instruction and, based on that, build chunks whose instructions do not intersect the instructions in any other chunk, but it'd take some time to do that.
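The bbox-based chunking could be sketched roughly as follows, in Python for illustration rather than as pdf.js code. It assumes each drawing instruction already has an axis-aligned bounding box, and it ignores shared graphics state (transforms, clips, blend modes), which a real implementation would have to handle. Instructions whose boxes intersect, directly or through a chain of other boxes, end up in the same chunk; the names `intersects` and `independent_chunks` are hypothetical.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def independent_chunks(bboxes: List[BBox]) -> List[List[int]]:
    """Group instruction indices so that no box in one chunk intersects a box in another."""
    parent = list(range(len(bboxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pairing; a large page would need a spatial index instead.
    for i in range(len(bboxes)):
        for j in range(i + 1, len(bboxes)):
            if intersects(bboxes[i], bboxes[j]):
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    chunks: Dict[int, List[int]] = {}
    for i in range(len(bboxes)):
        # Visiting indices in ascending order keeps paint order inside each chunk.
        chunks.setdefault(find(i), []).append(i)
    return list(chunks.values())
```

Each returned chunk could then be drawn independently of the others. The quadratic pairing step is one concrete form of the cost mentioned above, and a page where most shapes overlap would collapse into a single chunk with nothing to parallelize.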
Another idea could be to split the canvas into different areas and then execute all the instructions in each area using a clipping path. We could win something if checking whether a path is inside the clip path is less expensive than just drawing it. That said, we'd need to take into account the overhead of spawning a worker, etc.
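A rough sketch of the tiling idea, again in Python for illustration and not taken from pdf.js. It assumes the canvas is cut into a regular grid and that every instruction has a bounding box; each tile gets the ordered list of instructions whose box reaches into it, which a worker could then replay under a clip rectangle for that tile. The function name and the grid parameters are hypothetical.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Tile = Tuple[int, int]                    # (column, row)


def assign_to_tiles(bboxes: List[BBox], width: float, height: float,
                    cols: int, rows: int) -> Dict[Tile, List[int]]:
    """Map each grid tile to the indices of the instructions that may paint into it."""
    if cols < 1 or rows < 1:
        raise ValueError("cols and rows must be >= 1")
    tile_w = width / cols
    tile_h = height / rows
    tiles: Dict[Tile, List[int]] = {(c, r): [] for r in range(rows) for c in range(cols)}

    for index, (x0, y0, x1, y1) in enumerate(bboxes):
        # Cheap culling: skip instructions that lie entirely outside the canvas.
        if x1 < 0 or y1 < 0 or x0 >= width or y0 >= height:
            continue
        # Clamp the covered tile range to the grid.
        c0 = min(cols - 1, max(0, int(x0 // tile_w)))
        c1 = min(cols - 1, max(0, int(x1 // tile_w)))
        r0 = min(rows - 1, max(0, int(y0 // tile_h)))
        r1 = min(rows - 1, max(0, int(y1 // tile_h)))
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                tiles[(c, r)].append(index)  # ascending index keeps paint order
    return tiles
```

An instruction that spans several tiles is listed, and would be drawn, once per tile. That duplicated work, plus the per-worker startup cost, is what has to be outweighed by the cheap bbox test for this to pay off.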

Whiteboard: [pdfjs-performance]

(In reply to Marco Castelluccio [:marco] from comment #12)

> Jeff, how are things looking in the latest Firefox Nightly?

I've been a bit underwater, sorry!

Presuming that the latest stable version, 128 (still from Fedora, this time under Fedora 40 with the latest Mesa graphics drivers, kernel, etc.), is at least as good as February's Nightly in terms of performance fixes, I can say that nothing noticeably changed on my end: with version 128, I still see 7 to 9 seconds to render the map linked above (loaded from the local filesystem to rule out download times).

Flags: needinfo?(nekohayo)
Severity: -- → S3
Priority: -- → P3

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Page load impact: Severe
Websites affected: Rare

Performance Impact: pending-needinfo → low

(In reply to Calixte Denizet (:calixte) from comment #13)

> In general, I'm not sure that it's possible to execute the drawing instructions in different threads without knowing whether the results of those instructions overlap. We could compute the bbox of each instruction and, based on that, build chunks whose instructions do not intersect the instructions in any other chunk, but it'd take some time to do that.

After https://github.com/mozilla/pdf.js/pull/19128 this might be more easily feasible.
