Closed Bug 1325032 Opened 7 years ago Closed 7 years ago

[Discussion] The way to integrate PDFium library

Categories

(Firefox :: PDF Viewer, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: brsun, Unassigned)

References

Details

Based on current understanding, PDFium library would be used for more than one projects:
 - Mortar project would use PDFium library to render contents of PDF files, and PDFium library would be wrapped into PPAPI interfaces by using existing solutions from chromium. (bug 1273059)
 - Printing project would use PDFium library to convert PDF into EMF format on Windows. (bug 1321496)

In Mortar project, the PDf viewer would be packed as one system add-on. As part of the add-on, PDFium library and the PPAPI wrapper would be built into one combined dynamic library, and the add-on would only use PPAPI interfaces to communicate with PDFium library indirectly. In Printing project, however, PDFium library would be used directly by Gecko.

This bug is used to bring up as many solutions as possible to integrate PDFium among these two projects.
There are some possible solutions with their pros and cons:
 - a. Mortar project builds the combined dynamic library (PDFium + PPAPI wrapper) by itself. Printing project lands the sources of PDFium into Gecko and use its functions directly.
   = Pros: Each project maintains its own version of PDFium. Version changes of PDFium in one project won't affect the other project.
   = Cons: There will be two separated PDfium binary chunks contained in our browser. One exists as parts of the combined dynamic library in the system add-on, another one exists within Gecko itself. It would be kind of waste regarding to the disk space and the memory footprint.
 - b. Mortar project does the cleanup of PPAPI wrapper and removes dependencies of chromium out from the wrapper (bug 1273377). Then Mortar project lands the sources of PDFium and the "thinner" PPAPI wrapper into Gecko. And Printing project uses the same PDFium sources as what Mortar project uses.
   = Pros: There will be only one PDFium binary chunk exists in our browser.
   = Cons: Although all dependent modules are listed on bug 1273377 already, the real efforts are still unknown due to the scale of chromium itself. It might not be a trivial task to cleanup PPAPI wrapper.
 - c. Printing project lands the sources of PDFium into Gecko. Mortar project removes all dependencies of PPAPI interfaces and uses PDFium functions directly from the sources of PDFium.
   = Pros: There will be only one PDFium binary chunk exists in our browser.
   = Cons: The efforts to use PDFium library directly in the system add-on are unknown. It might not be a trivial task because we might need to re-implement most of the logic behind the PPAPI wrapper by ourselves.
 - d. Mortar project build PDFium and the PPAPI wrapper as two separated dynamic libraries. The separated PPAPI wrapper dynamic library contains no implementation of PDFium, it uses PDfium dynamic library instead. Printing project uses the same PDFium dynamic library as the one used by Mortar project.
   = Pros: There will be only one PDFium binary chunk exists in our browser.
   = Cons: The efforts to separate PDFium library from the PPAPI wrapper are not investigated yet. The feasibility of this option might highly depend on the capability of GN build scripts. Security concerns might be involved because the chrome process will load one dynamic library and call its functions directly.
 - e. Mortar project build one combined dynamic library (PDFium + PPAPI wrapper) with additional settings to export not only PPAPI interfaces but also PDFium functions. Printing project uses PDFium functions exported from the combined dynamic library to convert EMF from PDf on Windows.
   = Pros: There will be only one PDFium binary chunk exists in our browser.
   = Cons: Security concerns might be involved because the chrome process will load one dynamic library and call its functions directly.

Any other ideas are welcome.
No longer depends on: 1324708
Hi Mike,

I was told that you are the go-to person for dll loading. If not, feel free to clear ni or point to someone else. Thanks! :-) 

Please see problem description and options we have in comment 0 and comment 1. (A formatted version of comment 1 - https://pastebin.mozilla.org/8955954)

Since the code will be copied from chromium codebase, we also need to consider maintenance effort in the future. More modification and cleanup indicate higher possibility to conflict if upstream code changes (e.g. for security update). Therefore I personally prefer option (e) if there is no security concerns.

What do you think?
Flags: needinfo?(mh+mozilla)
Group: mozilla-employee-confidential
Sorry for the delay.

> Cons: Security concerns might be involved because the chrome process will load one dynamic library and call its functions directly.

I don't see how e and d are any different on that matter.

a is definitely not a good idea. b and c, I can't tell, but it's definitely additional work on the implementation side.

Side note: does this mean we're going to have a *fourth* copy of (some of) the chromium base stuff? We already have one for ipc, one for webrtc, and one for sandbox ; this is getting ridiculous.
Flags: needinfo?(mh+mozilla)
Hi Mike,

Is there any guideline to mitigate the unease feeling of integrating third-party module? For example, we have to make sure there are no malicious functions calls within that module, or we have to use some restriction similar to sandbox to isolate that module, or we prefer source codes over binary libraries, or anything else.

By the way, how do you think about d and e? Good ideas or bad ideas?
Flags: needinfo?(mh+mozilla)
(In reply to Bruce Sun [:brsun] from comment #4)
> Hi Mike,
> 
> Is there any guideline to mitigate the unease feeling of integrating
> third-party module? For example, we have to make sure there are no malicious
> functions calls within that module

Ideally, third party code needs to be reviewed.

> or we have to use some restriction
> similar to sandbox to isolate that module

That's usually a good idea.

> or we prefer source codes over
> binary libraries, or anything else.

We never import binary libraries, we always build from source.

> By the way, how do you think about d and e? Good ideas or bad ideas?

d and e are the least bad of the bunch, and they are essentially both the same thing.
Flags: needinfo?(mh+mozilla)
Update: The PDFium binary library used in solution d and e are built outside mozilla build system. Source codes of PDFium are not integrated into m-c regarding to solution d and e. After re-explaining these proposed approaches to Mike again, he thinks solution d and e are not good at all. And he suggests that everything needs to be built from m-c.
There are three proposals with pros and cons for PDF printing(Windows only).
1. We found PDFium could generate the bitmap, so we plan to generate bitmap in Plugin Binary process, then send to Chrome process for printing.
  [Pros] a. we don’t need extra dll to deal with PDF.
         b. It doesn’t need DC.

  [Cons] a. For web page printing, we would apply SkiaPDF. In this case, it has to involve Plugin Binary process.
         b. Need to consider if bitmap binary of one page is too large(under check).

2. Generate EMF in Plugin Binary Process, then send to Chrome process for printing.
  [Pros] We don’t need to load extra dll.

  [Cons] a. Web page printing(SkiaPDF) has to involve Plugin Binary Process.
         b. Need to integrate EMF code to Plugin Binary Process, create EMF DC in Plugin Binary Process.

3. Create a sandboxed process to load PDFuim.dll, generate EMF in this process, then print EMF in chrome process.
   [Pros] SkiaPDF printing doesn’t rely on Plugin Binary Process.

   [Cons]a. Need to create an extra process and an extra dll in gecko.
         b. Need to handle PDFium.dll release or integrate PDFium source code into Gecko.
(In reply to Farmer Tseng[:fatseng] from comment #7)
> 2. Generate EMF in Plugin Binary Process, then send to Chrome process for
> printing.
> [Pros] We don’t need to load extra dll.
> 
> [Cons] a. Web page printing(SkiaPDF) has to involve Plugin Binary Process.
> b. Need to integrate EMF code to Plugin Binary Process, create EMF
> DC in Plugin Binary Process.
> 
> 3. Create a sandboxed process to load PDFuim.dll, generate EMF in this
> process, then print EMF in chrome process.
> [Pros] SkiaPDF printing doesn’t rely on Plugin Binary Process.
> 
> [Cons]a. Need to create an extra process and an extra dll in gecko.
> b. Need to handle PDFium.dll release or integrate PDFium source
> code into Gecko.

For proposal 2 and 3, I will check if allow to create EMF DC in sandboxed process.
I will try it in Content process in advance.
No longer blocks: 1329881
(In reply to Farmer Tseng[:fatseng] from comment #7)
> There are three proposals with pros and cons for PDF printing(Windows only).
> 1. We found PDFium could generate the bitmap, so we plan to generate bitmap
> in Plugin Binary process, then send to Chrome process for printing.
>   [Pros] a. we don’t need extra dll to deal with PDF.
>          b. It doesn’t need DC.
> 
>   [Cons] a. For web page printing, we would apply SkiaPDF. In this case, it
> has to involve Plugin Binary process.
>          b. Need to consider if bitmap binary of one page is too large(under
> check).
> 
One A4 page bitmap is around 139 MB. It should not be accepted.
I think we've got a conclusion that we will slim down the code and check both PDFium + PPAPI wrapper into m-c, and expose PDFium interface for printing usage.

So shall we close this bug?
Flags: needinfo?(brsun)
OK.

The final decision is to integrate PDFium and PPAPI wrapper codes directly into m-c. Close this discussion.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(brsun)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.