Open Bug 583451 Opened 14 years ago Updated 2 years ago

Provide option to save self-contained static copy of webpages (like PDF but with HTML)


(Firefox :: File Handling, enhancement)





(Reporter: bugzilla, Unassigned)



User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)
Build Identifier: 

Instead of relying on PDF to save static copies of webpages (see Fennec's Save as PDF), it would be interesting to provide a functionally equivalent way of doing the same thing with standard web technologies (HTML, CSS, etc.).

Concretely, this would mean taking a (possibly interactive) webpage and: taking a snapshot of the DOM; stripping out all interactive pieces (e.g. <script>); embedding in the document all external resources needed to display the page (e.g. external stylesheets -> embedded stylesheets, linked images -> base-64 encoded images, etc.); (optionally) conforming the document to valid HTML (to avoid rendering errors in future browser versions or across different browsers); (optionally) optimizing out dead "code" (unused CSS, invisible DOM nodes, etc.); and finally saving the document as a _single_ _HTML_ file.
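
To make the idea concrete, here is a rough sketch of the "strip and embed" steps in Python, using only the standard library. It is an illustration, not a proposed Firefox implementation: it works on serialized HTML rather than a live DOM, and the names `snapshot`, `StaticSnapshot`, `to_data_uri` and the `resources` mapping (URL -> already-fetched bytes) are hypothetical. It drops <script> elements and on* event-handler attributes, turns linked stylesheets into <style> blocks, and rewrites <img src> to base-64 data: URIs. Comments are dropped, and validation and dead-code removal are not attempted.

```python
import base64
import mimetypes
from html import escape
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def to_data_uri(name, payload):
    """Encode bytes as a base-64 data: URI, guessing the MIME type from the name."""
    mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
    return "data:%s;base64,%s" % (mime, base64.b64encode(payload).decode("ascii"))


class StaticSnapshot(HTMLParser):
    """Re-serialize HTML without scripts, inlining resources we already have."""

    def __init__(self, resources):
        # convert_charrefs=False so entities and CSS text pass through verbatim.
        super().__init__(convert_charrefs=False)
        self.resources = resources  # assumption: URL -> bytes, fetched elsewhere
        self.out = []
        self.in_script = False

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href") in self.resources:
            # External stylesheet -> embedded stylesheet (assumes UTF-8 CSS).
            css = self.resources[attrs["href"]].decode("utf-8")
            self.out.append("<style>%s</style>" % css)
            return
        if tag == "img" and attrs.get("src") in self.resources:
            attrs["src"] = to_data_uri(attrs["src"], self.resources[attrs["src"]])
        parts = [tag]
        for name, value in attrs.items():
            if name.startswith("on"):  # drop inline event handlers
                continue
            parts.append(name if value is None
                         else '%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def snapshot(html_text, resources):
    parser = StaticSnapshot(resources)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, calling `snapshot(page_source, {"style.css": css_bytes, "logo.png": png_bytes})` would return one HTML string in which those two resources are embedded and no <script> remains. Resources missing from the mapping are left as ordinary links.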

There are obviously some issues to address, such as how to base-64 encode images larger than the standard limit, or whether animated images, videos, or streaming videos should be embedded. The upside would be not depending on PDF viewers to open saved webpages (e.g. Symbian still has no built-in PDF viewer) while, at the same time, promoting open web technologies.
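
On the size question: base-64 turns every 3 bytes of input into 4 characters of output, so an embedded image grows by roughly a third. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the limit is a parameter because this report does not say which limit applies.

```python
import math


def data_uri_length(payload_size, mime="image/png"):
    """Length in characters of a padded base-64 data: URI for payload_size bytes."""
    prefix = "data:%s;base64," % mime
    # Padded base-64 emits 4 characters per (possibly partial) 3-byte group.
    return len(prefix) + 4 * math.ceil(payload_size / 3)


def fits_limit(payload_size, limit, mime="image/png"):
    """True if the encoded URI stays within a caller-supplied length limit."""
    return data_uri_length(payload_size, mime) <= limit
```

A saver could use a check like `fits_limit(len(image_bytes), limit)` to decide whether to embed a resource or leave it out.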

Reproducible: Always
This is a dupe of one of these: bug 40873, bug 121793, bug 381413
Closed: 14 years ago
Resolution: --- → DUPLICATE
Not really. The difference is that the other bugs try to save the original HTML and embedded resources (including JavaScript), thus _maintaining_ interactivity. What I'm proposing here is a _static_ copy of a webpage, i.e. with _no_ interactivity, functionally equivalent to a PDF or printed copy of a webpage but built only out of standards-compliant HTML and data:-encoded embeds.
Resolution: DUPLICATE → ---
I forgot to add something to the description: ideally, since this is meant for non-interactive archival purposes, the page's print-specific stylesheet could optionally be used.
Alright, setting to NEW, since there don't seem to be other bugs about the same idea.

Now this idea needs a volunteer to implement it. If the goal is to have it included in the core product (not as an extension), the volunteer should first check with the module owners whether it could be accepted into the core.
Blocks: 82118
Component: General → File Handling
Ever confirmed: true
QA Contact: general → file-handling
Version: unspecified → Trunk
Product: Core → Firefox
Version: Trunk → unspecified

Could this bug also cover the following request: "make it possible to save web pages to single page PDF"? That is, the resulting PDF document would have a single page, no matter how long the original web page is. Currently I am forced to take a screenshot of a web page (sometimes tens of thousands of pixels high) and then convert it to PDF with OCR. This takes longer, and the rendering quality of all elements (letters, thin lines, graphics) degrades significantly.

Or should I create a separate bug for my proposal?

Severity: normal → S3