Provide option to save self-contained static copy of webpages (like PDF but with HTML)

NEW
Unassigned

Status

--
enhancement
8 years ago
2 years ago

People

(Reporter: bugzilla, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)
Build Identifier: 

Instead of relying on PDF to save static copies of webpages (see Fennec's Save as PDF), it would be interesting to provide a functionally equivalent way of doing the same thing with standard web technologies (HTML, CSS, etc.).

This would mean taking a (possibly interactive) webpage and:

1. taking a snapshot of the DOM;
2. stripping out all interactive pieces (e.g. <script>);
3. including within the document all external resources needed to display the webpage (e.g. external stylesheets -> embedded stylesheets, linked images -> base-64 encoded images, etc.);
4. (optionally) conforming the document to valid HTML, to avoid rendering errors in future browser versions or across different browsers;
5. (optionally) optimizing out dead "code" (unused CSS, invisible DOM nodes, etc.);
6. saving the document as a _single_ _HTML_ file.
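To make the stripping and embedding steps described above more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not a proposed implementation for the browser: it works on an HTML string rather than a live DOM, it handles only <script> elements, inline on* handlers and <img> sources, and the `resources` mapping (URL -> MIME type and bytes) is a hypothetical stand-in for whatever the browser already has in its cache. Stylesheets, fonts, frames and validation are left out.

```python
import base64
from html import escape
from html.parser import HTMLParser


def to_data_uri(mime, payload):
    """Encode raw bytes as a base64 data: URI."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


class StaticSnapshot(HTMLParser):
    """Re-serialise HTML, dropping scripts and inlining known images."""

    def __init__(self, resources):
        # Keep character references untouched so text can be passed through.
        super().__init__(convert_charrefs=False)
        self.resources = resources  # hypothetical: URL -> (MIME type, bytes)
        self.out = []
        self.in_script = False

    def _render(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):
                continue  # crude filter for inline event handlers (onclick, ...)
            if tag == "img" and name == "src" and value in self.resources:
                value = to_data_uri(*self.resources[value])
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.out.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:  # script bodies are discarded
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def snapshot(html, resources):
    """Return a script-free copy of `html` with known images embedded."""
    parser = StaticSnapshot(resources)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<p onclick="x()">Hi<script>alert(1)</script><img src="a.png"></p>'
    print(snapshot(page, {"a.png": ("image/png", b"\x89PNG")}))
```

Images whose URL is not in `resources` keep their original src, so the result is only fully self-contained if every referenced resource was supplied.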

There are obviously some issues to be addressed, such as how to base-64 encode images bigger than the standard limit, or whether animated images, videos and streaming videos should be embedded. The upside would be not depending on PDF viewers to open saved webpages (e.g. Symbian still has no built-in PDF viewer) while, at the same time, pushing open web technologies.
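As a rough illustration of the size question: base64 turns every 3 bytes into 4 characters, so an embedded image grows by about a third before it is compared against any limit. The sketch below computes the resulting data: URI length. The report does not say what the limit is, so `limit` is a caller-supplied assumption, and both function names are hypothetical.

```python
import base64


def data_uri_length(mime, payload_size):
    """Length in characters of a base64 data: URI for `payload_size` bytes."""
    prefix = f"data:{mime};base64,"
    # Padded base64 emits 4 characters for every started group of 3 bytes.
    return len(prefix) + 4 * ((payload_size + 2) // 3)


def fits_in_limit(mime, payload_size, limit):
    """True if the encoded URI stays within `limit` (an assumed value)."""
    return data_uri_length(mime, payload_size) <= limit


if __name__ == "__main__":
    payload = bytes(1000)
    uri = "data:image/png;base64," + base64.b64encode(payload).decode("ascii")
    # The predicted length matches the length of the real encoding.
    print(len(uri) == data_uri_length("image/png", len(payload)))
```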

Reproducible: Always

Comment 1

8 years ago
This is a dupe of one of these: bug 40873, bug 121793, bug 381413
Status: UNCONFIRMED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 121793
(Reporter)

Comment 2

8 years ago
Not really. The difference is that the other bugs try to save the original HTML and embedded resources (including JavaScript), thus _maintaining_ interactivity. What I'm proposing here is a _static_ copy of a webpage, i.e. with _no_ interactivity, functionally equivalent to a PDF or printed copy of a webpage but built only out of standards-compliant HTML and data:-encoded embeds.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
(Reporter)

Comment 3

8 years ago
I forgot to add something in the description: ideally, since it's supposed to be for non-interactive archival purposes, the print-specific stylesheet of the page could be optionally used.
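One way the print-stylesheet option could work is to keep only the stylesheets whose media attribute applies to print when embedding. The following Python sketch shows that selection on an HTML string. It is a simplification under stated assumptions: only <link rel="stylesheet"> elements are considered (no @media rules or @import inside CSS), the media-query check looks only at the media type and ignores feature expressions, and the names `StylesheetLinks`, `applies_to_print` and `print_stylesheets` are made up for this example.

```python
from html.parser import HTMLParser


class StylesheetLinks(HTMLParser):
    """Collect (href, media) pairs from <link rel="stylesheet"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "stylesheet" in rels and href:
            # A missing or empty media attribute means "all".
            media = (attributes.get("media") or "all").lower()
            self.links.append((href, media))


def applies_to_print(media):
    """Crude check: does any query in the media list match print output?"""
    for query in media.split(","):
        words = query.split()
        if not words:
            continue
        negated = words[0] == "not"
        if words[0] in ("not", "only"):
            words = words[1:]
        if not words:
            continue
        # A query starting with "(" has no media type, which implies "all".
        matches = words[0] in ("all", "print") or words[0].startswith("(")
        if matches != negated:
            return True
    return False


def print_stylesheets(html):
    """Return the hrefs of linked stylesheets that apply when printing."""
    parser = StylesheetLinks()
    parser.feed(html)
    parser.close()
    return [href for href, media in parser.links if applies_to_print(media)]


if __name__ == "__main__":
    page = (
        '<link rel="stylesheet" href="base.css">'
        '<link rel="stylesheet" media="screen" href="screen.css">'
        '<link rel="stylesheet" media="print" href="print.css">'
    )
    print(print_stylesheets(page))
```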

Comment 4

8 years ago
Alright, setting to NEW, since there don't seem to be other bugs about the same idea.

Now this idea needs a volunteer to implement it. If the goal is to have it included in the core product (not as an extension), the volunteer should first check with the module owners whether it could be accepted into the core.
Blocks: 82118
Status: UNCONFIRMED → NEW
Component: General → File Handling
Ever confirmed: true
QA Contact: general → file-handling
Version: unspecified → Trunk

Updated

2 years ago
Component: File Handling → File Handling
Product: Core → Firefox
Version: Trunk → unspecified