121793 - RFE: Save complete webpage in one file using data: protocol (RFC 2397)

Reporter

Description

•

23 years ago

Thanks to the XML data source syntax, it is possible to save an HTML document with its images, embeds, external styles, and scripts in one file. This can be done by Base64-encoding the resources and referencing them through the "data:" protocol. This option would be a third entry in the drop-down menu of the "Save as" dialog, for example:

Save file as type:
Web page, complete, with separate files (*.htm, *.html)
Web page, complete, in one whole file (*.htm, *.html)
Web page, HTML only (*.htm, *.html)

See the demonstration of image embedding in the attachment.
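As an illustration of the encoding step described above, here is a minimal Python sketch. It is not taken from the attachment; the helper names `to_data_uri` and `img_tag` are hypothetical, and the sample payload is only the six-byte GIF signature rather than a displayable image.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a Base64 data: URI as described by RFC 2397."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(payload: bytes, mime_type: str = "image/gif") -> str:
    """Return an <img> element whose src carries the image itself."""
    return f'<img src="{to_data_uri(payload, mime_type)}" alt="">'


# Placeholder payload: just the GIF89a signature, not a complete image.
print(img_tag(b"GIF89a"))
# prints: <img src="data:image/gif;base64,R0lGODlh" alt="">
```

A real "save in one file" feature would feed the bytes of each fetched resource through the same kind of helper.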

Manko

Reporter

Comment 1

•

23 years ago

Attached file: Demonstration of embedding GIF image directly in document

Markus Gerstel

Comment 2

•

23 years ago

This isn't XHTML-specific. NS4.x supports inline images as well.

OS: Windows 2000 → All

Hardware: PC → All

Manko

Reporter

Comment 3

•

23 years ago

Yes, that's right :) But this is a standard XML feature.

OS: All → Windows 2000

Hardware: All → PC

Manko

Reporter

Updated

•

23 years ago

OS: Windows 2000 → All

Hardware: PC → All

Markus Gerstel

Comment 4

•

23 years ago

IE doesn't show the image. Valid RFE anyway.

Status: UNCONFIRMED → NEW

Ever confirmed: true

xyzzy

Comment 5

•

23 years ago

Is this the same as bug 40873?

Boris Zbarsky [:bzbarsky]

Comment 6

•

23 years ago

It is. There is no reason to make up our own format when there is a standard format for this.

*** This bug has been marked as a duplicate of 40873 ***

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

Manko

Reporter

Comment 7

•

23 years ago

No, bug 40873 proposes using multipart MIME HTML documents with boundaries. In this case, I propose using the XML "data:" protocol, without breaking the document into parts. This feature makes it possible to get a fully W3C standard-compliant document, which can be opened with any standard browser, placed on a Web server, etc. I think this is a more advanced and useful technology than MHTML.

Markus Gerstel

Comment 8

•

23 years ago

This isn't about RFC 2557, but RFC 2397. It's either wontfix or new, but it's not a duplicate.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Summary: It's possible to save complete page with all embedded external objects as one whole file. → RFE: Save as RFC 2397 HTML; complete webpage in one file

Boris Zbarsky [:bzbarsky]

Comment 9

•

23 years ago

Ah. OK, I had misunderstood...

A point of reference (from RFC 2397):

> The "data:" URL scheme is only useful for short values. Note that some
> applications that use URLs may impose a length limit; for example, URLs
> embedded within <A> anchors in HTML have a length limit determined by the
> SGML declaration for HTML [RFC1866]. The LITLEN (1024) limits the number of
> characters which can appear in a single attribute value literal, the
> ATTSPLEN (2100) limits the sum of all lengths of all attribute value
> specifications which appear in a tag, and the TAGLEN (2100) limits the
> overall length of a tag.

Thus, if we do this, we have to be careful to offer it as an option only in cases when all the linked content is smaller than the relevant limits.
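To give a feel for what the LITLEN figure means in bytes, here is a rough Python sketch. It assumes the whole data: URI sits in one attribute value and counts only the URI itself; the function names are hypothetical, and ATTSPLEN and TAGLEN are not modelled.

```python
LITLEN = 1024  # attribute-value literal limit quoted from RFC 2397 above


def data_uri_length(payload_size: int, mime_type: str) -> int:
    """Length in characters of a Base64 data: URI for a payload of this size."""
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    encoded_length = 4 * ((payload_size + 2) // 3)
    return len(f"data:{mime_type};base64,") + encoded_length


def fits_litlen(payload_size: int, mime_type: str = "image/gif") -> bool:
    """True if the data: URI alone stays within the LITLEN limit."""
    return data_uri_length(payload_size, mime_type) <= LITLEN


print(fits_litlen(750))  # prints: True
print(fits_litlen(751))  # prints: False
```

Under these assumptions, a GIF of roughly 750 bytes is the largest that fits a strict 1024-character attribute literal.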

Christian :Biesinger (don't email me, ping me on IRC)

Comment 10

•

23 years ago

Summary doesn't match comments in this bug, changing

Summary: RFE: Save as RFC 2397 HTML; complete webpage in one file → RFE: Save complete webpage in one file using data: protocol

Markus Gerstel

Comment 11

•

23 years ago

So, where's the difference now? Did you read RFC 2397? At least let the # stay in the summary, so someone can search for it.

Summary: RFE: Save complete webpage in one file using data: protocol → RFE: Save complete webpage in one file using data: protocol (RFC 2397)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 12

•

23 years ago

Markus: sorry... I misread the summary as MHTML, especially as it was marked a duplicate of that bug. I didn't look at the RFC numbers, but thought they were the same, so I didn't include it.

Manko

Reporter

Comment 13

•

23 years ago

I think LITLEN is not a very significant limitation. A quotation from RFC 2397:

> The effect of using long "data" URLs in applications is currently unknown;
> some software packages may exhibit unreasonable behavior when confronted
> with data that exceeds its allocated buffer size.

If Mozilla can read big images from "data:" without problems, all will be OK.

Boris Zbarsky [:bzbarsky]

Comment 14

•

23 years ago

You said:

> This feature makes it possible to get a fully W3C standard-compliant
> document, which can be opened with any standard browser

I was just pointing out that this is not exactly true. It's very likely to cause at least some standards-compliant browsers, especially a stricter browser on a more memory-limited platform, to do odd things....

Manko

Reporter

Comment 15

•

23 years ago

Maybe you're right. But MHTML has the same limitation: an MHTML file and a file in RFC 2397 format are practically equal in size. Moreover, MHTML support is a less obvious thing than the data: protocol. I don't know of any browser that understands the data: protocol except Mozilla and Netscape 4.x. If files in RFC 2397 format become prevalent, support for this standard-compliant format will be added to strict browsers.

Actually, this feature isn't sophisticated and doesn't demand a lot of system resources: a strict browser, reading HTML data and seeing "data:" in an object location, will cut out the Base64 piece, save the object to a file in a temporary directory, substitute the corresponding URL, and then continue HTML parsing. This scheme, I think, takes little memory and CPU time compared with usual HTML parsing.

And, finally, it is just the user's choice: save with separate files, use the complicated MHTML format, or use the transparent RFC 2397 format, which is simple to interpret. Mostly, a Web page is saved to local disk for private use, and later it will be opened by the same browser.
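The decode-and-spill scheme described in this comment could look roughly like the Python sketch below. It is an illustration under simplifying assumptions, not how any browser actually does it: the function names are hypothetical, media-type parameters are kept as an opaque string, and the temporary file always gets a ".bin" suffix.

```python
import base64
import tempfile
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data: URI into (media type, decoded bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, body = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma separator")
    is_base64 = header.endswith(";base64")
    media_type = header[:-len(";base64")] if is_base64 else header
    # RFC 2397 default when the media type is omitted.
    media_type = media_type or "text/plain;charset=US-ASCII"
    data = base64.b64decode(body) if is_base64 else unquote_to_bytes(body)
    return media_type, data


def spill_to_temp_file(uri: str) -> str:
    """Decode the URI, write the bytes to a temporary file, return its path."""
    _media_type, data = decode_data_uri(uri)
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as handle:
        handle.write(data)
        return handle.name


print(decode_data_uri("data:image/gif;base64,R0lGODlh"))
# prints: ('image/gif', b'GIF89a')
```

The caller would then substitute the returned path for the original URL and continue parsing, as the comment suggests.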

Martin Kutschker

Comment 16

•

23 years ago

There are several problems with this approach:

* It only makes sense for images and other objects. Linked stylesheets and JavaScript would have to be 'included inline' (just as the C preprocessor does). While this is probably OK for JavaScript, I don't think it will work for (alternate) stylesheets and other linked resources.
* It will bloat a file which reuses images a lot. Think of these spacer and bullet GIFs.
* It will break on objects attached via stylesheets (e.g. list bullets).

Manko

Reporter

Comment 17

•

23 years ago

To Martin Kutschker: No problem.

It really is possible to use the data: protocol within a style sheet (both inline and in a separate file), for example:

list-style-image:url(data:image/gif;base64,.....);

It really is possible to use the data: protocol within <link>:

<link href="data:text/css;base64,......" rel="stylesheet" type="text/css" />

Moreover, I have tested a "Russian matryoshka": a <link> with the data: protocol, with an image embedded within the CSS data. All was OK.
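The "matryoshka" nesting can be reproduced with a short Python sketch. The helper names and the CSS rule are illustrative assumptions, and the payload is only the GIF signature, so the output shows the structure rather than a working bullet image.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a Base64 data: URI (RFC 2397)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def nested_stylesheet_link(bullet_gif: bytes) -> str:
    """Build a <link> whose href carries CSS that itself carries an image."""
    # Inner layer: the image becomes a data: URI inside the CSS rule.
    image_uri = to_data_uri(bullet_gif, "image/gif")
    css = f"ul {{ list-style-image: url({image_uri}); }}"
    # Outer layer: the whole stylesheet becomes a data: URI in the href.
    css_uri = to_data_uri(css.encode("utf-8"), "text/css")
    return f'<link href="{css_uri}" rel="stylesheet" type="text/css" />'


# Placeholder bytes (only the GIF signature), not a displayable image.
print(nested_stylesheet_link(b"GIF89a"))
```

Each layer of nesting applies Base64 again, so the inner image grows by about a third twice over.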

Manko

Reporter

Comment 18

•

23 years ago

Attached file: Built-in GIF, <LINK> and <SCRIPT> demo

Just view the source of this file ;)

Martin Kutschker

Comment 19

•

23 years ago

Amazing!

Stylesheets that include other stylesheets, and trusted JavaScript that 'includes' JS files via XPCOM, are still an (implementation) problem (though they are a problem for any save-as-a-whole strategy).

Has anyone tried saving this file? My Mozilla 0.9.7 on Linux always creates a directory and files for the embedded (!) images. It does so even when I save as "HTML only". Is there already a bug on this?

So what is missing (?) is a way to reuse resources. What is working is this:

<object style="display: none" id="embed" name="embed2" type="image/gif"
data="data:image/gif;base64,R0lGODlhDwAPAJEBAAAAAL+/v///
AAAAACH5BAEAAAEALAAAAAAPAA8AAAIujA2Zx5EC4WIgWnnq
vQBJLTyhE4khaG5Wqn4tp4ErFnMY+Sll9naUfGpkFL5DAQA7" />
<img src="javascript:this.src=document.getElementById('embed').data" id="test">
<script>document.getElementById('test').src = document.getElementById('embed').data</script>

But this requires JavaScript. Is there a better way to set the src/data of the image?

Boris Zbarsky [:bzbarsky]

Comment 20

•

23 years ago

> Has anyone tried saving this file? My Mozilla 0.9.7 on Linux

Please don't test saving with 0.9.7. Your comment touches on 2 or 3 separate bugs in the save-as implementation in 0.9.7 (it all got completely rewritten right before the milestone, with the ensuing issues). All the bugs you mention are fixed in current nightlies.

Bill Law

Comment 21

•

23 years ago

Will have to wait for a future release, post-Mozilla 1.0.

Target Milestone: --- → Future

sairuh (rarely reading bugmail)

Updated

•

23 years ago

QA Contact: sairuh → benc

marlon bishop

Updated

•

23 years ago

Blocks: 115634

marlon bishop

Updated

•

23 years ago

Blocks: 116008

Ian Pottinger

Comment 22

•

23 years ago

adding self to cc list

Ian Pottinger

Updated

•

23 years ago

Blocks: 82118

benc

Updated

•

23 years ago

QA Contact: benc → sairuh

sairuh (rarely reading bugmail)

Updated

•

23 years ago

QA Contact: sairuh → petersen

benc

Updated

•

22 years ago

Blocks: 144766

Alfonso Martinez

Comment 23

•

22 years ago

*** Bug 199757 has been marked as a duplicate of this bug. ***

Manko

Reporter

Comment 24

•

22 years ago

BTW, Opera 7.20 and later also support data: URLs.

choi9999

Comment 25

•

21 years ago

Is this being explored for Firefox?

Manko

Reporter

Comment 26

•

21 years ago

This bug is unrelated to the SeaMonkey/Firefox fork.

Zvi Devir

Comment 27

•

21 years ago

(In reply to comment #25)
> Is this being explored for Firefox?

AFAIK, the data scheme is implemented in Gecko (Firefox, Mozilla, etc.) and works under all Mozilla variants. I'm using the data scheme to save space in some HTML pages with a lot of tiny GIFs inside them. If anyone is interested, I can post a small Perl script that does the trick.

However, I don't think that this RFE should be implemented in Mozilla/Firefox. It's more reasonable to implement it as an extension for Mozilla/Firefox.

Cees T.

Comment 28

•

19 years ago

(In reply to comment #27)
> I'm using the data scheme to save space in some HTML pages with a lot of tiny
> GIFs inside them. If anyone is interested, I can post a small Perl script that
> does the trick.

You save space using the data: scheme? I'd like to see that.

> However, I don't think that this RFE should be implemented in Mozilla/Firefox.
> It's more reasonable to implement it as an extension for Mozilla/Firefox.

Mozilla Archive Format is a must-have to be able to read single file webpage formats like MHT (EML).
http://maf.mozdev.org/
https://addons.mozilla.org/firefox/2925/

Worcester12345

Comment 29

•

19 years ago

(In reply to comment #28)
...
> Mozilla Archive Format is a must-have to be able to read single file webpage
> formats like MHT (EML). http://maf.mozdev.org/
> https://addons.mozilla.org/firefox/2925/

"MAF 0.7.0 is currently under development and will be compatible with Firefox 1.5 only."

which means it is soon to become obsolete with Firefox 2.0 coming out soon, unless there is some sort of secret development of this going on, but usually open source is more, um, "open".

Doncho N. Gunchev

Comment 30

•

18 years ago

(In reply to comment #28)
> (In reply to comment #27)
> > I'm using the data scheme to save space in some HTML pages with a lot of tiny
> > GIFs inside them. If anyone is interested, I can post a small Perl script that
> > does the trick.
>
> You save space using the data: scheme? I'd like to see that.

Try it with images that are < 512 bytes. Every small file eats at least one full sector/inode + metadata_size(filename...), so you CAN save space.
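The arithmetic behind this claim can be sketched in Python. The numbers are assumptions for illustration only: a 512-byte allocation unit as in the comment, no per-file metadata, and no file system that packs small files together. The function names are hypothetical.

```python
import math


def on_disk_size(file_size: int, block_size: int = 512) -> int:
    """Bytes a separate file occupies when rounded up to whole blocks."""
    return max(1, math.ceil(file_size / block_size)) * block_size


def inline_size(file_size: int, mime_type: str = "image/gif") -> int:
    """Characters the same payload adds to the HTML as a Base64 data: URI."""
    return len(f"data:{mime_type};base64,") + 4 * math.ceil(file_size / 3)


gif_size = 300  # a hypothetical tiny spacer GIF
print(on_disk_size(gif_size))  # prints: 512
print(inline_size(gif_size))   # prints: 422
```

With a larger allocation unit the gap widens, although the HTML file itself also grows and may cross a block boundary of its own.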

Timwi

Comment 31

•

17 years ago

Hm, no comment for over a year. That's unfortunate, because I think this RFE is an extremely good idea and should be implemented.

I think a perfect implementation would:

* include JavaScript and CSS code inline (i.e. convert <script> and <style> tags with a 'src' attribute to tags containing the contents). There is no need for the data: URI here; base64 encoding would only make it use more space and remove readability.
* recursively walk CSS @import clauses to include all the CSS.
* convert images - whether <img> tags in HTML or url() clauses in CSS - to data: URIs.

To clarify the distinction in the 'Save As' UI, I think the option that is currently called "HTML only" should perhaps be called "original HTML only" (to communicate that it's the unaltered HTML as output by the webserver). The one currently called "complete" could be renamed to "complete - multiple files", so the new option introduced by this RFE could then be called "complete - single HTML file".
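Two of the steps listed above (images to data: URIs, external scripts inlined as text) can be sketched in Python as follows. This is a deliberately naive illustration: it uses regular expressions instead of a real HTML parser, expects double-quoted attributes, skips CSS and @import handling entirely, and relies on a hypothetical `resources` dictionary of already-fetched content.

```python
import base64
import re
from typing import Dict, Tuple


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a Base64 data: URI (RFC 2397)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_page(html: str, resources: Dict[str, Tuple[str, bytes]]) -> str:
    """resources maps URL -> (mime type, bytes); unknown URLs are left alone."""

    def inline_img(match: "re.Match[str]") -> str:
        url = match.group(2)
        if url not in resources:
            return match.group(0)
        mime_type, payload = resources[url]
        return match.group(1) + to_data_uri(payload, mime_type) + match.group(3)

    def inline_script(match: "re.Match[str]") -> str:
        url = match.group(1)
        if url not in resources:
            return match.group(0)
        # Scripts are pasted as text; a body containing "</script>" would
        # need escaping, which this sketch does not attempt.
        return "<script>" + resources[url][1].decode("utf-8") + "</script>"

    html = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', inline_img, html)
    return re.sub(r'<script\b[^>]*?\bsrc="([^"]+)"[^>]*>\s*</script>',
                  inline_script, html)


page = '<img src="a.gif"><script src="x.js"></script>'
cache = {"a.gif": ("image/gif", b"GIF89a"),
         "x.js": ("text/javascript", b"alert(1)")}
print(inline_page(page, cache))
```

A production implementation would walk the parsed DOM and the CSS object model rather than matching text.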

Phil Ringnalda (:philor)

Updated

•

16 years ago

Assignee: law → nobody

QA Contact: chrispetersen → file-handling

Denis Washington

Comment 33

•

12 years ago

As all browsers support data URLs now, shouldn't this be relatively easy to implement? Or are there still unresolved issues?

pw

Comment 34

•

12 years ago

You would also need to base64-encode any audio and video files. How are really large files handled by base64?
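On the size question: Base64 output is always about four thirds of the input, and it can be produced in chunks so that the whole file never has to sit in memory. The Python sketch below illustrates this under one assumption, namely that `source.read` returns full chunks until end of file (true for regular files and `io.BytesIO`, not guaranteed for sockets or pipes). The function name is hypothetical.

```python
import base64
import io
from typing import BinaryIO


def encode_stream(source: BinaryIO, sink: BinaryIO,
                  chunk_size: int = 3 * 1024) -> None:
    """Base64-encode source into sink one chunk at a time."""
    # chunk_size is a multiple of 3, so every chunk except possibly the
    # last encodes without padding and the pieces concatenate cleanly.
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(base64.b64encode(chunk))


data = bytes(range(256)) * 50  # 12800 bytes of stand-in "media" content
out = io.BytesIO()
encode_stream(io.BytesIO(data), out)
print(len(data), len(out.getvalue()))  # prints: 12800 17068
```

Encoding is therefore not the bottleneck; whether a browser copes with a multi-megabyte attribute value is a separate question.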

Benjamin Smedberg

Updated

•

9 years ago

Product: Core → Firefox

Target Milestone: Future → ---

Version: Trunk → unspecified

Ion Chalmers Freeman

Comment 35

•

7 years ago

Firefox Quantum won't work with the MHTML file extensions, so it would be really nice to have this resolved.

:glob ✱

Comment 37

•

4 years ago

The content of attachment 9253576 [details] has been deleted for the following reason: Spam

jscher2000

Comment 38

•

3 years ago

There is a Recommended Extension along these lines:

https://addons.mozilla.org/firefox/addon/single-file/

That's not to say Mozilla should not implement a similar feature, but I'm not sure there is an advantage to building it into the product considering that the maintenance burden then would shift to Mozilla.

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Worcester12345

Comment 39

•

2 years ago

(In reply to jscher2000 from comment #38)

> There is a Recommended Extension along these lines:
>
> https://addons.mozilla.org/firefox/addon/single-file/
>
> That's not to say Mozilla should not implement a similar feature, but I'm not sure there is an advantage to building it into the product considering that the maintenance burden then would shift to Mozilla.

You could also just "print" it to a .pdf file, for that matter. Strange that the software itself won't do this.

juan.lanus

Comment 40

•

2 years ago

(In reply to Worcester12345 from comment #39)

> You could also just "print" it to a .pdf file, for that matter. Strange that the software itself won't do this.

IMO the whole point is not to use PDF.
It's a format that renders vertical (portrait) pages on screens that are usually wide.
It is not capable of reflowing text.
Etc., etc.

Demonstration of embedding GIF image directly in document 23 years ago Manko 699 bytes, text/html
Built-in GIF, <LINK> and <SCRIPT> demo 23 years ago Manko 1.43 KB, text/html
Demonstration of embedding GIF image directly in document 4 years ago Christine Brewer (deleted), text/plain