RFE: Save complete webpage in one file using data: protocol (RFC 2397)
Categories
(Firefox :: File Handling, enhancement)
People
(Reporter: sinchi, Unassigned)
Attachments
(2 files, 1 obsolete file)
Thanks to the XML data source syntax, it is possible to save an HTML document together with its images, embeds, external styles and scripts in one file. This can be done by Base64-encoding each resource and embedding it with the "data:" protocol. The option would be a third entry in the drop-down menu of the "Save as" dialog, for example:
Save file as type:
Web page, complete, with separate files (*.htm, *.html)
Web page, complete, in one whole file (*.htm, *.html)
Web page, HTML only (*.htm, *.html)
See the demonstration of image embedding in the attachment.
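For illustration, here is a minimal Python sketch of the encoding step described above. The helper name `to_data_url` and the six-byte payload are made up for the example; a real implementation would read the bytes of each linked resource instead.

```python
import base64


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Return an RFC 2397 data: URL that carries payload as Base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for a real image file (only a GIF signature).
url = to_data_url(b"GIF89a", "image/gif")
img_tag = f'<img src="{url}">'
print(img_tag)  # <img src="data:image/gif;base64,R0lGODlh">
```

The same helper would apply to any resource type, since only the MIME type in the prefix changes.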
Comment 2•23 years ago
This isn't XHTML-specific. NS4.x knows inline images as well.
Yes, that's right :) But this is a standard XML feature.
Comment 4•23 years ago
IE doesn't show the image. Valid RFE anyway.
Comment 6•23 years ago
It is. There is no reason to make up our own format when there is a standard format for this.
*** This bug has been marked as a duplicate of 40873 ***
No, bug 40873 proposes using multipart MIME HTML documents with boundaries. In this case, I propose using the XML "data:" protocol, without breaking the document into parts. This feature produces a fully W3C standards-compliant document, which can be opened with any standard browser, placed on a Web server, etc. I think this is a more advanced and useful technology than MHTML.
Comment 8•23 years ago
This isn't about RFC 2557, but RFC 2397. It's either wontfix or new, but it's not a duplicate.
Comment 9•23 years ago
Ah. Ok, I had misunderstood... A point of reference (from RFC 2397):
> The "data:" URL scheme is only useful for short values. Note that some
> applications that use URLs may impose a length limit; for example, URLs
> embedded within <A> anchors in HTML have a length limit determined by the
> SGML declaration for HTML [RFC1866]. The LITLEN (1024) limits the number of
> characters which can appear in a single attribute value literal, the
> ATTSPLEN (2100) limits the sum of all lengths of all attribute value
> specifications which appear in a tag, and the TAGLEN (2100) limits the
> overall length of a tag.
Thus if we do this we have to be careful to only offer it as an option in cases when all the linked content is smaller than the relevant limits.
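As a rough sketch of the check proposed here, the Python below tests whether one resource, once encoded, still fits in a single attribute value. The `LITLEN` constant is the figure from the quoted RFC text; the function name `fits_litlen` is an assumption made for the example, and whether any given browser actually enforces this limit is not something the sketch can tell.

```python
import base64

LITLEN = 1024  # limit on one attribute value literal, per the RFC text quoted above


def fits_litlen(payload: bytes, mime_type: str) -> bool:
    """True if the data: URL for payload is at most LITLEN characters long."""
    encoded = base64.b64encode(payload).decode("ascii")
    url = f"data:{mime_type};base64,{encoded}"
    return len(url) <= LITLEN


# Base64 emits 4 characters per 3 input bytes, so with the 22-character
# "data:image/gif;base64," prefix the cut-off for a GIF is 750 bytes.
print(fits_litlen(b"\x00" * 750, "image/gif"))  # True
print(fits_litlen(b"\x00" * 751, "image/gif"))  # False
```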
Comment 10•23 years ago
Summary doesn't match comments in this bug, changing
Comment 11•23 years ago
So, where's the difference now? Did you read RFC 2397? At least let the # stay in the summary, so someone can search for it.
Comment 12•23 years ago
Markus: sorry... I misread the summary as referring to MHTML, especially as the bug was marked a duplicate of that one. I didn't look at the RFC numbers and thought they were the same, so I didn't include the number.
Reporter
Comment 13•23 years ago
I think LITLEN is not a very significant limitation. Quotation from RFC 2397:
> The effect of using long "data" URLs in applications is currently unknown;
> some software packages may exhibit unreasonable behavior when confronted
> with data that exceeds its allocated buffer size.
If Mozilla can read big images from "data:" without problems, all will be OK.
Comment 14•23 years ago
You said:
> This feature produces a fully W3C standards-compliant
> document, which can be opened with any standard browser
I was just pointing out that this is not exactly true. It's very likely to
cause at least some standards-compliant browsers, especially a stricter browser
on a more memory-limited platform, to do odd things....
Reporter
Comment 15•23 years ago
Maybe you're right. But MHTML has the same limitation: the size of an MHTML file and the size of a file in RFC 2397 format are practically equal. Moreover, supporting MHTML is a less obvious thing than supporting the data: protocol. I don't know of any browser that understands the data: protocol except Mozilla and Netscape 4.x. If files in RFC 2397 format become prevalent, support for this standards-compliant format will be added to strict browsers.
Actually, this feature isn't sophisticated and doesn't demand a lot of system resources: a strict browser, on reading HTML data and seeing "data:" in an object location, will cut out the base64 piece, save the object file in a temporary directory, substitute the corresponding URL, and then continue HTML parsing. This scheme, I think, takes little memory and CPU time compared with usual HTML parsing.
And, finally, it is just the user's choice: save with separate files, use the complicated MHTML format, or use the transparent, simple-to-interpret RFC 2397 format. Mostly, a Web page is saved to local disk for private use, and later it will be opened by the same browser.
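The "strict browser" procedure described in this comment (cut out the base64 piece, save it to a temporary directory, substitute the URL) could look roughly like the Python sketch below. It is only an illustration: the function name `externalize`, the `objectN.bin` file naming and the regular expression are assumptions, and the pattern does not handle whitespace inside the Base64 data or non-Base64 data: URLs.

```python
import base64
import re
import tempfile
from pathlib import Path

# Matches data:<type>/<subtype>;base64,<payload> with no whitespace in the payload.
DATA_URL = re.compile(r"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def externalize(html: str, out_dir: Path) -> str:
    """Write every embedded object to out_dir and point the HTML at the files."""
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        target = out_dir / f"object{counter}.bin"  # hypothetical naming scheme
        target.write_bytes(base64.b64decode(match.group(2)))
        return target.resolve().as_uri()

    return DATA_URL.sub(replace, html)


with tempfile.TemporaryDirectory() as tmp:
    page = '<img src="data:image/gif;base64,R0lGODlh">'
    print(externalize(page, Path(tmp)))
```

The work per object is one decode and one file write, which is consistent with the claim that the cost is small next to ordinary HTML parsing.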
Comment 16•23 years ago
There are several problems with this approach:
* It only makes sense for images and other objects. Linked stylesheets and JavaScript would have to be 'included inline' (just as the C preprocessor does). While this is probably OK for JavaScript, I don't think it will work for (alternate) stylesheets and other linked resources.
* It will bloat a file which reuses images a lot. Think of spacer and bullet GIFs.
* It will break on objects attached via stylesheets (e.g. list bullets).
Reporter
Comment 17•23 years ago
To Martin Kutschker: No problem. It really is possible to use the data: protocol within a style sheet (both inline and in a separate file), for example:
list-style-image:url(data:image/gif;base64,.....);
It is also possible to use the data: protocol within <link>:
<link href="data:text/css;base64,......" rel="stylesheet" type="text/css" />
What is more, I tested a "Russian matryoshka": a <link> with the data: protocol, with an embedded image within the CSS data. All was OK.
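The "matryoshka" nesting can be reproduced with a short Python sketch: an image is encoded into a data: URL, placed inside a CSS rule, and the whole stylesheet is then encoded into the href of a <link>. The helper name `data_url` and the placeholder image bytes are assumptions for the example, not the contents of the attachment.

```python
import base64


def data_url(payload: bytes, mime_type: str) -> str:
    """Return a Base64 data: URL for payload."""
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"


# Inner layer: a placeholder "image" (only a GIF signature, not a real bullet).
bullet = data_url(b"GIF89a", "image/gif")
css = f"ul {{ list-style-image: url({bullet}); }}"

# Outer layer: the stylesheet itself, carried in the href of a <link>.
href = data_url(css.encode("utf-8"), "text/css")
link = f'<link href="{href}" rel="stylesheet" type="text/css" />'
print(link)
```

Each level of nesting is encoded again, so the inner image grows by roughly a third twice over.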
Reporter
Comment 18•23 years ago
Just view the source of this file ;)
Comment 19•23 years ago
Amazing! A remaining (implementation) problem is stylesheets that include other stylesheets, and trusted JavaScript that 'includes' JS files via XPCOM (though they are a problem for any save-as-a-whole strategy).
Has anyone tried saving this file? My Mozilla 0.9.7 on Linux always creates a directory and files for the embedded (!) images. It does it even when I save as "HTML only". Is there already a bug on this?
So what is missing (?) is a way to reuse resources. What is working is this:
<object style="display: none" id="embed" name="embed2" type="image/gif" data="data:image/gif;base64,R0lGODlhDwAPAJEBAAAAAL+/v/// AAAAACH5BAEAAAEALAAAAAAPAA8AAAIujA2Zx5EC4WIgWnnq vQBJLTyhE4khaG5Wqn4tp4ErFnMY+Sll9naUfGpkFL5DAQA7" />
<img src="javascript:this.src=document.getElementById('embed').data" id="test">
<script>document.getElementById('test').src = document.getElementById('embed').data</script>
But this requires JavaScript. Is there a better way to set the src/data of the image?
Comment 20•23 years ago
> Has anyone tried saving this file? My Mozilla 0.9.7 on Linux
Please don't test saving with 0.9.7. Your comment touches on 2 or 3 separate
bugs in the save as impl in 0.9.7 (it all got completely rewritten right before
the milestone, with the ensuing issues). All the bugs you mention are fixed in
current nightlies.
Comment 21•23 years ago
will have to wait for a future release, post mozilla1.0
Updated•23 years ago
Comment 22•23 years ago
adding self to cc list
Updated•22 years ago
Comment 23•22 years ago
*** Bug 199757 has been marked as a duplicate of this bug. ***
Reporter
Comment 24•21 years ago
BTW, Opera 7.20 and later also support data: URLs.
Comment 25•20 years ago
Is this being explored for Firefox?
Reporter
Comment 26•20 years ago
This bug is unrelated to the Seamonkey/Firefox fork.
Comment 27•20 years ago
(In reply to comment #25)
> Is this being explored for Firefox?
AFAIK, the data scheme is implemented in Gecko (Firefox, Mozilla etc.) and works under all Mozilla variants. I'm using the data scheme to save space in some HTML pages with a lot of tiny GIFs inside them. If anyone is interested, I can post a small Perl script that does the trick.
However, I don't think that this RFE should be implemented in Mozilla/Firefox. It's more reasonable to implement it as an extension for Mozilla/Firefox.
Comment 28•18 years ago
(In reply to comment #27)
> I'm using the data scheme to save space in some HTML pages with a lot of tiny
> GIFs inside them. If anyone is interested, I can post a small Perl script that
> does the trick.
You save space using the data: scheme? I'd like to see that.
> However, I don't think that this RFE should be implemented in Mozilla/Firefox.
> It's more reasonable to implement it as an extension for Mozilla/Firefox.
Mozilla Archive Format is a must-have to be able to read single-file webpage formats like MHT (EML).
http://maf.mozdev.org/
https://addons.mozilla.org/firefox/2925/
Comment 29•18 years ago
(In reply to comment #28)
...
> Mozilla Archive Format is a must-have to be able to read single-file webpage
> formats like MHT (EML). http://maf.mozdev.org/
> https://addons.mozilla.org/firefox/2925/
"MAF 0.7.0 is currently under development and will be compatible with Firefox 1.5 only."
That means it will soon be obsolete, with Firefox 2.0 coming out shortly, unless there is some sort of secret development of this going on, but usually open source is more, um, "open".
Comment 30•18 years ago
(In reply to comment #28)
> (In reply to comment #27)
> > I'm using the data scheme to save space in some HTML pages with a lot of tiny
> > GIFs inside them. If anyone is interested, I can post a small Perl script that
> > does the trick.
>
> You save space using the data: scheme? I'd like to see that.
Try it with images that are < 512 bytes. Every small file eats at least one full sector/inode + metadata_size(filename...), so you CAN save space.
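The arithmetic behind this claim can be sketched in Python. The numbers rest on assumptions that are not from this thread: an allocation unit of 4096 bytes by default, no accounting for inode or directory metadata, and no accounting for how the extra characters change the HTML file's own block usage. The function names are hypothetical.

```python
import math


def on_disk_size(n_bytes: int, block_size: int = 4096) -> int:
    """Bytes occupied by a separate file, rounded up to whole blocks."""
    return math.ceil(n_bytes / block_size) * block_size


def inline_size(n_bytes: int, mime_type: str = "image/gif") -> int:
    """Characters added to the HTML when the same bytes are inlined as Base64."""
    prefix = len(f"data:{mime_type};base64,")
    return prefix + 4 * math.ceil(n_bytes / 3)


# A 500-byte GIF: one whole block as a file, about a third more than
# its raw size when inlined.
print(on_disk_size(500))  # 4096
print(inline_size(500))   # 690
```

With a 512-byte allocation unit the same image would occupy 512 bytes as a file, so whether inlining wins depends on the block size and on the metadata cost mentioned in the comment.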
Comment 31•17 years ago
Hm, no comment for over a year. That's unfortunate, because I think this RFE is an extremely good idea and should be implemented. I think a perfect implementation would:
* include JavaScript and CSS code inline (i.e. convert <script> tags with a 'src' attribute and <link> tags that reference a stylesheet into <script> and <style> tags containing the contents). There is no need for the data: URI here; base64 encoding would only make it use more space and hurt readability.
* recursively walk CSS @import clauses to include all the CSS.
* convert images, whether <img> tags in HTML or url() clauses in CSS, to data: URIs.
To clarify the distinction in the 'Save As' UI, I think the option that is currently called "HTML only" should perhaps be called "original HTML only" (to communicate that it's the unaltered HTML as output by the webserver). The one currently called "complete" could be renamed to "complete - multiple files", so the new option introduced by this RFE could then be called "complete - single HTML file".
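A very reduced Python sketch of the first and third bullets might look like this. It is an illustration under loud assumptions: `inline_page` and the `resources` mapping are hypothetical (the mapping stands in for actually fetching files), the regular expressions only match the exact attribute layout used in the example, and neither @import walking nor url() clauses in CSS are handled. A real implementation would work on the parsed document rather than on text.

```python
import base64
import re


def inline_page(html: str, resources: dict[str, tuple[str, bytes]]) -> str:
    """Inline scripts, stylesheets and images listed in resources.

    resources maps a URL to (MIME type, content); unknown URLs are left alone.
    """

    def script(m: re.Match) -> str:
        if m.group(1) not in resources:
            return m.group(0)
        return "<script>" + resources[m.group(1)][1].decode("utf-8") + "</script>"

    def style(m: re.Match) -> str:
        if m.group(1) not in resources:
            return m.group(0)
        return "<style>" + resources[m.group(1)][1].decode("utf-8") + "</style>"

    def image(m: re.Match) -> str:
        if m.group(2) not in resources:
            return m.group(0)
        mime, body = resources[m.group(2)]
        encoded = base64.b64encode(body).decode("ascii")
        return f"{m.group(1)}data:{mime};base64,{encoded}{m.group(3)}"

    # Scripts and stylesheets become readable inline text, not data: URIs.
    html = re.sub(r'<script src="([^"]+)"></script>', script, html)
    html = re.sub(r'<link rel="stylesheet" href="([^"]+)"\s*/?>', style, html)
    # Images are binary, so they are the part that needs Base64.
    html = re.sub(r'(<img src=")([^"]+)(")', image, html)
    return html


page = '<link rel="stylesheet" href="s.css"><script src="a.js"></script><img src="i.gif">'
files = {
    "s.css": ("text/css", b"p { color: red; }"),
    "a.js": ("text/javascript", b"var x = 1;"),
    "i.gif": ("image/gif", b"GIF89a"),  # placeholder bytes, not a real image
}
print(inline_page(page, files))
```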
Updated•15 years ago
Comment 33•11 years ago
As all browsers support data URLs now, shouldn't this be relatively easy to implement? Or are there still unresolved issues?
Comment 34•11 years ago
You also need to base64-encode any audio and video files. How are really large files handled by base64?
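On the encoding side at least, large files do not have to be held in memory: Base64 can be produced chunk by chunk as long as every chunk except the last is a multiple of 3 bytes. The Python sketch below shows that idea; `encode_stream` is a hypothetical name, and it assumes a source such as a regular file or `BytesIO` whose `read(n)` returns exactly n bytes until the end is reached. It says nothing about how a browser copes with reading the resulting very long URL.

```python
import base64
import io
from typing import BinaryIO, TextIO


def encode_stream(src: BinaryIO, dst: TextIO, chunk_size: int = 3 * 16384) -> None:
    """Write the Base64 form of src to dst without loading src whole."""
    if chunk_size <= 0 or chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a positive multiple of 3")
    # 3 input bytes map to exactly 4 output characters, so chunks that are
    # multiples of 3 never need padding except at the very end.
    while chunk := src.read(chunk_size):
        dst.write(base64.b64encode(chunk).decode("ascii"))


source = io.BytesIO(bytes(range(256)) * 10)  # stand-in for a large media file
out = io.StringIO()
out.write("data:video/ogg;base64,")
encode_stream(source, out, chunk_size=300)
print(len(out.getvalue()))
```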
Updated•8 years ago
Comment 35•6 years ago
Firefox Quantum won't work with the MHTML file extensions, so it would be really nice to have this resolved.
Comment 37•3 years ago
The content of attachment 9253576 has been deleted for the following reason:
Spam
Comment 38•2 years ago
There is a Recommended Extension along these lines:
https://addons.mozilla.org/firefox/addon/single-file/
That's not to say Mozilla should not implement a similar feature, but I'm not sure there is an advantage to building it into the product considering that the maintenance burden then would shift to Mozilla.
Updated•2 years ago
Comment 39•1 year ago
(In reply to jscher2000 from comment #38)
> There is a Recommended Extension along these lines:
> https://addons.mozilla.org/firefox/addon/single-file/
> That's not to say Mozilla should not implement a similar feature, but I'm not sure there is an advantage to building it into the product considering that the maintenance burden then would shift to Mozilla.
You could also just "print" it to a .pdf file, for that matter. Strange that the software itself won't do this.
Comment 40•1 year ago
(In reply to Worcester12345 from comment #39)
> You could also just "print" it to a .pdf file, for that matter. Strange that the software itself won't do this.
IMO the whole point is not to use PDF.
It's a format that renders vertical (portrait) pages on screens that are usually wide.
A format that is not capable of reflowing text.
Etc, etc...