Open Bug 121793 Opened 19 years ago Updated 1 year ago
RFE: Save complete webpage in one file using data: protocol (RFC 2397)
Thanks to the XML data source syntax, it is possible to save an HTML document together with its images, embeds, external styles and scripts in one file. This can be done with Base64 encoding and the "data:" protocol. The option would be a third entry in the drop-down menu of the "Save as" dialog, for example:
Save file as type:
Web page, complete, with separate files (*.htm, *.html)
Web page, complete, in one whole file (*.htm, *.html)
Web page, HTML only (*.htm, *.html)
See the demonstration of image embedding in the attachment.
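For readers without the attachment, the embedding proposed above can be sketched as follows. This is only an illustration, not the attached demonstration: the helper name `to_data_url` is hypothetical, and the six bytes stand in for a real image file.

```python
import base64


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Return an RFC 2397 data: URL that carries payload as base64 text."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical stand-in for the bytes of a small image file
# (only the GIF signature, not a displayable image).
gif_bytes = b"GIF89a"

# The external reference src="image.gif" becomes an inline data: URL.
img_tag = f'<img src="{to_data_url(gif_bytes, "image/gif")}" alt="demo">'
print(img_tag)
```

A "one whole file" save would presumably apply the same substitution to every image, stylesheet and script reference in the page.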
This isn't XHTML-specific. NS4.x supports inline images as well.
OS: Windows 2000 → All
Hardware: PC → All
Yes, that's right :) But this is a standard XML feature.
OS: All → Windows 2000
Hardware: All → PC
IE doesn't show the image. Valid RFE anyway.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Is this the same as bug 40873?
It is. There is no reason to make up our own format when there is a standard format for this. *** This bug has been marked as a duplicate of 40873 ***
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
No, bug 40873 proposes using multipart MIME HTML documents with boundaries. In this case I propose using the XML "data:" protocol, without breaking the document into parts. This feature yields a fully W3C standard-compliant document, which can be opened with any standard browser, placed on a Web server, etc. I think this is a more advanced and useful technology than MHTML.
This isn't about RFC 2557, but RFC 2397. It's either wontfix or new, but it's not a duplicate.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: It's possible to save complete page with all embedded external objects as one whole file. → RFE: Save as RFC 2397 HTML; complete webpage in one file
Ah. Ok, I had misunderstood...
A point of reference (from RFC 2397):
> The "data:" URL scheme is only useful for short values. Note that some applications that use URLs may impose a length limit; for example, URLs embedded within <A> anchors in HTML have a length limit determined by the SGML declaration for HTML [RFC1866]. The LITLEN (1024) limits the number of characters which can appear in a single attribute value literal, the ATTSPLEN (2100) limits the sum of all lengths of all attribute value specifications which appear in a tag, and the TAGLEN (2100) limits the overall length of a tag.
Thus if we do this we have to be careful to only offer it as an option in cases when all the linked content is smaller than the relevant limits.
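To put a number on the LITLEN limit quoted above, here is a rough estimate in Python. It assumes the limit applies to the whole data: URL inside one attribute value and ignores the surrounding quotes; the function names are illustrative only.

```python
LITLEN = 1024  # per-attribute literal limit, as quoted from RFC 2397


def data_url_length(raw_size: int, mime_type: str) -> int:
    """Length of a base64 data: URL for raw_size bytes of content."""
    prefix = len(f"data:{mime_type};base64,")
    # Base64 emits 4 characters for every 3 input bytes, padded up.
    encoded = 4 * ((raw_size + 2) // 3)
    return prefix + encoded


def fits_litlen(raw_size: int, mime_type: str) -> bool:
    """True if the resulting URL stays within the LITLEN limit."""
    return data_url_length(raw_size, mime_type) <= LITLEN


print(fits_litlen(750, "image/gif"))  # True: 22 + 1000 = 1022 characters
print(fits_litlen(751, "image/gif"))  # False: 22 + 1004 = 1026 characters
```

Under these assumptions, a GIF larger than 750 bytes would already exceed the strict SGML limit.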
Summary doesn't match comments in this bug, changing
Summary: RFE: Save as RFC 2397 HTML; complete webpage in one file → RFE: Save complete webpage in one file using data: protocol
So, where's the difference now? Did you read RFC 2397? At least let the # stay in the summary, so someone can search for it.
Summary: RFE: Save complete webpage in one file using data: protocol → RFE: Save complete webpage in one file using data: protocol (RFC 2397)
Markus: sorry... I misread the summary to read MHTML, especially as it was marked a duplicate of that bug. I didn't look at the RFC numbers, but thought they were the same, so I didn't include it
I think LITLEN is not a very significant limitation. A quotation from RFC 2397: The effect of using long "data" URLs in applications is currently unknown; some software packages may exhibit unreasonable behavior when confronted with data that exceeds its allocated buffer size. If Mozilla can read big images from "data:" without problems, all will be OK.
You said: > This feature allows to get fully W3C standart compliant > document, which can be opened with any standart browser I was just pointing out that this is not exactly true. It's very likely to cause at least some standards-compliant browsers, especially a stricter browser on a more memory-limited platform, to do odd things....
Maybe you're right. But MHTML has the same limitation: the size of an MHTML file and the size of a file in RFC 2397 format are practically equal. Moreover, MHTML support is a less obvious thing than the data: protocol. I don't know of any browser that understands the data: protocol except Mozilla and Netscape 4.x. If files in RFC 2397 format become prevalent, support for this standard-compliant format will be added to strict browsers. Actually, this feature isn't sophisticated and doesn't demand a lot of system resources: a strict browser, reading HTML data and seeing "data:" in an object location, will cut out the base64 piece, save the object file in a temporary directory and substitute the corresponding URL, then continue HTML parsing. This scheme, I think, takes little memory and CPU time compared with usual HTML parsing. And, finally, it is just the user's choice: save with separate files, use the complicated MHTML format, or use the transparent RFC 2397 format, which is simple to interpret. Mainly, a Web page is saved to local disk for private use, and later it will be opened by the same browser.
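The fallback scheme described in the comment above (cut out the base64 piece, save it to a temporary directory, substitute the URL) might look roughly like this. It is a sketch only, not how any browser implements it: the regular expression is a simplification that handles only base64 data: URLs, and the output file names are made up.

```python
import base64
import re
import tempfile
from pathlib import Path

# Simplified pattern: matches only "data:<mime>;base64,<payload>".
DATA_URL = re.compile(
    r"data:(?P<mime>[\w/+.-]+);base64,(?P<body>[A-Za-z0-9+/=]+)"
)


def externalize_data_urls(html: str, out_dir: Path) -> str:
    """Write each embedded object to out_dir and point the HTML at the file."""
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Hypothetical naming scheme for the extracted objects.
        target = out_dir / f"object{counter}.bin"
        target.write_bytes(base64.b64decode(match.group("body")))
        return target.resolve().as_uri()

    return DATA_URL.sub(replace, html)


with tempfile.TemporaryDirectory() as tmp:
    page = '<img src="data:image/gif;base64,R0lGODlh">'
    print(externalize_data_urls(page, Path(tmp)))
```

The work per object is one decode and one file write, which is consistent with the comment's point that the extra cost over plain HTML parsing is small.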
To Martin Kutschker: No problem. It really is possible to use the data: protocol within a style sheet (both inline and in a separate file), for example:
list-style-image:url(data:image/gif;base64,.....);
It is also possible to use the data: protocol within <link>:
<link href="data:text/css;base64,......" rel="stylesheet" type="text/css" />
What's more, I tested a "Russian matryoshka": a <link> with the data: protocol, with an embedded image within the CSS data. All was OK.
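The nesting described above can be reproduced with a short script. This is a hedged sketch rather than the commenter's actual test case: the function name, the CSS rule and the placeholder image bytes are all assumptions.

```python
import base64


def stylesheet_link(image_bytes: bytes) -> str:
    """Build a <link> whose data: stylesheet itself embeds a data: image."""
    # Inner layer: the image as a data: URL inside a CSS rule.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    css = f"li {{ list-style-image: url(data:image/gif;base64,{image_b64}); }}"

    # Outer layer: the whole stylesheet, encoded again, as the link target.
    css_b64 = base64.b64encode(css.encode("utf-8")).decode("ascii")
    return (
        f'<link href="data:text/css;base64,{css_b64}" '
        'rel="stylesheet" type="text/css" />'
    )


# Hypothetical placeholder for real image bytes.
print(stylesheet_link(b"GIF89a"))
```

Note that the inner image is base64-encoded twice, so each level of nesting adds roughly another third to its size.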
Just view the source of this file ;)
> Has anyone tried saving this file? My Mozilla 0.9.7 on Linux
Please don't test saving with 0.9.7. Your comment touches on 2 or 3 separate bugs in the save as impl in 0.9.7 (it all got completely rewritten right before the milestone, with the ensuing issues). All the bugs you mention are fixed in current nightlies.
will have to wait for a future release, post mozilla1.0
Target Milestone: --- → Future
adding self to cc list
*** Bug 199757 has been marked as a duplicate of this bug. ***
BTW, Opera 7.20 and later also supports data: URLs.
Is this being explored for Firefox?
This bug is unrelated to Seamonkey/Firefox fork.
(In reply to comment #25)
> Is this being explored for Firefox?
AFAIK, the data scheme is implemented in Gecko (Firefox, Mozilla etc.), and works under all Mozilla variants. I'm using the data scheme to save space in some HTML pages with a lot of tiny GIFs inside them. If anyone is interested, I can post a small Perl script that does the trick. However, I don't think that this RFE should be implemented in Mozilla/Firefox. It's more reasonable to implement it as an extension for Mozilla/Firefox.
(In reply to comment #27) > I'm using the data scheme to save space of some html pages with a lot of tiny > GIF's inside them. If anyone is interested, I can post a small perl script that > does the trick. You save space using the data: scheme? I'd like to see that. > However, I don't think that this RFE should implemented in Mozilla/FireFox. > It's more reasonable to implement it as an extension for Mozilla/Firefox. Mozilla Archive Format is a must-have to be able to read single file webpage formats like MHT (EML). http://maf.mozdev.org/ https://addons.mozilla.org/firefox/2925/
(In reply to comment #28) ... > Mozilla Archive Format is a must-have to be able to read single file webpage > formats like MHT (EML). http://maf.mozdev.org/ > https://addons.mozilla.org/firefox/2925/ "MAF 0.7.0 is currently under development and will be compatible with Firefox 1.5 only." which means it is soon to become obsolete with Firefox 2.0 coming out soon, unless there is some sort of secret development of this going on, but usually open source is more, um, "open".
(In reply to comment #28)
> (In reply to comment #27)
> > I'm using the data scheme to save space of some html pages with a lot of tiny
> > GIF's inside them. If anyone is interested, I can post a small perl script that
> > does the trick.
>
> You save space using the data: scheme? I'd like to see that.
>
Try it with images that are < 512 bytes. Every small file eats at least one full sector/inode + metadata_size(filename...), so you CAN save space.
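The arithmetic behind that claim can be sketched as follows. The 4096-byte allocation unit is an assumption (it depends on the file system), directory and inode metadata are ignored, and the function names are illustrative.

```python
def on_disk_size(file_size: int, block_size: int = 4096) -> int:
    """Bytes allocated for a file, rounded up to whole blocks.

    block_size is an assumed allocation unit, not a value from the thread.
    """
    blocks = max(1, (file_size + block_size - 1) // block_size)
    return blocks * block_size


def inline_size(file_size: int, mime_type: str = "image/gif") -> int:
    """Characters the same content occupies as a base64 data: URL."""
    return len(f"data:{mime_type};base64,") + 4 * ((file_size + 2) // 3)


size = 300  # a tiny GIF, below the 512 bytes mentioned above
print(on_disk_size(size))  # 4096 bytes allocated as a separate file
print(inline_size(size))   # 422 characters when embedded in the page
```

The saving only appears for small files: once a file fills most of its last block, the one-third base64 overhead outweighs the allocation slack.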
Assignee: law → nobody
QA Contact: chrispetersen → file-handling
As all browsers support data URLs now, shouldn't this be relatively easy to implement? Or are there still unresolved issues?
You also need to base64-encode any audio and video files. How are really large files handled by base64?
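On the encoding side at least, a large file does not have to be held in memory: base64 can be produced in chunks, as long as every chunk except the last is a multiple of 3 bytes so that no padding appears mid-stream. A minimal sketch, assuming `src` is a buffered binary stream whose `read` returns full chunks until end of file; the function name and chunk size are illustrative. It does not address how a browser would cope with parsing the resulting URL.

```python
import base64
import io

CHUNK = 3 * 64 * 1024  # multiple of 3, so chunks concatenate cleanly


def write_data_url(src, dst, mime_type: str, chunk_size: int = CHUNK) -> None:
    """Stream src (binary) into dst (text) as one base64 data: URL."""
    if chunk_size % 3:
        raise ValueError("chunk_size must be a multiple of 3")
    dst.write(f"data:{mime_type};base64,")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Only the final, shorter chunk can produce '=' padding.
        dst.write(base64.b64encode(chunk).decode("ascii"))


# Small in-memory demonstration; real use would pass open file objects.
payload = bytes(range(256)) * 10
out = io.StringIO()
write_data_url(io.BytesIO(payload), out, "video/ogg", chunk_size=300)
print(len(out.getvalue()))
```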
Product: Core → Firefox
Target Milestone: Future → ---
Version: Trunk → unspecified
Firefox Quantum won't work with the MHTML file extensions, so it would be really nice to have this resolved.