Closed Bug 18764 Opened 25 years ago Closed 9 years ago

Full rfc2557 MHTML multipart/related support in BROWSER

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: sidr, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted, Whiteboard: [Hixie-PF])

Attachments

(2 files)

The discussion in Bug 17309 "Wait for primary style sheets before constructing
frames" seems to have gotten bogged down - it seems obvious that no matter
what the final decision is on what to do and how to do it, it can't
be exactly what everyone would want.

Unless the decision is to always block display "forever" while waiting for all
primary style sheets to arrive, which would be sure to vex some, some pages
that cannot be properly displayed will be displayed anyway.

Ensuring that the browser has full rfc2557 MHTML support would provide a viable
mechanism for ensuring that LINKed stylesheets arrive with the documents
they apply to: those who absolutely *need* the stylesheets to arrive before
rendering could send an MHTML document instead on the basis of appropriate
content-negotiation. This would ensure that the HTML, and thinking forward,
XHTML and XML, documents could be made available with certainty that the
necessary stylesheets would be present at page layout time.

This would provide an "escape valve" of sorts to make sure that the issues
in bug 17309 do not become overpressurized. If this feature request is not
adopted, the "correct" fix for 17309 becomes very important due to the lack
of a viable alternative, on the page author's part, to hoping that the
browser does the right thing in even the most adverse conditions (say,
just after another railway crash that takes out a fiber run).

Whether any webserver available now can assemble an MHTML document on the fly
is unknown; whether authors would be willing to use tools to "precompile"
MHTML documents is also unknown.
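
For illustration only, here is a minimal sketch of what "precompiling" such a document could look like, using Python's standard email package. The function name build_mhtml and the example URLs are assumptions made for this sketch; nothing here comes from an existing web server or authoring tool.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, css_url: str, css: str) -> bytes:
    """Bundle an HTML page and one stylesheet into a multipart/related body.

    Hypothetical helper: each part is labelled with a Content-Location
    header so a reader can match the LINKed stylesheet URL to its part.
    """
    # The "type" parameter names the media type of the root resource.
    doc = MIMEMultipart("related", type="text/html")

    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    doc.attach(root)  # the root resource goes first

    sheet = MIMEText(css, "css", "utf-8")
    sheet["Content-Location"] = css_url
    doc.attach(sheet)

    return doc.as_bytes()
```

A server doing content negotiation could return these bytes in place of the plain HTML; whether any server would do so on the fly remains the open question raised above.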

Quoting from the RFC:
"While initially designed to support e-mail transfer of complete
multi-resource HTML multimedia documents, these conventions can also
be employed to resources retrieved by other transfer protocols such
as HTTP and FTP to retrieve a complete multi-resource HTML multimedia
document in a single transfer or for storage and archiving of
complete HTML-documents." <URL:http://www.faqs.org/rfcs/rfc2557.html>

Regardless of anything else, it is probably a good thing for the browser
to fully support rfc2557 MHTML for maximum flexibility and to open up
future options, particularly regarding XML.

I do see a downside: any simple way of implementing this would be very
cache-inefficient, and any way that would be cache-efficient could conceivably
add enough complexity to require extensions to HTTP.
Summary: RFE: Full rfc2557 MHTML multipart/related support in BROWSER → RFE: Full rfc2557 MHTML multipart/related support in BROWSER
Copied the following comment from bug 17309 (cc-ing contributor):
>------ Additional Comments From dbaron@fas.harvard.edu  11/13/99 07:03 ------
>Authors could add the proprietary "important" keyword to the list of keywords
>in the rel attribute of the link element, e.g., rel="important stylesheet" (or
>rel="stylesheet important") to do what you want without resorting to RFC2557.

Yes, they could, but they would have no guarantee that Mozilla or any other
browser would either interpret that the way they want or implement the
behaviour they want in response, nor that a future version would not do
something slightly or markedly different.

Providing an rfc2557 MHTML mechanism would take care of the extreme case,
leaving room for a reasonable policy for "important stylesheet" that would
not necessarily mean "absolutely required" from this point forward.

Having said that, I absolutely would not advocate MHTML in the browser as the
*only* mechanism provided to authors to indicate how important or necessary
a stylesheet is, lest this feature get thought of by anyone as the only way
to go. I'd go so far as to say don't add the feature if nothing else is
provided as a fix for bug 17309.
Target Milestone: M15
Bulk move of all Necko (to be deleted component) bugs to new Networking component.
Assignee: gagan → ruslan
->ruslan
Keywords: beta2
I don't know how easy it'll be to implement. Basically, when we open the channel,
we would ask for /foo.html. If that contains 3 HTML documents rather than one, we'll have
to invent a way to deal with it.
Priority: P3 → P4
Target Milestone: M15 → M16
Per warren's decision -> nobody
Assignee: ruslan → nobody
Target Milestone: M16 → M20
Keywords: nsbeta2
Putting on [nsbeta2-] radar.  
Whiteboard: [nsbeta2-]
Blocks: 40873
Marking helpwanted since that's what I think was meant by "-> nobody".
Keywords: helpwanted
Open Networking bugs, qa=tever -> qa to me.
QA Contact: tever → benc
This sounds like it would be a MAJOR step toward being able to save (and send)
entire HTML pages as ONE file to disk - great.

Suggest keyword: mozilla0.9.2
Depends on: 82118
I created a tracking (meta) bug 82118 to track these kinds of bugs and to unify
the efforts.

Maybe a few duplicates will also become apparent this way - then we can assign
the keyword MostFreq.
Removing dependency on bug 82118 as it should be the other way round (bug 82118
depends on this bug).
No longer depends on: 82118
Blocks: 82118
Whiteboard: [nsbeta2-] → [Hixie-PF]
I'll try to implement this, but I do not know yet if I really have the skills
to do it. Be prepared that I may have to give this back to nobody@mozilla.org.

My plan is roughly this:

- Implement a mhtml: protocol handler similar to the jar: handler.
- Implement a stream converter similar to the multipart/x-mixed-replace converter.
- Implement a method to control pending loads.

The stream converter would return the root resource within a mhtml channel and
put the other parts into a cache. On every page load we'd have to check if the
referring URI has a mhtml scheme and if so, translate the URI to be loaded into
a mhtml: URI.

The mhtml channel would simply fetch from cache if the requested resource is
available. If the containing multipart resource is still loading, it would
wait until it becomes available. If the requested resource wasn't included in
the multipart resource, try to get it using the original URI.

If the requested resource isn't in the cache and the containing multipart
resource is not currently loading, we'd have to load it using basically the same
mechanism the stream converter is using.
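
As a rough illustration of the lookup part of this plan (not actual Necko code), the sketch below parses a complete multipart/related body with Python's standard email package and indexes the parts by Content-Location. The class name MhtmlPartCache and its methods are hypothetical, and the "wait while the container is still loading" step is left out.

```python
import email
from typing import Dict, Optional, Tuple


class MhtmlPartCache:
    """Hypothetical stand-in for the cache the stream converter would fill."""

    def __init__(self, raw: bytes) -> None:
        message = email.message_from_bytes(raw)
        self.root_location: Optional[str] = None
        self.parts: Dict[str, Tuple[str, bytes]] = {}
        for part in message.walk():
            if part.is_multipart():
                continue  # containers carry no resource body themselves
            location = part.get("Content-Location")
            if location is None:
                continue  # parts addressed only by Content-ID are ignored here
            if self.root_location is None:
                # Simplification: treat the first located part as the root
                # resource and ignore the optional "start" parameter.
                self.root_location = location
            body = part.get_payload(decode=True) or b""
            self.parts[location] = (part.get_content_type(), body)

    def lookup(self, uri: str) -> Optional[Tuple[str, bytes]]:
        """Return (content type, body) for a packaged resource, else None.

        None means the caller should fetch the resource via its original URI.
        """
        return self.parts.get(uri)
```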
Status: NEW → ASSIGNED
Keywords: helpwanted, nsbeta2
Priority: P4 → P3
Target Milestone: --- → mozilla1.1
Assign to myself, not nobody.
Assignee: nobody → clarence
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Why do you need a new protocol handler? If I go to
   http://www.example.org/mydocument.mhtml
...I would want it to display right without changing the URI.

BTW, if you _do_ use your own protocol, then it should be called 'moz-mhtml' or
whatever, so as not to pollute the protocol namespace.
Ian, somehow we must remember that we have an MHTML document if we don't want to
rewrite its links. URIs in MHTML documents can be the same as existing URIs
outside the MHTML document. If we rewrite the links (e.g. convert them to <cid:>
URIs) it is very likely that we break at least some JS.

It may be possible to keep the original URI for the root resource, but it would
require more changes to docshell. I do not intend to implement this in the first
step. Please file a bug on it once MHTML works.

If we display a resource other than the root resource (e.g. open a frame in a
new window), it does not make sense to keep the original URI and it does not
make sense to show the given URI, because the displayed document may be
different from a document with the same URI retrieved directly over the net.
Another approach would be to generate a Content-ID ourselves if the MHTML
document doesn't specify it and use <cid:> instead of <mhtml:>. But that would
be much more difficult to implement.

Name of the protocol: We do already pollute the protocol namespace (<jar:>,
<view-source:>, <about:>, <internal:>, <chrome:>, <resource:>, <javascript:>).
But if you think we shouldn't continue this it would be no problem to use
<moz-mhtml:>.
A clarification: If http://www.example.org/mydocument.mhtml has Content-Type:
multipart/related you could of course type that URI into the URL bar or use it
in a link. But it would then change to
mhtml:http://www.example.org/mydocument.mhtml!/ or
mhtml:http://www.example.org/mydocument.mhtml!/http://another.example.com/
(if the root resource has Content-Location: http://another.example.com/ ).
This is similar to an HTTP redirection.
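
To make the proposed URI shape concrete, here is a small sketch that builds and splits such URIs. The helper names are hypothetical, and splitting at the first "!/" is an assumption of this sketch; a real handler would have to decide what happens when the container URI itself contains "!/".

```python
from typing import NamedTuple

SCHEME = "mhtml:"
SEPARATOR = "!/"


class MhtmlUri(NamedTuple):
    container: str  # URI of the multipart/related resource
    part: str       # Content-Location of the part, "" for the root resource


def make_mhtml_uri(container: str, part: str = "") -> str:
    """Compose an mhtml: URI in the form described in the comment above."""
    return f"{SCHEME}{container}{SEPARATOR}{part}"


def parse_mhtml_uri(uri: str) -> MhtmlUri:
    """Split an mhtml: URI into its container URI and part location."""
    if not uri.startswith(SCHEME):
        raise ValueError("not an mhtml: URI")
    container, sep, part = uri[len(SCHEME):].partition(SEPARATOR)
    if not sep:
        raise ValueError("missing '!/' separator")
    return MhtmlUri(container, part)
```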
Clarence, first, thanks for giving this a try. From a quick look at my inbox, 
at least some HTML mail uses multipart/related, instead of multipart/mixed,
so MailNews may already have some of the code you need.

As a start, try this LXR query:
  http://lxr.mozilla.org/seamonkey/search?string=multipart%2Frelated
and especially look at 
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimemrel.cpp#29
and following, where rhp@netscape.com has some implementation notes about
how to handle multipart/related data.
I know the MHTML code for mail. But I think it needs nearly a complete rewrite
to work outside of mail and to support all HTML features (e.g. frames).
The first implementation note in mimemrel.cpp describes basically the way I'm
going to implement this.
*** Bug 108329 has been marked as a duplicate of this bug. ***
*** Bug 176054 has been marked as a duplicate of this bug. ***
*** Bug 177713 has been marked as a duplicate of this bug. ***
Is this networking or file handling?
Pardon me if I'm wrong, but it seems to me that this is under both networking
and file handling.

My main want is to be able to open a single file (whether it be downloaded from
an ftp site or opened from my desktop) with all graphics, stylesheets, and html
included.
Is there a testcase available somewhere?

pi
Sorry, I haven't had time to read RFC 2557, but I'd like to make a wish:
when an MHTML file is opened in the browser, it would be nice if the From, Date, etc.
fields aren't displayed.

Excuse me if this wish isn't on topic for this bug; I'm posting it here because
my bug was marked as a dup of this one.
Attached file This Page as MS MHTML
This page saved with MS IExplorer (6)
After upgrading from MS Internet-Exploder to Mozilla (security reasons) I
really miss the MS-Feature of saving complete Webpages into one single File.
With a huge collection of documents on your hard drive it matters very much how
the files are organized and structured. I hope this issue will get a higher
priority. I wonder why Netscape/Mozilla has not made any progress in that
direction for years.
Frank
As a developer I can tell you it's not that easy! However, I think Mozilla
developers could use some help, so here is a useful link:
http://www.codeproject.com/shell/IESaveAs.asp

The article contains useful information about IE and its so famous (^^) save as
MHTML feature. Developers should also read the user comments.
>somehow we must remember that we have an MHTML document if we don't want to
>rewrite its links. URIs in MHTML documents can be the same as existing URIs
>outside the MHTML document. If we rewrite the links (e.g. convert them to <cid:>
>URIs) it is very likely that we break at least some JS.

How does IE do it?
My employer and I are interested in the implementation of this feature.
If no one is working on it, or if someone is working on it but having
trouble, I'd like to give it a try.
So, feel free to contact me with pointers about how to go about it.
I changed from IE to Mozilla just for security reasons. I'm still missing this
nice MHTML feature. There are many HTML documents with important inline
graphics, like pages with embedded math formulas as GIF or graphs. 
For me it's not important that all features of a web page are preserved.
Javascript can be broken, that's not important to me as it's used mostly for
advertisements. Also I don't care much about CSS as the content is more
important to me than a correct layout. This topic has now been discussed for more than 4
years. So, maybe a simple approach at the beginning would be sufficient. The
mail component already uses similar functionality. JavaScript/CSS, external
link and layout optimizations can be made later.
(In reply to comment #37)
> I changed from IE to Mozilla just for security reasons. I'm still missing this
> nice MHTML feature. There are many HTML documents with important inline
> graphics, like pages with embedded math formulas as GIF or graphs. 
> For me it's not important that all features of a web page are preserved.

Actually Moz… is at least capable of viewing *.mht files created by IE. I tried
the following.

1. In IE, opened a web page with graphics
2. Saved it as a *.mht file
3. Opened a new message in Thunderbird
4. Attached the *.mht file
5. Saved the message as a draft
6. Viewed the saved draft message in the preview pane
7. I was able to see the complete web page with graphics and CSS

If Thunderbird is capable of viewing a *.mht file, Mozilla.org has code to show it in
the browser.

But I don't know whether there is code to save it!!

Alternatively, Mozilla is capable of viewing contents inside a zip file (including
pages with graphics and CSS). So why not make an XPCOM component to update zip files,
so that extension developers can use it to build a single-file archiving facility?

See topic http://forums.mozillazine.org/viewtopic.php?p=442473
Priority: P3 → --
Target Milestone: mozilla1.1alpha → ---
Summary: RFE: Full rfc2557 MHTML multipart/related support in BROWSER → Full rfc2557 MHTML multipart/related support in BROWSER
*** Bug 241240 has been marked as a duplicate of this bug. ***
re: comment 40:  zip format is discussed in bug 64286.

re: comment 43:  we need a complete re-write to support mhtml in the browser.
(see comment 18, where the bug assignee, Clarence, notes this.)

biggest problem with this bug is that nobody is actively working on it,
just a lot of us actively watching it as it sits there smiling back.
Assignee: c → cbiesinger
Status: ASSIGNED → NEW
Target Milestone: --- → mozilla1.9alpha
MAF (http://maf.mozdev.org/) is an extension which might be useful.
actually, I'm not going to continue working on this... I stopped when I realized
I'd need a streaming base64 decoder
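
For readers wondering what "streaming" means here: the decoder has to accept the encoded body in arbitrary network-sized chunks and emit bytes as it goes, instead of buffering the whole part. A minimal sketch of the idea follows; the class name and its interface are assumptions for illustration and are unrelated to any Mozilla API.

```python
import base64


class StreamingBase64Decoder:
    """Decode base64 incrementally, one arbitrary-sized chunk at a time."""

    def __init__(self) -> None:
        self._pending = b""  # leftover characters that do not yet fill a group

    def feed(self, chunk: bytes) -> bytes:
        # MIME bodies wrap base64 across lines, so drop whitespace first.
        data = self._pending + chunk.translate(None, b" \t\r\n")
        # Base64 maps every 4 characters to up to 3 bytes; only decode
        # complete groups and keep the remainder for the next call.
        usable = len(data) - (len(data) % 4)
        self._pending = data[usable:]
        return base64.b64decode(data[:usable])

    def finish(self) -> None:
        """Call after the last chunk; fails if the input was cut short."""
        if self._pending:
            raise ValueError("truncated base64 input")
```

Feeding the chunks in order and concatenating the results gives the same bytes as decoding the whole body at once, while never holding more than three undecoded characters between calls.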
Assignee: cbiesinger → darin
Did you already notice that MAF can load and save MHTML ?

http://maf.mozdev.org/

My IE-saved files were displayed correctly, and files saved with MAF
could be opened in IE later.

So, the only thing to do is to integrate that basic function
into the core of Mozilla. Up to now it needs some external programs
and scripts.

Also a decision about a standard format would be nice.
I don't trust the MAFF format now, because who knows how
long this will be supported. MHTML has a disadvantage because
it's uncompressed. So a .mht.zip or .mhtz would be nice.
If this would disappear, at least with unzipping the docs
could convert easily into the RFC-standardized MHTML.
(In reply to comment #47)
> actually, I'm not going to continue working on this... I stopped when I realized
> I'd need a streaming base64 decoder

Please check whether the window.atob() and window.btoa() functions would be useful here.
*** Bug 268151 has been marked as a duplicate of this bug. ***
*** Bug 275302 has been marked as a duplicate of this bug. ***
*** Bug 278968 has been marked as a duplicate of this bug. ***
Blocks: majorbugs
No longer blocks: majorbugs
(In reply to comment #48)
> Did you already notice that MAF can load and save MHTML ?
> 
> http://maf.mozdev.org/
> 
Unfortunately MAF does not work with 1.5 Betas anymore.
(In reply to comment #53)
> (In reply to comment #48)
> > Did you already notice that MAF can load and save MHTML ?
> > 
> > http://maf.mozdev.org/
> > 
> Unfortunately MAF does not work with 1.5 Betas anymore.

It works if you override MAF's Firefox version checking. Nightly Tester Tools can help you do this. Developers of MAF still should update their package, though...
-> nobody
Assignee: darin → nobody
Keywords: helpwanted
Target Milestone: mozilla1.9alpha → Future
*** Bug 52386 has been marked as a duplicate of this bug. ***
Target Milestone: Future → ---
(In reply to comment #48)
> I don't trust the MAFF format now, because who knows how
> long this will be supported. MHTML has a disadvantage because
> it's uncompressed. So a .mht.zip or .mhtz would be nice.
> If this would disappear, at least with unzipping the docs
> could convert easily into the RFC-standardized MHTML.
> 

MAF uses a custom "Save as, complete" function that puts the result and a metadata RDF file in a timed folder in a ZIP file. If you want to view it in a different app, just unzip, enter the folder you want (you can add pages to an existing MAFF), and open index.html.
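
Assuming only the layout described in this comment (one top-level folder per saved page, each holding an index.html), a short sketch with Python's standard zipfile module can list the pages in such an archive. The function name is hypothetical, and real MAFF files may differ in details not mentioned here.

```python
import zipfile
from typing import BinaryIO, List, Union


def list_maff_pages(archive: Union[str, BinaryIO]) -> List[str]:
    """Return the index.html entry of every top-level page folder."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            # Exactly one slash: "<folder>/index.html", nothing nested deeper.
            if name.count("/") == 1 and name.endswith("/index.html")
        )
```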

MAF's MHT output differs from IE though. There are some unexplained options to tweak that.
Why is this not a duplicate of bug 40873? I think networking downloads pages just fine. Or are the missing items supposedly fixed by https://addons.mozilla.org/firefox/2925/ the issue here?
(In reply to comment #60)
> Why is this not a duplicate of bug 40873?

that's about saving, this is about viewing.
BTW, Opera 9 has full support for .mht files.
Could someone please add "viewing" and "Display" to the subject, in order to avoid confusion between this bug (View/Display MHTML) and bug 40873 (Save MHTML).

Old: Full rfc2557 MHTML multipart/related support in BROWSER

New: Display/View MHTML in BROWSER (Full rfc2557)

BTW: I lost a potential convert from IE to Firefox today because he saves recipes from e.g., http://www.verybestbaking.com/recipes/detail.aspx?ID=18476 as MHTML files to his hard drive. Firefox can't do MHTML, and Fx offers a silly default filename ("detail.aspx.htm"). So I lost a "customer". :-(
I don't think it's necessary to change the title since a full RFC2557 support implies view/display.
I think what 石庭豐 means is that full RFC2557 support = view + save, so this bug covers bug 40873.
Yeah, the majority of RFC2557 talks about how to parse and interpret the content (eg section 8).  If that's not for display (and save), what's the use of supporting it? :)
This has also been reported to Launchpad bug tracker for Ubuntu.
https://bugs.launchpad.net/ubuntu/+source/firefox-3.0/+bug/240133
Actually, being able to save the page in MHTML format is helpful, but sending the page from the web server in that format is equally useful. This feature would drastically cut down the time taken to load a page through multiple web requests for different resources, as well as the overhead of setting up and tearing down connections. Considering it is such a useful and powerful feature, it is surprising that even Google's Chrome has not supported it.
Hi, is it correct that this bug is in the 'Networking' component?
Those who are still running into periodic need to view .mht files (like those my HR insists on sending me when they find a resume for me on the web) may be interested to know about this add-on:
http://www.unmht.org/unmht/en_index.html

I'm not sure when it came on the scene, but it's now indispensable... Seemed to work pretty well on all the occasions I've had to try it so far.
Flags: wanted1.9.2?
QA Contact: benc → networking
Just tried the add-on.  I have to give it a "thumbs up"!
... Although there's a pitfall to avoid: it conflicts with "IE tab", and a special URL has to be disabled.  This is written on the webpage... near the very end of it (not easy to spot if one has no idea what to look for)
I think this bug after 11 years should be WONTFIX, given the addon mentioned in comment 78 works great, and this is clearly not on the developers' priority list. I voted for it but I'm well aware that this is not a feature wanted or needed by the vast majority of users. Parity with IE is not always a good enough reason to spend time developing a feature. Especially when there are addons capable of doing the job.
(in reply to comment 84)
I doubt very much that a bug with as many as 165 votes, continuous requests for a period of 11 years so far and with recent duplicates still coming in is a likely candidate for wontfix. Maybe fix would be better.

Addons can't replace vital core functionality. Many users will never bother installing addons, but they still need and expect the functionality.

Michael, where's your data to show this isn't needed by many users?

Finally, please consider that the lack of "parity with IE" in this case means that users who already use Firefox might be tempted to switch back to IE both for viewing and saving all-in-one rfc2557 MHTML files. I'll take it a step further and state that both MHTML and .maff format should be natively supported by the browser. Only after many years of using FF did I discover .maff add-on, and I'm not very shy of addons. Current default of saving html pages as a "file + loads of files in subfolder" set is very impractical and resource-wasting. Let alone all the problems you can get when copying over-long file paths resulting from saved html files. Mozilla should really do better.
(In repetition of comment #76)
> Is it correct that this bug is in the 'Networking' component?
MHTML is not an approved standard. It is a Microsoft idea that other browser developers have followed. Whether Firefox follows the trend or not is a choice. If they don't, it is not a bug. We already have Zip to archive web pages and related objects. Using an add-on to do the archive from within Firefox is a convenience not a bug fix.
(In reply to comment #87)
> it is not a bug.
> is a convenience not a bug fix.

That is why this "bug" is an "enhancement" (with 164 votes).
Flags: wanted1.9.2?
The UnMHT extension does this work, can we integrate it into Firefox?
Provide a patch, write tests, ask for review, address review comments, let it land, done.
Before someone wastes their time I actually tried to implement this back in 2008 or 2009 and ran into unexpected problems. This task isn't as trivial as it at first seems. Let's just say that you can't just take UnMHT or Thunderbird and make it work in Firefox.
I've found that Chromium does have MHTML support[1], although it is still marked as experimental and requires manual toggling in its configuration page. Supporting MHTML would allow us to easily make portable desktop HTML5 applications, and I think it would be more useful to our users and better reflect our mission to support the web than, for example, building our own built-in PDF viewer.

[1] https://codereview.chromium.org/7064044/ - They also have a bunch of resolved and unresolved issues on https://code.google.com/p/chromium/issues/list?can=1&q=MHTML
I think this is a very useful feature. When you save a webpage with a view to reading it offline or later, it is more convenient to have a single file.

(In reply to Lance Baker from comment #87)
> MHTML is not an approved standard. It is a Microsoft idea that other browser
> developers have followed.
Among the authors of RFC2557, only one works for Microsoft. Moreover, it is not as if MHTML were a closed file format: the specification is public and part of the IETF's work.
(In reply to Lance Baker from comment #87)
> MHTML is not an approved standard. It is a Microsoft idea that other browser
> developers have followed. Whether Firefox follows the trend or not is a
> choice. If they don't, it is not a bug. We already have Zip to archive web
> pages and related objects. Using an add-on to do the archive from within
> Firefox is a convenience not a bug fix.

It's a specification approved as "proposed standard" by the IETF. Just like many other things the internet runs on.
(In reply to ajf from comment #101)
> It's a shame that after 18 years, Mozilla still has no support for MIME
> HTML. If I knew C++ and the codebase, I'd write a patch, but alas.

There is actually an alternative solution and we don't need to know C++.  It was suggested in comment #78 and confirmed in comment #93: use the add-on called UnMHT (https://addons.mozilla.org/en-US/firefox/addon/unmht/).  I can also confirm it's working well.  I'm pretty sure it can be integrated into the Firefox setup so that it's enabled by default.  Look, the Calendar project used to be an add-on for Thunderbird.  Now in TB 38, the add-on is integrated and works perfectly.  So why not UnMHT?

I've seen comment #95 saying that it does not work for him.  Maybe the commentator didn't use the right method?  We cannot put the xpi file like that inside "extension" folder.  In all cases, the file name has to be changed.  In some cases, it's also necessary to unpack the file.  For UnMHT, filename change is enough.  Here are the steps for FF 40:
1. Download unmht-8.0.0-an+sm+tb+fx.xpi from https://addons.mozilla.org/en-US/firefox/addon/unmht/
2. Change the name to {f759ca51-3a91-4dd1-ae78-9db5eee9ebf0}.xpi
3. Put it into "extension" folder.  There are two extension folders in Windows:
  a. System-wide:
     Put it in "C:\Program Files (x86)\Mozilla Firefox\browser\extensions" and whoever logs in the computer will get the add-on.
  b. User-wide:
     Put it in C:\Users\<user>\AppData\Roaming\Mozilla\Firefox\Profiles\<profile>\extensions so that only <user> will get it
4. In either case, it's still necessary to explicitly enable the add-on.
(In reply to 石庭豐 (Seak, Teng-Fong) from comment #102)
Teng-Fong, I believe you're mistaken. The core issue with MIME support is that it's based on code which is essentially a custom object system implemented in C, with polymorphic construction by class name, and other strangeness. It is very tricky to work with, and what's _really_ necessary is getting rid of it in favor of a proper C++'ish MIME library. Problem is, it's like a house of cards which collapses on top of you when you do that. Last decade I had tried to initiate this kind of a rewrite, but it didn't work out. See also Arho's comment #95.

Anyway, something like UnMHT is not really a solution; it's a workaround, and it would not be reasonable to integrate it. It would be yet another layer over the problematic core.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Patrick, could you please provide a link to the source of, or just state/quote the reasoning behind,  the decision to resolve this as WONTFIX?
Flags: needinfo?(mcmanus)
This isn't on anyone's roadmap and nobody has provided a patch for it in 15 years - so it's just clogging up bugzilla. That doesn't mean it isn't a good idea - it just means nobody is planning on working on it. If someone works on it then this should be opened as a new issue.
Flags: needinfo?(mcmanus)
But marking it as WONTFIX means that there's been a conscious decision made not to implement it, right? Here it's just a case that nobody's done it. It might be done someday.
How will someone looking for things to work on find this request?
RFC 2557 MHTML is a standardized web page archive format that is meant to be directly readable by web browsers and is already supported by many browsers such as IE and Chrome.

However, Firefox does not directly support it. Although we currently have add-ons like MAF (https://addons.mozilla.org/zh-tw/firefox/addon/mozilla-archive-format/) or UnMHT (https://addons.mozilla.org/zh-tw/firefox/addon/unmht/), they are written in XUL/XPCOM, which is going to be deprecated, and their functionality is currently not available in WebExtension, which will be the only add-on system in the future.

Therefore, we need support for RFC 2557 MHTML: at least reading it directly, and ideally saving as well. This could be either:
1. Direct support for MHTML reading and writing
2. Adding enough API to allow an add-on like MAF or UnMHT in WebExtension
Now that UnMHT etc. do not work anymore, the only add-on supporting mht is Save Page WE, but it only supports saving pages as mht, not opening them.