Closed Bug 67646 Opened 23 years ago Closed 22 years ago

.xhtml to XML if pref is set (and other suggested hacks)

Categories

(Core :: Networking, defect)

x86
All
defect
Not set
normal

Tracking

()

VERIFIED WONTFIX
Future

People

(Reporter: rbs, Assigned: neeti)

References

Details

(Keywords: xhtml)

Attachments

(1 file)

Currently, local files with the .xhtml extension are treated as HTML by Mozilla.
Since .xhtml files are XML files, this bug is a request to let these files be
treated as XML files.

Also, there is a long debate on n.p.m.mathml, under "Cannot render MathML" (from
30 Jan 2001 onwards), arguing that in order to support non-XML browsers and have 
portable documents, authors are serving their .xhtml documents with the
text/html MIME type. This is preventing Mozilla from correctly interpreting
fragments that are under different namespaces, such as that of MathML. 

This bug is also suggesting that documents with the .xhtml extension (regardless
of the text/html MIME type) be treated as XML, since Mozilla has the ability to
handle mixed content. One argument in favor of this is that it is often not
possible for authors to configure the servers to negotiate the desired MIME type
depending on the user-agent. But they are in control of their extension, and
if they use .xhtml, they certainly mean that the file is an XHTML/XML file.
So supporting XHTML this way will give an edge to Mozilla, while still
remaining spec-compliant.
Keywords: xhtml
Keywords: mozilla0.9
Updating the title from ".xhtml to XML" to ".xhtml to XML if pref is set",
to reflect that this behavior should happen only if a pref is set -- as per
the suggestion in bug 68421, tenthumbs@cybernex.net 2001-02-10 13:24.
Summary: .xhtml to XML → .xhtml to XML if pref is set
Target Milestone: --- → Future
Blocks: 15391
Keywords: mozilla0.9
One of the main reasons why people can't use text/xml is that ".xml" is not
sent as text/xml *because they can't configure their server*. Thus, relying
on an extension to do this is not suitable. 

How about this (quoting myself from mail):
>
> Ok, tell you what. (I'm serious here.) How about we add in support so that
> if Mozilla sees exactly the following string as the first thing in a
> text/html document, it treats it as text/xml?
>
>    <!-- Mozilla Magic Message: this document is XML. It should be sent
>         as text/xml, not text/html. However, for compatibility with
>         Internet Explorer, which doesn't yet support text/xml properly,
>         I have sent it as text/html. Please ignore the actual Content-Type
>         and treat it as text/xml instead. Thanks. -->
>
> We could even put a URI in the string that points to a page on mozilla.org
> explaining the problem, and require this URI to be in the string (which
> maybe could then be shorter) to trigger our XML parser.

What do you think?
I thought April 1st was a month ago, wasn't it?
I am, actually, quite serious... It does what you want, no?
Ah. yep, it does. But how is that different from what I am suggesting:
that .xhtml goes to the XML parser (right from Necko) if a pref is set.

Your suggestion is that .html + <!-- magic string --> goes to the XML parser.

===
Needless to say, the right fix for everybody is a mod_rewrite in the server :-)
         +-- text/xml  if Mozilla/5 or above
        /
.xhtml <
        \
         +-- text/html otherwise

This is how Paul Gartside is serving the MoW (Markup of the Week). 
I've even written a little perl script for those who can't use mod_rewrite:
   http://software.hixie.ch/utilities/cgi/xhtml-for-ie/
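
For illustration, the decision in the diagram above can be sketched in a few
lines of Python. This is only a sketch under assumptions: the function name
choose_content_type is hypothetical, and it is neither the mod_rewrite rule nor
the perl script mentioned above. It simply reads the major version out of a
User-Agent string that starts with "Mozilla/".

```python
import re


def choose_content_type(user_agent: str) -> str:
    """Pick the MIME type for a .xhtml resource from the User-Agent string.

    Hypothetical helper: "Mozilla/5 or above" gets text/xml, everything
    else (including IE, which reports Mozilla/4.0) gets text/html.
    """
    match = re.match(r"Mozilla/(\d+)", user_agent or "")
    if match and int(match.group(1)) >= 5:
        return "text/xml"
    return "text/html"
```

A server that varies its response like this would normally also send
"Vary: User-Agent" so that caches do not hand the wrong variant to another
browser.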

The reason your suggestion is suboptimal, though, is that it requires someone
to change the server configuration to serve .xhtml as text/html in the first
place. One of the main arguments given in www-talk for having some magic hack
is that users can't touch their servers.
Summary: .xhtml to XML if pref is set → .xhtml to XML if pref is set (and other suggested hacks)
I read your page, yep it does provide a hack to the problem (albeit convoluted 
and scary to the non expert...).

Back to my hack, I was saying that, as far as Mozilla is concerned, just the 
.xhtml extension would determine the action to be taken. Or are you saying that 
it may happen that the .xhtml file is being served as 'application/octet-stream' 
or something to IE because the server wouldn't know the .xhtml extension in the 
first place? In that case, a _one-time_ addition of 'AddType text/html xhtml' in 
their .htaccess would solve the problem and is easy to document to newbies.

It still seems to me that my hack is easier to implement & maintain for authors 
in the long run. And once IE fixes their code so as to treat .xhtml as XML, then 
it is easy to undo/change the AddType fallback. Maybe by then, the .xhtml 
extension would have already been widely adopted as an extension for XHTML 
(served as text/xml) so that servers would have the expected configurations, and 
authors won't have to do anything else with their existing .xhtml files (no need 
for them to edit and remove the magic string... no need to copy their probably 
growing legacy of XHTML files with .html to .xhtml, etc...). And in the other 
scenario where this takes long to happen, the AddType fallback would still 
be there.
Oh, I completely agree with you as far as that goes -- I'm just saying that
the www-talk crowd were saying that the primary reason they want us to sniff
for XML is that they can't change their server configurations to send back
files with .xml extensions as text/xml.

One problem with your proposal, though, is that there is no "opt out" clause.
How do you make sure that .xhtml files served up as text/html are in fact 
treated as text/html?
By whom? Not Mozilla I guess. Isn't the whole point of this to allow a smooth 
transition of _new_ content to XHTML? Hence there isn't a backward compatibility 
issue here and the thinking goes that .xhtml is for XHTML (i.e., in reality 
text/xml).

Part of the deal is that authors ought to move to XHTML while still being able to 
serve their content in older browsers for some time (not too long, I hope). If 
the goal isn't to encourage a smooth transition to XHTML (text/xml with the 
html doctype), then why bother. But I presume you asked your question just for 
completeness, right? If authors want the good ol' HTML, let them use .html.
I guess that's ok... Seems a bit presumptuous though... I can name images
".html" and serve them up as image/jpeg and expect them to go through the JPEG
decoder, I can serve .png files as video/mpeg and expect them to go through the
helper app code, but I can't serve .xhtml files as text/html and expect them to
go through the HTML parser?

I guess you're probably right, nobody in their right mind will do that...

Are we sure there are no HTML files out there that are named ".xhtml"? (I know
many are named ".shtml" or ".jhtml" or ".phtml"...)
This is a very special hack to xhtml :-) So that its deployment can get 
started... I have tried to think if the hack could backfire later, but nothing
has struck me (as yet). Do you foresee any harm that may occur?

I haven't encountered any .xhtml that is HTML in my everyday browsing so far.
By the way, the official XHTML mime type is application/xhtml+xml. Mozilla and
Amaya support it, I do not know of others. I recently added support for that,
and I also mapped xht and xhtml file suffixes to that mime type, which should
cause them to go through the XML parser (if read from a local file system).
There is some weirdness here, though, since someone reported that on Windows at
least you can mess with some Windows settings which affect which parser we use...

Anyway, if we decide to go for content sniffing (which I do not like, and it
would also be darn tricky to implement in the current code base since by the
time we see some content we have already committed to the implementation
(XML/HTML) that handles the content), I think it would be best to follow what
XHTML defines as a conformant document. Namely, the root element name must be html
and it must designate the XHTML namespace. In such a case, we should parse it as
XML.
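
As a rough illustration of that test (not Mozilla code), the check "root element
is html and it designates the XHTML namespace" can be sketched with Python's
standard xml.etree.ElementTree. The function name has_xhtml_root is hypothetical.
Note the sketch needs the whole document up front, which is exactly the ordering
problem described above, and anything that is not well-formed XML simply fails
the check.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_xhtml_root(document: str) -> bool:
    """Return True if the document is well-formed XML whose root element
    is <html> in the XHTML namespace. Hypothetical helper for illustration.
    """
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Tag soup (or any other malformed input) is not treated as XHTML.
        return False
    # ElementTree spells a namespaced tag as "{namespace-uri}localname".
    return root.tag == "{%s}html" % XHTML_NS
```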

http://www.w3.org/TR/xhtml1/


http://msdn.microsoft.com/library/default.asp?URL=/library/welcome/dsmsdn/dude03262001.htm
Oops, the latter link was not meant here. But it might interest you anyway: it
states, among other things, that IE6 passes the W3C CSS1 test suite 100%. How
does Mozilla handle it?
Here's an html page with a .xhtml extension:
http://bugzilla.mozilla.org/404.xhtml
mass move, v2.
qa to me.
QA Contact: tever → benc
To try things out:

- Apply the patch in mozilla/uriloader/base/ and rebuild

- Set the preference in your prefs.js
user_pref("advanced.xhtml_as_xml", true);

- Visit the MathML testsuite which has lots of .xhtml files served as text/html
http://www.w3.org/Math/testsuite/
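
For readers who do not want to read the patch, here is a minimal Python sketch
of the behavior the pref is meant to switch on. It is not the attached patch:
the function name effective_content_type and its parameters are hypothetical,
and the only rule it encodes is "if the pref is set, a .xhtml URL served as
text/html is handed to the XML parser instead".

```python
from urllib.parse import urlparse


def effective_content_type(url: str, served_type: str,
                           xhtml_as_xml: bool = False) -> str:
    """Return the type the browser would act on.

    xhtml_as_xml stands in for the advanced.xhtml_as_xml pref. With the
    pref off, the served Content-Type is always obeyed.
    """
    # Ignore parameters such as "; charset=utf-8" when comparing types.
    base_type = served_type.split(";", 1)[0].strip().lower()
    path = urlparse(url).path.lower()
    if xhtml_as_xml and base_type == "text/html" and path.endswith(".xhtml"):
        return "text/xml"
    return served_type
```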
Can someone _please_ tell me what's wrong with the following solution? It's 
completely standards-compliant, does the right thing, works on all browsers, and 
even makes your History look right. It means we don't have to implement nasty 
hacks (of the sort that web developers despise in IE, and the W3C hates).

You could probably eliminate the NOSCRIPT if you had the meta refresh.

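<html>
  <head>
    <!-- Fallback without scripting: wait 1 second, then load the HTML version. -->
    <meta http-equiv="refresh" content="1; url=http://foo.com/index2.html">
  </head>

  <body>

  <SCRIPT> <!--
  // Assumption: any user agent that reports "Gecko" is treated as Mozilla.
  var isMozilla = navigator.userAgent.indexOf("Gecko") != -1;
  if (isMozilla) {
    document.location.replace("http://foo.com/index2.xml");
  } else {
    document.location.replace("http://foo.com/index2.html");
  }
  // -->
  </SCRIPT>

  <NOSCRIPT>
    <center>
      Please click <a href="http://foo.com/index2.html">here</a>.
    </center>
  </NOSCRIPT>
  </body>
</html>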

Gerv
Yes, that hack works too. In fact, there are many ways to get out of this hook.
They differ in their implementation, considering both Mozilla's perspective and
authors' perspective.

The maintenance burden your hack places on authors is simply going to be a
nightmare (I thought that was argued on the newsgroup), compared with the few
lines of my one-time, pref-controlled hack on Mozilla's side.
I don't think this is a good feature to implement. The W3C themselves urge
browser developers to obey Content-Type headers regardless of file extension
(see bug 68421).

If the idea is to have this feature controlled by a preference, I don't see how
it will ease the burden on authors, as is claimed in the earlier comment. An
author would have to serve up pages not knowing, and having no control over,
whether the browser is going to parse their documents as XML or HTML.

Maybe this should be "won't fix"?
I think Mozilla should parse local .xhtml files as XML, but I think overriding
the HTTP content type is a slippery slope. Parsing anything that looks like
XHTML but is sent as text/html as XML won't work. Reason:
http://www.oreilly.com/ and the like.

For people who don't have access to the full server configuration but who use a
reasonably cluefully configured Apache, there is a simple solution that requires
no scripts. The content type and the markup language of
http://www.hut.fi/~hsivonen/test/xhtml-suite/xhtml-index depend on the Accept
header. It is implemented using Apache's content negotiation. No Perl. No PHP.
No JS.

http://www.hut.fi/~hsivonen/test/multitype/test.var supports three content types.
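
To make the Accept-based idea concrete, here is a small Python sketch of the
decision a negotiating server makes. It is an illustration only and not Apache's
negotiation algorithm: the function names are hypothetical, and the rule
"send XHTML only when application/xhtml+xml is named explicitly" is an
assumption chosen so that a browser sending just */* still gets text/html.

```python
def parse_accept(header: str) -> dict:
    """Parse an Accept header into a {media_range: q_value} dict."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_range] = q
    return prefs


def quality(prefs: dict, media_type: str) -> float:
    """Look up q for a type, falling back to 'type/*' and then '*/*'."""
    main = media_type.split("/")[0]
    for candidate in (media_type, main + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def negotiate(accept_header: str) -> str:
    """Choose between application/xhtml+xml and text/html."""
    prefs = parse_accept(accept_header)
    # Assumption: wildcards alone never select the XHTML variant.
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = quality(prefs, "text/html")
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```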
I'm going to be Evil Standards Bastard and WONTFIX this. After the disaster of
IE's file extension sniffing, people would jump all over us if we tried
something like this. Furthermore, I think that in practical browsing of the web,
this would mostly be annoying; people are already starting to slap XHTML
Transitional doctypes on top of their malformed tag soup (see MSN, for
instance), and in practice people would have to keep switching this pref back
and forth to alternately get their XHTML + namespaces to work and keep XHTML
Tag-Soup from breaking.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
1. IE does not render XHTML.
2. We can assume that any document with an XHTML doctype served as text/html
   is XHTML.

So let's decide based not on the extension but on the doctype sniffed from the
document.  Discussion of this issue continues in bug 109837.
As Heikki mentioned above, Mozilla *is* parsing local .xhtml and .xht files as
application/xhtml+xml now.
And yes, disobeying the MIME type provided by the server is a *bad* idea.

verifying WONTFIX
Status: RESOLVED → VERIFIED