Closed Bug 201195 Opened 21 years ago Closed 17 years ago

Generic XML MIME Types (application/xml and text/xml) should have lower priority than specific XML MIME Types like XHTML (application/xhtml+xml) in HTTP Accept Request Headers

Categories

(Core :: Networking: HTTP, enhancement)


Tracking


RESOLVED FIXED
Future

People

(Reporter: Christian.Hujer, Unassigned)

Details

(Whiteboard: looking for new networking owner)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030401
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030401

The HTTP Accept-Header of Mozilla's HTTP Requests currently looks like this:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
That means that text/xml, application/xml and application/xhtml+xml are all
equally preferred.
That's not really intended, and it's not convenient. The generic types
text/xml and application/xml should not be given the same preference as the more
specific type application/xhtml+xml (and image/svg+xml in SVG-enabled browsers).
Instead they should have slightly lower priority.

I suggest changing the Accept header like this:
application/xhtml+xml,text/xml;q=0.95,application/xml;q=0.95,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
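To make the effect of these q values concrete, here is a minimal self-contained sketch (in the same spirit as the J2EE snippets elsewhere in this bug, but not taken from any Mozilla or servlet API; the class and method names are illustrative) of how a server could parse an Accept header and pick the available representation with the highest q value:

```java
import java.util.*;

public class AcceptNegotiator {
    // Parse an Accept header into a map of media type -> q value.
    // Types without an explicit q parameter default to q=1.0 (per RFC 2616).
    static Map<String, Double> parse(String accept) {
        Map<String, Double> prefs = new LinkedHashMap<>();
        for (String part : accept.split(",")) {
            String[] pieces = part.trim().split(";");
            double q = 1.0;
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim();
                if (p.startsWith("q=")) {
                    q = Double.parseDouble(p.substring(2));
                }
            }
            prefs.put(pieces[0].trim(), q);
        }
        return prefs;
    }

    // Pick the available representation the client prefers most;
    // unlisted types fall back to the */* wildcard's q value.
    static String negotiate(String accept, List<String> available) {
        Map<String, Double> prefs = parse(accept);
        String best = null;
        double bestQ = -1.0;
        for (String type : available) {
            double q = prefs.getOrDefault(type,
                    prefs.getOrDefault("*/*", 0.0));
            if (q > bestQ) {
                bestQ = q;
                best = type;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String accept = "application/xhtml+xml,text/xml;q=0.95,"
                + "application/xml;q=0.95,text/html;q=0.9,*/*;q=0.1";
        // With the proposed header, a server holding both variants
        // serves XHTML rather than plain HTML.
        System.out.println(negotiate(accept,
                Arrays.asList("text/html", "application/xhtml+xml")));
        // prints: application/xhtml+xml
    }
}
```

With the current header (all three XML types at an implicit q=1.0), the same logic would have no reason to prefer application/xhtml+xml over generic XML, which is exactly the problem this bug describes.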

Bye

Reproducible: Always

Steps to Reproduce:
1. Send an HTTP request
2. Look at the Accept header


Actual Results:  
Mozilla advertises generic XML MIME types (application/xml and text/xml) with
the same precedence as specific ones (e.g. application/xhtml+xml).

Expected Results:  
Specific XML-based MIME Types (e.g. application/xhtml+xml) should be preferred
over generic XML MIME Types (application/xml and text/xml).
Confirming as an apparent non-dup valid enh req.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Gerv, this is all you
Relevant comment from n.p.m.browser:

Seairth Jacobs wrote:
> I didn't say that one should be preferred over the other.  I said that
> "text/xml" should no be preferred over "text/html".  Give them the same
> quality value.  That way, the server can make the decision.  If a server
> decides to return text/xml plus stylesheet, it can.  If it still wants to
> return text/html, it can.
> 
> If you want to give text/xhtml+xml a higher quality than text/html, that
> makes sense.  In this case, you are specifying a preference between two
> specific vocabularies.
> 
> As for when "text/xml" should have a higher preference to "text/html", my
> answer is "never".  Like it or not, legacy HTML formats are going to be on
> the web for a long, long, long time to come.  Making them second-class
> citizens to the generic text/xml format will only cause users to find
> another browser that doesn't (again, imho).

Gerv
i'm keen on anything that will reduce the length of our Accept header (the fewer
bytes we send with each and every request the better), so if i follow gerv's
point, he's suggesting that we do away with the q=0.9 following text/html?  in
fact, i don't really understand why we make such a big deal out of saying we
prefer this type more than that type.  heck, we are simply able to handle any of
these types, so why differentiate?  we don't need to tell servers that xhtml is
better than html, they already know that.  /sigh/
Target Milestone: --- → Future
Why would you want to do content negotiation with a specific type vs. a generic
type? Mozilla prefers XHTML over HTML in order to allow content negotiation of
real XHTML + MathML vs. IE + ActiveX soup.

OTOH, the XML content sink doesn't support incremental loading yet. So in that
sense it would make sense to prefer text/html.

The entire HTTP request fits in one TCP packet, and dropping " en-US;" from the
UA string would save us 7 bytes. After all, sniffing for UI language is bogus,
because servers should use Accept-Language.
The original point in this bug suggests a change to (with some edits to reduce
length):

application/xhtml+xml,text/xml;q=0.9,application/xml;q=0.9,text/html;q=0.8,
text/plain;q=0.7,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1

In other words, xhtml+xml first, then xml, then html, then plain.

The comment I quoted from the newsgroup suggests making text/xml (and presumably
application/xml) the same priority as text/html; i.e. changing the 0.8 to a 0.9
in the string above.

Re: the length, I agree (and have been working hard to make it so) that we
should keep the length down. But, we had a big discussion about this last time
round and the current set are all in there for good reasons.

Gerv
Re: length

I think saving a few bytes is not a good reason for fiddling with the q values.
There are better ways of saving a few bytes here and there such as omitting
Accept-Language on style sheet and script requests and omitting Accept-Charset
on image requests.

To put things into perspective, Mozilla's HTTP requests are tiny compared to the
HTTP requests WAP gateways can make. A Nokia cell phone accessing a Web site
through a WAP gateway can advertise dozens of MIME types in the Accept header!
(Take a look at http://nds.nokia.com/uaprof/N3650r100.xml and imagine most of
that information plus some more formulated as HTTP request headers.)

Of course, the difference is that Mozilla's HTTP headers travel end-to-end in
any case--even when Mozilla is behind an old-fashioned PPP dialup connection.


Re: what to accept and with what q values

Should Mozilla be advertising application/xml and text/xml in the Accept header
at all? As for sending out q values, does it make any sense to do content
negotiation with the other alternative being */xml?

Mozilla really should prefer text/html over arbitrary, a priori unknown in
meaning, make-up-tags-as-you-go XML. HTML tags carry meanings that are generally
known. (Arguably the meanings could be better defined, but that's not the
point.) OTOH, private XML vocabularies may seem to have meaning to whoever wrote
the document, but for something that doesn't have any idea of the meanings of
the elements a priori, the private vocabularies are effectively meaningless.
(Adding a CSS presentation to a document tree with unknown namespaces and
generic identifiers does *not* fix this basic problem.)

So *if* the generic XML content types are kept in the Accept header, it would
actually make sense to make their q values lower than the q value of text/html.

Is explicitly accepting */xml any more useful than explicitly accepting
application/octet-stream? If the resource typed */xml is the only resource
available and there's no negotiation, */* catches it anyway and the resource is
sent to Mozilla. But are there any good use cases where there are two or more
resources one of which has the type */xml and content negotiation makes sense?
We could add multipart/x-mixed-replace. That's something people are likely to
want to sniff on.

I really don't think that we're going to have people doing server side sniffing
for bmp files. We should only have html(/xhtml/etc), and then other 'uncommon'
file formats. Then in 5 years' time, when all browsers support MNGs and no one
bothers to do server side parsing, we can remove that, and add something else.

That isn't the intent of the HTTP spec, mind you, but I think that it does make
logical sense. I think.
One thing about the length / size of the header:
As long as the whole header does not exceed one MTU / TCP packet, I really wouldn't
bother making the Accept header a bit longer.
The current header of Mozilla is this: 
GET / HTTP/1.1 
Host: localhost:3129 
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030401 
Accept: 
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1 
Accept-Language: de-de,de;q=0.8,en-gb;q=0.6,en;q=0.4,pl;q=0.2 
Accept-Encoding: gzip,deflate 
Accept-Charset: UTF-8,* 
Keep-Alive: 300 
Connection: keep-alive 
 
The header size is approx. 400 bytes. That's not even half of an MTU. Whether the request
header is 400 or 800 bytes does not make much difference. Even with a 28,800 bps modem
you get 3,600 bytes per second.
 
So 100 bytes more take only 1/36 of a second on a 28,800 bps modem, which is 28 ms. For 10
subsequent requests, e.g. a page containing 9 images, it's 1,000 bytes more, which still takes
less than half a second to transmit.
 
I'd really not be picky about 10 or 20 bytes more of a header, especially if the difference
between the current size and the previous size is less than the size of some URIs.
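The back-of-the-envelope arithmetic above can be checked in a few lines; this is just the comment's own calculation reproduced in code, not a measurement of real network behavior:

```java
public class HeaderOverhead {
    public static void main(String[] args) {
        // A 28,800 bps modem moves roughly 3,600 bytes per second.
        double bytesPerSecond = 28800 / 8.0;   // = 3600.0

        // Extra upload time for 100 additional header bytes on one request.
        double perRequestMs = 100 / bytesPerSecond * 1000;   // ~27.8 ms

        // A page needing 10 requests (1 page + 9 images): 1,000 extra bytes.
        double tenRequestsMs = 1000 / bytesPerSecond * 1000; // ~277.8 ms

        System.out.printf("%.1f ms, %.1f ms%n", perRequestMs, tenRequestsMs);
        // prints: 27.8 ms, 277.8 ms
    }
}
```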
 
 
Greetings 
 
/Christian 
> WE could add multipart/x-mixed-replace.. Tahts somethign people are likly to
> want to sniff on.

Problem is, most browsers which do support it don't advertise it.

It's becoming clear to me that the Accept: header should contain non-universal
formats, and should contain them as soon as we start supporting them (for
avoidance of the above situation.) I suppose this is an argument for retaining
MNG (see bug 189872).

Henri: no-one is suggesting fiddling with q values to save length; my slightly
reduced version and the original mean exactly the same thing.

Christian Wolfgang Hujer: the fact that it fits into a single packet is not the
end of the story. If you load a complex web page, we can issue a lot of requests
at once. On a slow connection, every byte counts a little bit. We should
certainly not say that we can keep filling up Accept: until we hit the MTU limit. 

Gerv
Gerv, 
 
even if you issue several requests at once using HTTP/1.1 keep-alive, the speed from the
server to the client will always be the bottleneck, not the other way round. You issue the
first request, and after issuing the fourth request you're probably still receiving response
data from the first request.
I don't want to say okay, let's create a 1 MB request, but let's calculate the difference
between a 420 and a 440 byte header: it's 20 bytes per request. On 100 requests, that's
2,000 bytes of difference, i.e. two kB; with an old 28,800 bps modem it's still less than a
second.
The bottleneck is the responses, not the requests.
I didn't want to say let's fill up the MTU limit. I just wanted to say hey, don't be picky
about 10 or even 50 bytes in the Accept header.
 
/Chris 
Without wanting to complicate things further... is there any chance the XUL
media type could be added? ;o)

There is presently a discussion on news:netscape.public.dev.xul on how to best
detect a XUL-capable software agent -- see
http://groups.google.co.uk/groups?as_umsgid=5IpDa.10080%24lL2.142082%40news.chello.at


At present, server-side code has to check the "User-Agent:" header; here's a
Java/J2EE example:

    // Sniff the User-Agent header for the "Gecko/" token.
    String agentHeader = request.getHeader("USER-AGENT");
    if (null != agentHeader && -1 !=
          agentHeader.indexOf("Gecko/")) {
      out.println("XUL is supported.");
    } else {
      out.println("XUL is not supported.");
    }

A saner approach would be if we could examine the "Accept:" header instead:

    // Check the advertised media types instead of the product name.
    String acceptHeader = request.getHeader("ACCEPT");
    if (null != acceptHeader && -1 !=
          acceptHeader.indexOf("application/vnd.mozilla.xul+xml")) {
      out.println("XUL is supported.");
    } else {
      out.println("XUL is not supported.");
    }

Taking Gerv's suggestion, removing the presently defunct "video/x-mng" then
adding "application/vnd.mozilla.xul+xml" with the same level as
"application/xhtml+xml", we have:

application/xhtml+xml,application/vnd.mozilla.xul+xml,text/xml;q=0.9,
application/xml;q=0.9,text/html;q=0.8,text/plain;q=0.7,image/png,image/jpeg,
image/gif;q=0.2,*/*;q=0.1

Thanks

Dave

P.S. "OS" should be changed from "Linux" to "All"
Now that we have Gecko browsers without XUL support, I can see an argument for
adding XUL. The problem is, of course, that it's not there in all the previous
Mozillas, so checking for it won't be very sensible.

Checking for "Gecko/" and then !"Camino|Chimera" might do the trick, I suppose.

Have we missed the boat on adding the XUL media type?

Gerv
gerv: what about applications that embed gecko, but don't support XUL?  the list
is not limited to chimera and camino... more to the point, the list of non-XUL
apps could increase over time, so sniffing for the application name just doesn't
seem like a good solution at all to me.  sniffing for known applications works
if we can say that post |this version| of gecko, XUL will be advertized in the
Accept header.  that way, servers can read the Accept header to check for XUL,
and if it is not there, they can optionally fallback on guessing based on the UA
string.
It would be important to make sure that the Accept string really matches the
capabilities. For example, there's bug 188376, which wouldn't exist if
compile-time options could turn stuff on and off in the Accept header.

If XUL goes in the default Accept header, how do we make sure J. Random Embedder
takes it out if the embedding app doesn't do XUL?
> If XUL goes in the default Accept header, how do we make sure J. Random Embedder
> takes it out if the embedding app doesn't do XUL?

J. Random Embedder's browser doesn't work properly on sites which are sending
XUL to XUL browsers. J. Random Embedder's customers get irritated with J. Random
Embedder.

The same argument applies/applied to MNG, which could be compiled out for
embedding. Short of some very smart build-time way of building the header, we
just need to make sure it's documented that embedders should check that the
Accept header is still valid for their build.

Gerv
Sadly, it doesn't work that way. Even a product released by mozilla.org itself
(Chimera/Camino) had (still has?) a bogus Accept header for months.
Henri: so what do you suggest? Some way of assembling the Accept: header at
build time, based on the build options?

Gerv
Based on what I've seen people using or wanting to use, I'd say we need the
following types in the accept header:
XHTML (done)
MathML (unless --disable-mathml is specified)
XUL (unless --disable-XUL is specified)
SVG (if --enable-svg is specified)

Being able to detect MathML is significant because not all XHTML browsers
support MathML (in fact, only Mozilla and Firefox support MathML; Camino builds
without and no other browser has support). Since MathML is namespaced directly
into the page, there is a strong need to do server side content-type negotiation
based on the presence, or not, of MathML. At present, this means doing
user-agent detection (see e.g.
http://golem.ph.utexas.edu/~distler/blog/archives/000309.html for the type of
hacks this involves). 

Arguing against including XUL in the accept header because "some builds did
support XUL but don't have it in the accept header" seems like a fallacy to me.
Putting XUL in the accept header can't possibly make the situation worse; if
anyone is actually serving XUL, they are still free to detect based on UA or
whatever they are using at present. However putting XUL in the accept header
makes it easier for people to design cool XUL-enabled features and be sure that
people without XUL won't get junk.

If SVG support is compiled in, that should be advertised as well (perhaps with a
low q value); after all, an SVG-enabled Mozilla browser does accept SVG, even if
the support isn't perfect. It also gives those who would like to serve SVG the
opportunity to do so; at present there is no obvious way to distinguish browsers
where it is supported from those where it is not. This also means that SVG sites
are more likely to work with Mozilla if SVG ever makes the default builds.
I filed Bug 234170 on allowing the accept header to be set at build time.
> XHTML (done)
> MathML (unless --disable-mathml is specified)

Reiterating comment #5:
Mozilla prefers application/xhtml+xml over text/html, because
application/xhtml+xml can contain MathML (see old n.p.m.mathml archives). When
MathML is not enabled, preferring application/xhtml+xml is (arguably) wrong,
because then application/xhtml+xml provides no added value but text/html does
provide added value (incremental rendering).

Also, I'm inclined to think application/xml and text/xml are non-sensical as
conneg alternatives.
> Mozilla prefers application/xhtml+xml over text/html, because
> application/xhtml+xml can contain MathML

But a build containing SVG but not MathML would also 'prefer'
application/xhtml+xml over text/html because it could accept SVG content in the
XHTML. Opera or Safari might decide to 'prefer' application/xhtml+xml over
text/html, especially if support for, say, SVG appears in those browsers (or
even if it doesn't). None of this helps someone who wants to send MathML to
browsers that support it and png to browsers that don't. In fact, one might have
three versions of a page, one with MathML equations, one with SVG equations and
one with PNG equations. Content negotiation should allow one to decide which of
the three pages a browser can actually handle. 

Having said that, I agree that, in general, application/xhtml+xml should be
preferred in builds that support MathML or SVG. I just don't think that's sufficient.
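The three-variant scenario above can be sketched as server-side logic. This is hypothetical: it assumes, counterfactually for the browsers of the time, that a MathML-capable browser would advertise application/mathml+xml and an SVG-capable one image/svg+xml in its Accept header; pickVariant is an invented name, not from any real API.

```java
public class EquationVariantPicker {
    // Hypothetical sketch: choose which equation markup to serve, assuming
    // browsers advertised the formats they support in the Accept header
    // (as this comment argues they should, and as they mostly did not).
    static String pickVariant(String accept) {
        if (accept.contains("application/mathml+xml")) return "mathml";
        if (accept.contains("image/svg+xml"))          return "svg";
        return "png";   // universally safe fallback
    }

    public static void main(String[] args) {
        System.out.println(pickVariant(
            "application/xhtml+xml,image/svg+xml;q=0.8,*/*;q=0.1"));
        // prints: svg
    }
}
```

Without such advertisement, the server has no choice but the kind of User-Agent sniffing shown earlier in this bug.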
in all mozilla builds that don't use --disable-xul application/xhtml+xml can
contain xul as well...

(or so I hope. I haven't tested...)
See also bug 125682.

Gerv
Until everything is sorted out to be perfect, it would be nice if text/html were
preferred over generic XML. Example where this is annoying:
<http://www.w3.org/2001/tag/doc/versioning>. It would also be nice if
application/xhtml+xml were preferred over text/html. (Like it is today.) Not only
for SVG and MathML builds, but also because websites use it to serve Mozilla
XHTML with a somewhat more advanced style sheet and IE HTML.

Henri, are there bugs for Mozilla sending Accept-Language for style sheets,
scripts and such?
OS: Linux → All
Hardware: PC → All
Anne, I am not aware of a bug about Accept-Language. Strictly speaking, sending
Accept-Language is not wrong, because images, style sheets and scripts can
contain things that depend on the natural language skills of the reader and in
theory someone somewhere could use conneg with them.
We currently have the following:
|text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5|

How about changing that to:
|application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,application/xml;q=0.7,text/xml;q=0.6,image/png,*/*;q=0.5|

It is 12 characters longer and is a bit better IMHO. However, I think we can
drop application/xml and text/xml from the list. (At least, this solves the
problem when there is both a meaningless XML file and a semantic HTML one.)
Anne: but which is more likely - the case you mention, or the case where there's
both a legacy HTML and a modern XML version of a resource? That XML may well
not be XHTML.

I wouldn't want to rearrange the order of entries in the Accept: header (as
opposed to adding new ones) without a lot of discussion, because of potentially
breaking stuff - and it's really not a high priority right now...

Gerv
I think my case is more likely. There is no such thing as "modern XML". It is
useless for people to use XML without any semantics on the web without having an
HTML or XHTML alternative, which can be understood by most browsers. (Although
Google only understands the semantics of HTML (text/html) documents.)

With whom would this need to be discussed?
Assignee: darin → nobody
Flags: blocking1.9a1?
Flags: blocking1.8.1?
For what it's worth, the original rationale for these changes was bug 58040 comment 28.
Flags: blocking1.9a1? → blocking1.9+
This bug has been plussed for 1.9 but there doesn't seem to be any consensus on what change to make, if any...

Gerv
Whiteboard: looking for new networking owner
Fixed by checkins for bug 361892 and bug 309438
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED