Closed Bug 309438 Opened 19 years ago Closed 18 years ago

Accept: header too long on account of text types

Categories

(Core :: Networking: HTTP, defect)

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: brendan, Assigned: Biesinger)

See bug 240493.  The text types are

text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8

These were added long ago, in a couple of steps.  One big jump came in bug
83458, but bug 58040 started the bloat from */*.  The comments are funny to read
after all these years -- WAP/WML?  It is to laugh.

[Ok, amazon.com did serve content, and may still for all I know, in some such
format, as well as in text/html -- but they did so based only on a WAP client
saying it accepts WML, I'm sure.  No way would amazon risk sending that stuff to
a typical browser user.]

What can be done, at this late date?  It's particularly grating to see the two
XML types (which is it, text or application?  Grrr).  application/xhtml+xml is
important to some who think the web can be tilted away from HTML by clients such
as Firefox (a pipe dream, IMHO).

q=0.9 on text/html is evidently required so we can advertise our virtuousness to
servers who can send XML in addition to more-or-less-the-same HTML. But naughty,
dirty, sinful HTML is not going away, ever; there are billions and billions of
pages of it out there.  So why are we burning cycles and fiber on
text/html;q=0.9?  What good are we doing by that little bandwidth "sin" tax?

IE sends no text types in its Accept header, but it's hardly the thing to
compare ourselves to, I know (yeesh: it sends a bunch of Office application/*
junk on my new XP box, after too many image types that are standard now).

Safari sends */*.  Good for it.

I don't have Opera handy.

Just to fan the flames, or quell them again, I'll repeat something I wrote in
bug 240493: client-driven content negotiation on the web is badly broken, and
not just due to evil/lazy browser implementors.

The protocol does not scale, so no one wants to let too many types creep in. 
This leads to staleness and (minor) bloat, with reform coming, if possible, only
once in a blue moon.  Is it time for reform, or are we stuck with all these text
types?

/be
(In reply to comment #0)
> q=0.9 on text/html is evidently required so we can advertise our virtuousness to
> servers who can send XML in addition to more-or-less-the-same HTML.

And we still don't load XHTML incrementally, so for our users, HTML is actually
better.

Opera 8.5 on Mac sends this:

text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg,
image/gif, image/x-xbitmap, */*;q=0.1

I’d say get rid of at least text/xml and text/html.

I’d personally like application/xhtml+xml to stay there. Aside from your
personal opinion of XHTML, there are many people who want to use XHTML and I’d
say it is the most prominent case where the Accept: header is actually *used*.
Also, without this particular part of the accept header, using XHTML becomes
impossible until all major browsers have implemented XHTML and are widespread
enough to use it. That, or serve XHTML as text/html, which I think you agree is
an even worse solution.


~Grauw
In contrast to what that last paragraph of mine says about
application/xhtml+xml, text/html serves no purpose in the header at all. It is
supported by everyone. No one checks the Accept header for it. Same for
text/plain. As for text/xml, application/xml is already there, and I do not see
why a deprecated (is it? anyway, the application/ one is preferred) MIME type
should be in the Accept header. I do not have a particular opinion about
application/xml.

~Grauw
The thing with regard to application/xhtml+xml is a fact, not an opinion.
text/xml is not yet deprecated; the next XML media types RFC is intended to
deprecate it.

I think we should prefer text/html at least over application/xml. I have seen
some XML representations of documents on the W3C site just because the HTML
variant is not preferred.

We should keep the XHTML MIME type in some form, IMO. It allows servers to send
us mixed-namespace documents if they so desire.

If you want to rip out anything that's not used by the majority of web content,
then we might as well remove most of our CSS and DOM support.
(In reply to comment #6)
> we should keep the XHTML mime type in some form, imo. it allows servers to send
> us mixed-namespace documents if they so desire.

Agreed.

> If you want to rip out anything that's not used by the majority of web content,
> then we might as well remove most of our CSS and DOM support.

That's fallacious, a straw man, since we're not talking about ripping out
anything substantive in the web platform.  We're talking about how to get rid of
ubiquitous and q=1 text types from Accept.  Let's stick to the subject.

How about this for text types?

  application/xml,application/xhtml+xml

Questions:

1.  Can we lose text/xml as Laurens proposes?

2.  What about q < 1 for the above?  As dbaron points out, we don't do
incremental layout for XHTML.  If the above two types are in our Accept: header,
is there some standard Apache configuration option that would tend to send us
XHTML instead of HTML in the absence of a lower q for XHTML?

/be
We would need to mention text/html as well then. Apache knows that when there
are two documents, foo.xhtml and foo.html, Mozilla prefers the .html (text/html)
variant. If we do not mention text/html, Apache thinks Mozilla does not support
it and gives back the foo.xhtml.

I also think we should prefer text/html and application/xhtml+xml over
application/xml, as those are more semantically rich document formats.

(In reply to comment #8)
> We would need to mention text/html as well then. Apache knows that when
> there are two documents, foo.xhtml and foo.html, Mozilla prefers the .html
> (text/html) variant.

Does it know this for Mozilla (Gecko) UAs, or all UAs?  I.e., is it hardwired to
send foo.html without any q parameter for either type in Accept:?

Should we Accept: application/xhtml+xml at q < 1 to prefer text/html to it?

> If we do not mention text/html, Apache thinks Mozilla
> does not support it and gives back the foo.xhtml.

Yeah, that makes sense.  Would it still do that with */* at q=1 at the end? 
Just asking to make sure I understand the RFC (and so does Apache).  Of course
we want q < 1 for */* at the end.

> I also think we should prefer text/html and application/xhtml+xml over
> application/xml. As those are more semantically rich document formats.

I agree that they are richer, but it still sucks to have to spell out text/html
just so we can talk about the other, far less common, types.

So we're down to eliminating text/plain and text/xml?  What a world.

/be
(In reply to comment #2)
> Opera 8.5 on Mac sends this:
> 
> text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg,
> image/gif, image/x-xbitmap, */*;q=0.1

It sounds like we want this minus the unnecessary (png, jpeg, gif, x-xbitmap
even [who cares!]) types, and with spaces squeezed out, and q=0.5 for */* (or is
there a good reason for Opera's 0.1?).

Then we'll be more bandwidth-frugal than Opera, and as XHTML-ready.

(Yet I repeat my cheer/taunt: Go Safari! :-P)

/be
(In reply to comment #9)
> Yeah, that makes sense.  Would it still do that with */* at q=1 at the end?

Not when application/xhtml+xml has a lower q value. This would work I guess:

# text/html,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7
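
As a rough illustration of why the header above makes a server prefer text/html, here is a minimal Python sketch of q-value selection. It is not Apache's actual mod_negotiation algorithm. The names parse_accept, quality and choose are hypothetical, and the sketch assumes the server offers exactly the listed variants with equal server-side quality.

```python
COMMENT_11_HEADER = "text/html,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7"


def parse_accept(header):
    """Parse an Accept header value into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((parts[0].lower(), q))
    return ranges


def quality(media_type, ranges):
    """Return the q of the most specific range matching media_type (0.0 if none)."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2  # exact type/subtype match
        elif media_range == main_type + "/*":
            specificity = 1  # type wildcard
        elif media_range == "*/*":
            specificity = 0  # full wildcard
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(variants, header):
    """Pick the variant the client rates highest; ties go to the first listed."""
    ranges = parse_accept(header)
    return max(variants, key=lambda v: quality(v, ranges))
```

Under these assumptions, `choose(["application/xhtml+xml", "text/html"], COMMENT_11_HEADER)` selects text/html. The two-type list from comment 7, `application/xml,application/xhtml+xml`, selects application/xhtml+xml instead, and still does so if `*/*;q=0.5` is appended, which is the behaviour comment 8 describes.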
How's this for an argument?

The "bandwidth tax" argument doesn't hold much water unless the increased length
causes a significant proportion of our requests to exceed the size of a single
packet when otherwise they wouldn't. If 99% of our requests are single-packet
already, there's not much gain in shrinking Accept: further.

Gerv
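
One way to put numbers on the single-packet argument is to measure a bare request with each Accept value. The sketch below is only an estimate under stated assumptions: the 1460-byte payload limit, the example.org host and the helper names are not from this bug, and the "old" value is just the text types quoted in comment 0, not the complete header that was shipped.

```python
ASSUMED_MSS = 1460  # assumed TCP payload bytes per packet (typical Ethernet); not from the thread

# Text types quoted in comment 0 (only part of the old header) and the comment 11 proposal.
OLD_TEXT_TYPES = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8"
PROPOSED = "text/html,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7"


def request_size(accept, other_headers=()):
    """Byte length of a minimal GET request carrying the given Accept value."""
    lines = ["GET / HTTP/1.1", "Host: example.org", f"Accept: {accept}", *other_headers]
    # Every line ends in CRLF, and one extra CRLF terminates the header block.
    return sum(len(line.encode("ascii")) + 2 for line in lines) + 2


def fits_in_one_packet(accept, other_headers=(), mss=ASSUMED_MSS):
    """True if the whole request fits in one packet of the assumed size."""
    return request_size(accept, other_headers) <= mss


# Bytes saved per request by switching Accept values, all else being equal.
saved = request_size(OLD_TEXT_TYPES) - request_size(PROPOSED)
```

With no other headers, both requests fit easily, so the saving only matters for requests already near the limit, for example ones carrying large cookies passed in through `other_headers`.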
(In reply to comment #12)
> If 99% of our requests are single-packet already, there's not much gain in 
> shrinking Accept: further.

To know that, wouldn't you need stats on plug-in installs in Firefox and
knowledge of what, if anything, those plug-ins add to Accept:? Does anyone have
anything like that?

I think we should do something like what comment 11 proposes for 1.9a1.

/be
Flags: blocking1.9a1?
I guess that pointing out that we prefer PNG over GIF (as the current header does) is relevant only if some browser in the world still doesn't support PNG. In 2005, I don't know of one.

So, as comment 11 says:
Accept: text/html,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7

We should be ready to add SVG when the SVG team thinks our support is solid enough.

Gerv
I agree that the content negotiation concept as defined in HTTP is broken.

Some historical perspective:
Even though Accept: application/xhtml+xml is most often used by XHTML fans merely to deprive Firefox users of incremental display and to show an occasional yellow screen of death, that is not the use case for which the type got its place in the header.

application/xhtml+xml was added to the Accept header at a time when MathPlayer in IE did not support the real XHTML type. This Mozilla-side change made it possible to negotiate XHTML+MathML between Mozilla and IE+MathPlayer using the usual Apache modules, without CGI. The alternatives that were suggested were much worse.

The relevant historical references are:
http://groups.google.com/group/netscape.public.mozilla.mathml/browse_thread/thread/f0d7442075946397/
http://groups.google.com/group/netscape.public.mozilla.mathml/browse_thread/thread/ab83de837ff21576/
http://groups.google.com/group/netscape.public.mozilla.mathml/browse_thread/thread/a2dd34dc398590f2/

Nowadays, AFAIK, you can serve XHTML+MathML without content negotiation as application/xhtml+xml to both.
Assignee: darin → nobody
Flags: blocking1.9a1? → blocking1.9-
Assignee: nobody → cbiesinger
Attachment #241904 - Flags: superreview?(darin)
Attachment #241904 - Flags: review?(darin)
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.9alpha
Attachment #241904 - Flags: superreview?(dbaron)
Attachment #241904 - Flags: superreview?(darin)
Attachment #241904 - Flags: review?(darin)
Attachment #241904 - Flags: review+
Comment on attachment 241904 [details] [diff] [review]
patch per comment 11

sr=dbaron assuming we still send image/png for image requests.

Also, please file a bug on bumping application/xhtml+xml back to 1.0 once incremental loading of XML lands.
Attachment #241904 - Flags: superreview?(dbaron) → superreview+
yep, the accept header for images is at:
http://lxr.mozilla.org/seamonkey/source/modules/libpr0n/src/imgLoader.cpp#214

filed bug 361892

checked in:
Checking in all.js;
/cvsroot/mozilla/modules/libpref/src/init/all.js,v  <--  all.js
new revision: 3.662; previous revision: 3.661
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
I won't pretend to understand all the issues, but this caused bug 364352. 

Should the techniques people have been using to switch between HTML/XHTML depending on the accept header still work?

For example, the Apache .htaccess rules documented in "XHTML's Dirty Little Secret" no longer work -- the article URL is
http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html
Depends on: 364234
(In reply to comment #20)
> I won't pretend to understand all the issues, but this caused bug 364352. 
> 
> Should the techniques people have been using to switch between HTML/XHTML
> depending on the accept header still work?

Not if it was broken.

> For example, see the Apache .htaccess rules as documented in "XHTML's Dirty
> Little Secret" no longer work -- article URL is 
> http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html

Yes, those rules are buggy.  See http://www.intertwingly.net/blog/2006/12/12/Gran-Paradiso .

/be
> http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html

From the article:

  RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0

As a substring match, that's spectacularly buggy.
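
To see why the substring match misfires, the condition can be reproduced with Python's re module, whose unanchored search behaves like the RewriteCond pattern for this purpose. The leading `!` in the rule negates the match, so the rule only serves XHTML when the pattern does NOT match. The stricter pattern below is only an illustrative assumption, not the fix from the article or the linked blog post.

```python
import re

# Pattern quoted from the article's RewriteCond (without the negating "!").
ARTICLE_PATTERN = re.compile(r"application/xhtml\+xml\s*;\s*q=0")

# Hypothetical stricter pattern: only q=0, q=0. or q=0.0 through q=0.000,
# followed by a comma or the end of the header value.
STRICTER_PATTERN = re.compile(
    r"application/xhtml\+xml\s*;\s*q=0(?:\.0{0,3})?\s*(?:,|$)"
)

NEW_HEADER = "text/html,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7"


def looks_like_refusal(pattern, accept):
    """True if the pattern claims the client refuses application/xhtml+xml."""
    return pattern.search(accept) is not None


# "q=0" is a prefix of "q=0.9", so the article's pattern fires on the new header.
article_says_refused = looks_like_refusal(ARTICLE_PATTERN, NEW_HEADER)
stricter_says_refused = looks_like_refusal(STRICTER_PATTERN, NEW_HEADER)
```

Here `article_says_refused` is True even though the header accepts XHTML at q=0.9, while `stricter_says_refused` is False. Both patterns still report a refusal for a genuine `application/xhtml+xml;q=0`.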