Bug 109553 - [FIX]Implement support for HTTP 1.1 Content-location header
Status: RESOLVED WONTFIX
Product: Core
Classification: Components
Component: HTML: Parser
Version: Trunk
Platform: All All
Importance: P3 normal with 2 votes
Target Milestone: mozilla1.7alpha
Assigned To: Boris Zbarsky [:bz] (Out June 25-July 6)
QA Contact: Hixie (not reading bugmail)
Duplicates: 230035 303552 503078
Reported: 2001-11-10 14:05 PST by Andreas Otte
Modified: 2009-07-08 08:00 PDT

Attachments
Patch to do what I suggested in comment 11 (3.21 KB, patch)
2004-01-05 08:50 PST, Boris Zbarsky [:bz] (Out June 25-July 6)
jst: review+
darin.moz: superreview+

Description Andreas Otte 2001-11-10 14:05:16 PST
Handling of the HTTP 1.1 Content-Location header from RFC 2616 is currently not
supported. It is similar to the handling of the deprecated Content-Base header,
but can also handle relative URLs.

See bug 94096 for a related discussion based on the removal of Content-Base
support.
Comment 1 Christopher Hoess (gone) 2001-11-10 14:29:07 PST
->Networking: HTTP
Comment 2 Andreas Otte 2001-11-11 00:54:16 PST
Sorry, this has to be owned by whoever owns the HTMLContentSink and that is not
networking, back to layout.
Comment 3 Kevin McCluskey (gone) 2002-01-25 15:10:28 PST
Harish, do you own the content sink?
Comment 4 timeless 2002-11-01 12:22:38 PST
from cvsblame it appears that jst owns that file.
Comment 5 Johnny Stenback (:jst, jst@mozilla.com) 2003-03-19 11:54:28 PST
Mass-reassigning bugs.
Comment 6 Hixie (not reading bugmail) 2003-08-15 00:59:07 PDT
qa -> me
Comment 7 Hixie (not reading bugmail) 2003-09-01 08:26:04 PDT
->parser. this isn't dom.
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-04 08:11:39 PST
*** Bug 230035 has been marked as a duplicate of this bug. ***
Comment 9 Anne (:annevk) 2004-01-04 14:09:04 PST
From the weblog of Mark Pilgrim
(<http://diveintomark.org/archives/2004/01/02/relative-uris>):

> Neither IE 6 SP1 nor Mozilla 1.6 Beta support the Content-Location: header,
> mainly because Microsoft web servers are so buggy
> (<http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/q218/1/80.asp&NoWebContent=1>)
> that respecting the Content-Location: header would cause about 10% of
> IIS-powered sites to break horribly
> (<http://www.securityspace.com/s_survey/data/man.200312/firewalled_cloc.html>).

Maybe WONTFIX is a good idea?
Comment 10 Hixie (not reading bugmail) 2004-01-04 14:19:49 PST
No. Opera supports this fine.
Comment 11 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-04 23:39:54 PST
So what happens with multiple content-location headers as follows:

Content-Location: http://foo.com/bar/baz.html

<base href="http://bar.com/foo/" />
<meta http-equiv="content-location" content="bar.html" />

Do we end up with the document basically having a base of
http://bar.com/foo/bar.html ?  Or what? It seems to me that this whole thing is
badly underspecified...

That is, would it be sufficient to simply resolve the content-location relative
to whatever the current base URI is and set that as the new base URI?
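
For illustration, the "resolve each value against the current base" idea can be sketched in Python. This is only a sketch, not the code in any patch: `apply_base_sources` and the document URI `http://foo.com/index.html` are hypothetical, and `urllib.parse.urljoin` stands in for the URI resolution step.

```python
from urllib.parse import urljoin


def apply_base_sources(document_uri, sources):
    """Resolve each base source against the current base, in order.

    Hypothetical helper: every value (absolute or relative) is resolved
    against whatever the base URI currently is, and the result becomes
    the new base URI.
    """
    base = document_uri
    for value in sources:
        base = urljoin(base, value)
    return base


final_base = apply_base_sources(
    "http://foo.com/index.html",        # assumed URI the document was loaded from
    [
        "http://foo.com/bar/baz.html",  # Content-Location HTTP header
        "http://bar.com/foo/",          # <base href="...">
        "bar.html",                     # <meta http-equiv="content-location">
    ],
)
print(final_base)
```

Under these assumptions the final base comes out as http://bar.com/foo/bar.html, which is the outcome guessed at above.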
Comment 12 Anne (:annevk) 2004-01-05 01:45:21 PST
Another quote <http://diveintomark.org/archives/2004/01/02/relative-uris>:

> Section 14.14 of RFC 2616 defines the Content-Location: HTTP header. If an
> HTML document is served without a BASE element but with a Content-Location:
> HTTP header, then that is the base URI (test page). Just to make this more
> interesting, Content-Location: may itself be a relative URI, in which case it
> is resolved according to RFC 2396, with the URI of the HTML document as its
> base URI. The resolved URI then serves as the base URI for other relative URIs
> within the HTML document.

It looks to me like the BASE element is more important, since the quote states
that if the BASE element is not set the content-location must be used.
Comment 13 Hixie (not reading bugmail) 2004-01-05 03:01:24 PST
Let's please stick to quoting specs, not blogs. Blogs do not have any normative
status and will only confuse matters further.

RFC2396, section 5.1. "Establishing a Base URI", is what defines the
interactions of the various levels:
   http://www.ietf.org/rfc/rfc2396

In the presence of multiple sources at the same level, I would suggest doing
what bz suggested, namely just resolving each base relative to the previous base
and then setting that new URI as the new base for future elements.
Comment 14 Christoph Schneegans 2004-01-05 06:05:47 PST
Hixie, Opera only respects the "Content-Location" header if "protocol, server and port (...) match", see <http://groups.google.com/groups?selm=721jbvgrk9et2fpdan5m729spcfqbvke71%404ax.com>.
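
A rough Python sketch of such a "protocol, server and port match" check follows. It is an assumption about what that rule could look like, not Opera's actual logic; `origin`, `honours_content_location` and the default-port table are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Assumed default ports, so that "http://host/" and "http://host:80/" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(uri):
    """Return the (scheme, host, port) triple of an absolute URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def honours_content_location(document_uri, content_location):
    """True if the resolved Content-Location stays on the document's origin."""
    resolved = urljoin(document_uri, content_location)
    return origin(resolved) == origin(document_uri)


print(honours_content_location("http://example.com/a/b.html", "/c/d.html"))
print(honours_content_location("http://example.com/a/b.html", "http://other.example/x"))
```

A relative Content-Location always passes such a check, since it inherits the scheme, host and port of the document URI.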
Comment 15 Hixie (not reading bugmail) 2004-01-05 07:11:14 PST
Yes, that is a known bug. I'm dealing with the bookwire people to get them to 
resolve their problem.
Comment 16 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-05 07:52:47 PST
Ian, you propose supporting relative values for <base href="">?  I don't believe
that's a good idea -- we don't support it now and neither do any other browsers;
furthermore the HTML spec clearly says the href must be an absolute URI.
Comment 17 Hixie (not reading bugmail) 2004-01-05 07:57:25 PST
No, indeed, I'd recommend keeping <base> working only for absolute URIs as per 
the spec.
Comment 18 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-05 08:50:59 PST
Created attachment 138422
Patch to do what I suggested in comment 11

This passes the testcase mentioned in bug 230035
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-05 08:54:14 PST
Taking.
Comment 20 Hixie (not reading bugmail) 2004-01-05 11:28:29 PST
(Does that patch also implement the real HTTP Content-Location header? Or only
the META one? I hope it does both, because then that would mean our code was
well designed, but...)
Comment 21 Johnny Stenback (:jst, jst@mozilla.com) 2004-01-05 11:45:53 PST
Yeah, that'll do both headers and meta tags. Amazing isn't it? :-)
Comment 22 Johnny Stenback (:jst, jst@mozilla.com) 2004-01-05 11:46:50 PST
Comment on attachment 138422
Patch to do what I suggested in comment 11

r=jst
Comment 23 Hixie (not reading bugmail) 2004-01-05 11:48:19 PST
woo :-)
Comment 24 Darin Fisher 2004-01-07 12:01:21 PST
Comment on attachment 138422
Patch to do what I suggested in comment 11

sr=darin
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-07 21:23:03 PST
Checked in.
Comment 26 R.K.Aa. 2004-01-16 02:46:41 PST
Something went wrong here. Take a look at bug 231072.
Comment 27 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-01-16 09:41:22 PST
Nothing seems to have gone wrong that I can see, past a broken server....
Comment 29 R.K.Aa. 2004-03-25 09:22:06 PST
Also broke a bank: bug 238626

I really wish this fix was backed out!! It's a dreadful experience to wait over
10 minutes to load pages. In the case of bug 238626 one also suffers from ever
more timeout alerts after 5-10 minutes have passed, but the page does continue
"loading"/spinning. 

I suspect any other "fix" causing a 10-minute performance hit per page would
have been made a blocker and the offending code backed out.
Comment 30 Hixie (not reading bugmail) 2004-03-25 12:45:56 PST
The bug you are seeing is that we block on stylesheets. That is what should be
fixed, this particular fix just made it more visible in certain cases where we
were doing the wrong thing before.
Comment 31 R.K.Aa. 2004-03-25 15:01:13 PST
When the page finally loads one ulcer later, it doesn't even USE a stylesheet.
It would have used one without this fix. So things got both slower and uglier.

The problems this bug triggered just spawned bug 238654 btw, but more should
probably be filed.
Comment 32 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-04-29 14:14:15 PDT
I've backed out this patch due to all the servers that send bogus
content-location headers and the fact that the HTTP spec's treatment of
content-location for content-negotiated pages breaks anchor traversals on such
pages.  See bug 238654 and its various dependencies and dups.
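
The mechanism is not spelled out here, so the following Python sketch is only one plausible reading, with assumed URIs: a content-negotiated page requested at a hypothetical `http://example.org/page` is served with `Content-Location: page.en.html`. If the client uses that as the base URI, a fragment-only link no longer resolves to the URI the user loaded.

```python
from urllib.parse import urldefrag, urljoin

request_uri = "http://example.org/page"  # assumed URI the user navigated to
content_location = "page.en.html"        # assumed header from content negotiation

# Base URI once Content-Location is honoured.
base_uri = urljoin(request_uri, content_location)

# An in-page link such as <a href="#section"> resolves against that base.
link_target = urljoin(base_uri, "#section")

# Strip the fragment and compare with the URI that was actually loaded.
target_document, fragment = urldefrag(link_target)
same_document = target_document == request_uri

print(base_uri)
print(link_target)
print(same_document)
```

In this reading, the link points at a different document URI than the loaded one, so following it looks like a navigation rather than a scroll within the page.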

The other option would be to implement what Opera implements, and that feels
like a lot of work for very low benefit, so I don't plan to do that.
Comment 33 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-04-29 14:19:48 PDT
I guess this is wontfix, instead.
Comment 34 Georg Maaß 2004-05-01 04:55:58 PDT
(In reply to comment #32)
> I've backed out this patch due to all the servers that send bogus
> content-location headers 

This is not an argument for making the client buggy as well. If the server is
buggy, then the client may reject the server and display an alert asking the
user whether to send an email to the server administrator to replace the buggy
server by a server without bugs. It is not the job of the client to fix the bugs
of the server. If the server serves stuff that does not conform to the RFCs
instead of presenting an error message to the author, then this behaviour is a
bug in the server and does not target you, because your client simply has to
reject such buggy responses.

> and the fact that the HTTP spec's treatment of
> content-location for content-negotiated pages breaks anchor traversals on such
> pages.  See bug 238654 and its various dependencies and dups.

This is also the responsibility of the page authors. If you cannot detect the
problem automatically, then ignore it. It is not your duty to fix authors'
errors. If you can automatically fix it, then you may implement an auto fix as
configurable goody. If you can detect the problem but cannot fix it, then
simply reject such buggy pages and alert for whether to send an email to the
author or server administrator.
 
> The other option would be to implement what Opera implements, and that feels
> like a lot of work for very low benefit, so I don't plan to do that.

Emulation of other browsers may be an optional goody but nothing more. It is not
a duty. Duty is only to support and force the standards. If the standard is
buggy (i.e. unclear) then the standard must be reviewed and rewritten, which
results in an errata RFC or a completely new RFC.

If such a necessary review of a standard blocks fixing, then you should not set
the bug to WONTFIX but set the target milestone to FUTURE, because it is
unknown, when the fix will be done.
Comment 35 Hixie (not reading bugmail) 2004-05-01 05:09:47 PDT
Georg: when people browse the Web with Mozilla 1.7 and find that thousands of
pages render incorrectly but work fine in Mozilla 1.6, Opera, IE, Safari, and
every other browser, then they think it is a bug in Mozilla 1.7, and stop using
us. Supporting more standards is great, but only when it makes the user
experience better. When it makes things worse, it is a bad thing.

Note that not supporting this doesn't make us non-compliant, it just means we
don't support it. Since as far as I can tell nobody supports this (no browsers
do it right, no servers do it right), what's the point?
Comment 36 Christian :Biesinger (don't email me, ping me on IRC) 2004-05-01 05:55:05 PDT
(In reply to comment #34)
> This is not an argument to implement the client also buggy. If the server is
> buggy, then the client may reject the server and display an alert asking the
> user whether to send an email to the server administrator to replace the buggy
> server by a server without bugs. 

1. that would annoy users
2. it is impossible to find the email address of the server administrator
3. it is impossible to detect whether the Content-Location the server sends is a
"good" one, or whether it is a "bad" one
Comment 37 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-05-01 09:49:41 PDT
> > and the fact that the HTTP spec's treatment of content-location [etc]

> If you can detect the problem but can not fix it, then simply reject such
> buggy pages

The point is, these pages ARE NOT BUGGY.  They are following the RFC to the
letter.  This header, if implemented by the server as specified in the RFC
(which it is in Apache) and if implemented in the client as specified in the
RFC (which it was in Mozilla), breaks all sorts of user expectations wrt a web
site's URI.  Did you even read the bug I pointed to AND ITS DEPENDENCIES like I
asked?  Or did you just spew about standards on general principle?

The point, as I said in one of the bugs you clearly did not read, is that this
header is broken-by-design.  What the RFC specifies is simply not workable when
correctly implemented on both the server and client side.

> If such a necessary review of a standard blocks fixing

Then the bug is wontfix and a new bug should be filed if the standard is ever
updated.  Note that updating the standard would involve creating a new header
with a different name that does sorta what Content-Location does but is treated
differently by servers.  So the _HTTP/1.1_ header _Content-Location_ (what this
bug is about) will not be implemented no matter what.  I don't have time to deal
with yet another standards committee to resolve this issue; if you do, please
feel free to contact whoever is responsible for the HTTP RFC and talk to them
about the problem.

Ian, Apache does actually do this header right, which is what led us to find the
problem in the RFC in the first place.
Comment 38 Hixie (not reading bugmail) 2004-05-01 09:56:12 PDT
You're right; my bad.
Comment 39 Dave Townsend [:mossop] 2005-08-05 06:06:18 PDT
*** Bug 303552 has been marked as a duplicate of this bug. ***
Comment 40 Simon Kitching 2007-04-01 20:51:08 PDT
The post above from Boris on 2004-05-01 states that "the RFC is broken-by-design" but doesn't directly describe why. For future reference I believe the bugzilla entry that Boris refers to as describing the problem is 241981.

It's a real shame this functionality isn't available as it is extremely useful when internal forwards are occurring within a webserver. In particular JavaServer Faces applications perform internal forwards regularly; a post to /alpha/beta.jsf will often forward to /gamma/delta.jsp. The latter page often wants to reference resources using relative paths but cannot as the browser will resolve paths relative to /alpha (the last URL it knew about), not /gamma. Sending a "redirect" is one solution but has significant implications, esp. with respect to "request-scope variables". The html BASE tag is of no use here as that requires a hostname and port which are not available to the code being executed. The Content-Location header would solve this issue as it allows relative paths - and having this info in a header rather than an html tag is far more convenient too. Ah well...
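
To make the forwarding case concrete, here is a small Python sketch. The host `example.org` and the resource `images/logo.png` are made up for illustration, and `urljoin` stands in for the browser's URI resolution.

```python
from urllib.parse import urljoin

request_uri = "http://example.org/alpha/beta.jsf"  # assumed URI the browser posted to
relative_ref = "images/logo.png"                   # assumed relative reference in delta.jsp

# Without Content-Location the browser only knows the request URI.
without_header = urljoin(request_uri, relative_ref)

# With a relative "Content-Location: /gamma/delta.jsp" honoured as the base URI.
base_uri = urljoin(request_uri, "/gamma/delta.jsp")
with_header = urljoin(base_uri, relative_ref)

print(without_header)
print(with_header)
```

The first result lands under /alpha/, the second under /gamma/, which is where the forwarded page expects its resources to be.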
Comment 41 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-01 20:59:25 PDT
> The html BASE tag is of no use here as that requires a hostname and port which
> are not available to the code being executed.

That's really odd; I can't think of a single sane server-side solution that doesn't know its own hostname and port.

But even if true, if you're willing to rely on JavaScript you can output JavaScript which will document.write() the relevant <base> tag.

Of course long-term the answer is still to get the HTTP folks to provide a better alternative....
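
As a rough server-side variant of the same idea, the absolute URI that <base> requires can usually be assembled from the request's scheme and Host header. The Python sketch below assumes those two values are available to the page code; `base_tag` is a hypothetical helper, not part of any framework.

```python
from html import escape


def base_tag(scheme, host_header, path):
    """Build a <base> element from request data (hypothetical helper).

    scheme      -- "http" or "https", as seen by the server
    host_header -- value of the request's Host header, e.g. "example.org:8080"
    path        -- absolute path of the page actually rendered
    """
    href = f"{scheme}://{host_header}{path}"
    # Escape the value so it is safe inside a double-quoted attribute.
    return f'<base href="{escape(href, quote=True)}">'


print(base_tag("http", "example.org:8080", "/gamma/delta.jsp"))
```

Whether the Host header can be trusted depends on the deployment (proxies may rewrite it), so treat this as a starting point only.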
Comment 42 Matthias Versen [:Matti] 2009-07-08 08:00:20 PDT
*** Bug 503078 has been marked as a duplicate of this bug. ***
