Closed Bug 109553 Opened 23 years ago Closed 20 years ago

[FIX]Implement support for HTTP 1.1 Content-location header

Categories

(Core :: DOM: HTML Parser, defect, P3)

RESOLVED WONTFIX
mozilla1.7alpha

People

(Reporter: andreas.otte, Assigned: bzbarsky)

Attachments

(1 file)

Handling of the HTTP 1.1 Content-Location header from RFC 2616 is currently not
supported. It is similar to the handling of the deprecated Content-Base header,
but can also handle relative URLs.

See bug 94096 for a related discussion about the removal of Content-Base
support.
->Networking: HTTP
Assignee: attinasi → darin
Component: Layout → Networking: HTTP
QA Contact: petersen → tever
Sorry, this has to be owned by whoever owns the HTMLContentSink and that is not
networking, back to layout.
Assignee: darin → attinasi
Component: Networking: HTTP → Layout
QA Contact: tever → petersen
Harish do you own the content sink?
Assignee: attinasi → harishd
Target Milestone: --- → Future
from cvsblame it appears that jst owns that file.
Assignee: harishd → jst
Component: Layout → DOM HTML
QA Contact: petersen → stummala
Target Milestone: Future → ---
Mass-reassigning bugs.
Assignee: jst → dom_bugs
qa -> me
QA Contact: stummala → ian
->parser. this isn't dom.
Assignee: dom_bugs → harishd
Component: DOM HTML → Parser
*** Bug 230035 has been marked as a duplicate of this bug. ***
From the weblog of Mark Pilgrim
(<http://diveintomark.org/archives/2004/01/02/relative-uris>):

> Neither IE 6 SP1 nor Mozilla 1.6 Beta support the Content-Location: header,
> mainly because Microsoft web servers are so buggy
> (<http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/q218/1/80.asp&NoWebContent=1>)
> that respecting the Content-Location: header would cause about 10% of
> IIS-powered sites to break horribly
> (<http://www.securityspace.com/s_survey/data/man.200312/firewalled_cloc.html>).

Maybe WONTFIX is a good idea?
No. Opera supports this fine.
So what happens with multiple content-location headers as follows:

Content-Location: http://foo.com/bar/baz.html

<base href="http://bar.com/foo/" />
<meta http-equiv="content-location" content="bar.html" />

Do we end up with the document basically having a base of
http://bar.com/foo/bar.html ?  Or what? It seems to me that this whole thing is
badly underspecified...

That is, would it be sufficient to simply resolve the content-location relative
to whatever the current base URI is and set that as the new base URI?
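To make the question concrete, here is a minimal Python sketch of the "resolve against the current base, then make the result the new base" idea, using the three declarations from the example above. The document URI `http://foo.com/index.html` and the helper name `apply_in_order` are assumptions for illustration only; nothing here reflects actual Mozilla code.

```python
from urllib.parse import urljoin


def apply_in_order(document_uri, declarations):
    """Resolve each declared value against the current base, in order.

    Each resolved value becomes the base for the next declaration.
    """
    base = document_uri
    for value in declarations:
        base = urljoin(base, value)
    return base


# Hypothetical document URI; the declarations are the ones from the comment.
result = apply_in_order(
    "http://foo.com/index.html",
    [
        "http://foo.com/bar/baz.html",  # HTTP Content-Location header
        "http://bar.com/foo/",          # <base href="...">
        "bar.html",                     # <meta http-equiv="content-location">
    ],
)
print(result)
```

Under this reading the final base is the URI asked about in the comment, `http://bar.com/foo/bar.html`.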
Another quote <http://diveintomark.org/archives/2004/01/02/relative-uris>:

> Section 14.14 of RFC 2616 defines the Content-Location: HTTP header. If an
> HTML document is served without a BASE element but with a Content-Location:
> HTTP header, then that is the base URI (test page). Just to make this more
> interesting, Content-Location: may itself be a relative URI, in which case it
> is resolved according to RFC 2396, with the URI of the HTML document as its
> base URI. The resolved URI then serves as the base URI for other relative URIs
> within the HTML document.

It looks to me like the BASE element is more important, since the quote states
that if the BASE element is not set the content-location must be used.
Let's please stick to quoting specs, not blogs. Blogs do not have any normative
status and will only confuse matters further.

RFC2396, section 5.1. "Establishing a Base URI", is what defines the
interactions of the various levels:
   http://www.ietf.org/rfc/rfc2396

In the presence of multiple sources at the same level, I would suggest doing
what bz suggested, namely just resolving each base relative to the previous base
and then setting that new URI as the new base for future elements.
Hixie, Opera only respects the "Content-Location" header if "protocol, server and port (...) match", see <http://groups.google.com/groups?selm=721jbvgrk9et2fpdan5m729spcfqbvke71%404ax.com>.
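For reference, the restriction described in that post could be sketched roughly as follows in Python. This is only an illustration of the rule as quoted ("protocol, server and port match"), not Opera's implementation; the function names and the default-port table are assumptions.

```python
from urllib.parse import urljoin, urlsplit

# Assumed default ports, so that "http://host/" and "http://host:80/" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(uri):
    """Return the (scheme, host, port) triple of a URI."""
    parts = urlsplit(uri)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def effective_base(document_uri, content_location):
    """Honor Content-Location only if it stays on the same scheme, host and port."""
    if not content_location:
        return document_uri
    candidate = urljoin(document_uri, content_location)
    if origin(candidate) == origin(document_uri):
        return candidate
    return document_uri
```

With this check a header pointing at another host is simply ignored, and the document URI stays the base.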
Yes, that is a known bug. I'm dealing with the bookwire people to get them to 
resolve their problem.
Ian, you propose supporting relative values for <base href="">?  I don't believe
that's a good idea -- we don't support it now and neither do any other browsers;
furthermore the HTML spec clearly says the href must be an absolute URI.
No, indeed, I'd recommend keeping <base> working only for absolute URIs as per 
the spec.
This passes the testcase mentioned in bug 230035
Attachment #138422 - Flags: superreview?(darin)
Attachment #138422 - Flags: review?(jst)
Taking.
Assignee: harishd → bz-vacation
Priority: -- → P3
Summary: Implement support for HTTP 1.1 Content-location header → [FIX]Implement support for HTTP 1.1 Content-location header
Target Milestone: --- → mozilla1.7alpha
(Does that patch also implement the real HTTP Content-Location header? Or only
the META one? I hope it does both, because then that would mean our code was
well designed, but...)
Yeah, that'll do both headers and meta tags. Amazing isn't it? :-)
Comment on attachment 138422 [details] [diff] [review]
Patch to do what I suggested in comment 11

r=jst
Attachment #138422 - Flags: review?(jst) → review+
woo :-)
Comment on attachment 138422 [details] [diff] [review]
Patch to do what I suggested in comment 11

sr=darin
Attachment #138422 - Flags: superreview?(darin) → superreview+
Checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Something went wrong here. Take a look at bug 231072.
Nothing seems to have gone wrong that I can see, past a broken server....
Also broke a bank: bug 238626

I really wish this fix was backed out!! It's a dreadful experience to wait over
10 minutes to load pages. In the case of bug 238626 one also suffers from ever
more timeout alerts after 5-10 minutes have passed, but the page does continue
"loading"/spinning. 

I suspect any other "fix" causing a 10-minute performance hit per page would have
been made a blocker and the offending code backed out.
The bug you are seeing is that we block on stylesheets. That is what should be
fixed, this particular fix just made it more visible in certain cases where we
were doing the wrong thing before.
When the page finally loads one ulcer later, it doesn't even USE a stylesheet.
It would have used one without this fix. So things got both slower and uglier.

The problems this bug triggered just spawned bug 238654 btw, but more should
probably be filed.
I've backed out this patch due to all the servers that send bogus
content-location headers and the fact that the HTTP spec's treatment of
content-location for content-negotiated pages breaks anchor traversals on such
pages.  See bug 238654 and its various dependencies and dups.
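
One plausible reading of the anchor problem, sketched in Python: on a content-negotiated page the server names the chosen variant in Content-Location, so a fragment-only link resolved against that base no longer points at the URI the user actually loaded. The URIs and the helper `is_same_document_link` are hypothetical, and this is an illustration rather than a description of the Mozilla code; bug 238654 and its dependencies have the real details.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document_link(document_uri, base_uri, href):
    """True if href, resolved against base_uri, stays within document_uri."""
    target, _fragment = urldefrag(urljoin(base_uri, href))
    current, _ = urldefrag(document_uri)
    return target == current


requested = "http://example.org/manual/intro"  # hypothetical negotiated URI
variant = "intro.en.html"                      # hypothetical Content-Location value

base_without_header = requested
base_with_header = urljoin(requested, variant)

print(is_same_document_link(requested, base_without_header, "#install"))
print(is_same_document_link(requested, base_with_header, "#install"))
```

With the header honored, `#install` resolves to the variant's URI, so the link is no longer an in-page jump from the browser's point of view.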

The other option would be to implement what Opera implements, and that feels
like a lot of work for very low benefit, so I don't plan to do that.
I guess this is wontfix, instead.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 21 years ago → 20 years ago
Resolution: --- → WONTFIX
(In reply to comment #32)
> I've backed out this patch due to all the servers that send bogus
> content-location headers 

This is not an argument for making the client buggy as well. If the server is
buggy, then the client may reject the server and display an alert asking the
user whether to send an email to the server administrator to replace the buggy
server by a server without bugs. It is not the job of the client to fix the bugs
of the server. If the server serves stuff that does not conform to the RFCs
instead of presenting an error message to the author, then this behaviour is a
bug in the server and does not target you, because your client simply has to
reject such buggy responses.

> and the fact that the HTTP spec's treatment of
> content-location for content-negotiated pages breaks anchor traversals on such
> pages.  See bug 238654 and its various dependencies and dups.

This is also the responsibility of the page authors. If you cannot detect the
problem automatically, then ignore it. It is not your duty to fix authors'
errors. If you can automatically fix it, then you may implement an auto fix as a
configurable goody. If you can detect the problem but cannot fix it, then
simply reject such buggy pages and show an alert asking whether to send an email
to the author or server administrator.
 
> The other option would be to implement what Opera implements, and that feels
> like a lot of work for very low benefit, so I don't plan to do that.

Emulation of other browsers may be an optional goody but nothing more. It is not
a duty. The only duty is to support and enforce the standards. If the standard is
buggy (i.e. unclear) then the standard must be reviewed and rewritten, which
results in an errata RFC or a completely new RFC.

If such a necessary review of a standard blocks fixing, then you should not set
the bug to WONTFIX but set the target milestone to FUTURE, because it is
unknown, when the fix will be done.
George: when people browse the Web with Mozilla 1.7 and find that thousands of
pages render incorrectly but work fine in Mozilla 1.6, Opera, IE, Safari, and
every other browser, then they think it is a bug in Mozilla 1.7, and stop using
us. Supporting more standards is great, but only when it makes the user experience
better. When it makes things worse, it is a bad thing.

Note that not supporting this doesn't make us non-compliant, it just means we
don't support it. Since as far as I can tell nobody supports this (no browsers
do it right, no servers do it right), what's the point?
(In reply to comment #34)
> This is not an argument for making the client buggy as well. If the server is
> buggy, then the client may reject the server and display an alert asking the
> user whether to send an email to the server administrator to replace the buggy
> server by a server without bugs. 

1. that would annoy users
2. it is impossible to find the email address of the server administrator
3. it is impossible to detect whether the Content-Location the server sends is a
"good" one, or whether it is a "bad" one
> > and the fact that the HTTP spec's treatment of content-location [etc]

> If you can detect the problem but can not fix it, then simply reject such
> buggy pages

The point is, these pages ARE NOT BUGGY.  They are following the RFC to the
letter.  This header, if implemented by the server as specified in the RFC
(which it is in Apache) and if implemented in the client as specified in the
RFC (which it was in Mozilla) breaks all sorts of user expectations wrt a web
site's URI.  Did you even read the bug I pointed to AND ITS DEPENDENCIES like I
asked?  Or did you just spew about standards on general principle?

The point, as I said in one of the bugs you clearly did not read, is that this
header is broken-by-design.  What the RFC specifies is simply not workable when
correctly implemented on both the server and client side.

> If such a necessary review of a standard blocks fixing

Then the bug is wontfix and a new bug should be filed if the standard is ever
updated.  Note that updating the standard would involve creating a new header
with a different name that does sorta what Content-Location does but is treated
differently by servers.  So the _HTTP/1.1_ header _Content-Location_ (what this
bug is about) will not be implemented no matter what.  I don't have time to deal
with yet another standards committee to resolve this issue; if you do, please
feel free to contact whoever is responsible for the HTTP RFC and talk to them
about the problem.

Ian, Apache does actually do this header right, which is what led us to find the
problem in the RFC in the first place.
You're right; my bad.
*** Bug 303552 has been marked as a duplicate of this bug. ***
The post above from Boris on 2004-05-01 states that "the RFC is broken-by-design" but doesn't directly describe why. For future reference I believe the bugzilla entry that Boris refers to as describing the problem is 241981.

It's a real shame this functionality isn't available as it is extremely useful when internal forwards are occurring within a webserver. In particular JavaServer Faces applications perform internal forwards regularly; a post to /alpha/beta.jsf will often forward to /gamma/delta.jsp. The latter page often wants to reference resources using relative paths but cannot as the browser will resolve paths relative to /alpha (the last URL it knew about), not /gamma. Sending a "redirect" is one solution but has significant implications, esp. with respect to "request-scope variables". The HTML BASE tag is of no use here as that requires a hostname and port which are not available to the code being executed. The Content-Location header would solve this issue as it allows relative paths - and having this info in a header rather than an HTML tag is far more convenient too. Ah well...
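To illustrate the forwarding case, here is a small Python sketch comparing how a relative reference resolves with and without a Content-Location header. The host `example.org`, the resource `images/logo.png` and the helper `resolve` are made up for the example; only the two paths come from the comment.

```python
from urllib.parse import urljoin


def resolve(request_uri, reference, content_location=None):
    """Resolve a relative reference, optionally honoring Content-Location."""
    base = request_uri
    if content_location:
        # The header may itself be relative; resolve it against the request URI.
        base = urljoin(request_uri, content_location)
    return urljoin(base, reference)


request_uri = "http://example.org/alpha/beta.jsf"  # what the browser posted to
forwarded_to = "/gamma/delta.jsp"                  # what actually produced the page

print(resolve(request_uri, "images/logo.png"))
print(resolve(request_uri, "images/logo.png", forwarded_to))
```

Without the header the image is looked up under /alpha/; with it, under /gamma/, which is what the forwarded page expects.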
> The html BASE tag is of no use here as that requires a hostname and port which
> are not available to the code being executed.

That's really odd; I can't think of a single sane server-side solution that doesn't know its own hostname and port.

But even if true, if you're willing to rely on JavaScript you can output JavaScript which will document.write() the relevant <base> tag.

Of course long-term the answer is still to get the HTTP folks to provide a better alternative....
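As a rough sketch of the server-side alternative, a page could build an absolute <base> tag from whatever scheme, host and port the server knows for the request. The Python below is illustrative only: `base_tag` and its parameters are hypothetical, and how a given server framework exposes these values is not covered here.

```python
from html import escape

# Assumed default ports; these are left out of the generated URI.
DEFAULT_PORTS = {"http": 80, "https": 443}


def base_tag(scheme, host, port, path):
    """Build an absolute <base> tag for the directory the page really lives in."""
    if DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    href = f"{scheme}://{netloc}{path}"
    return f'<base href="{escape(href, quote=True)}">'


print(base_tag("http", "example.org", 8080, "/gamma/"))
```

This keeps <base> absolute, as the HTML spec requires, at the cost of the page needing to know its external hostname and port.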