Bug 275689 - same-document references should work according to RFC 3986
Status: RESOLVED WONTFIX
Keywords: testcase
Product: Core
Classification: Components
Component: Networking
Version: Trunk
Platform: All All
Importance: -- normal with 7 votes
Assigned To: Nobody; OK to take it and work on it
Duplicates: 146441 270606 281463 659298
Blocks: 359190

Reported: 2004-12-22 07:35 PST by Anne (:annevk)
Modified: 2016-01-26 13:19 PST

Attachments
fail (4 bytes, text/plain)
2004-12-22 07:38 PST, Anne (:annevk)
testcase (185 bytes, application/xml)
2004-12-22 07:39 PST, Anne (:annevk)
testcase for page without document URI (223 bytes, text/html)
2004-12-22 07:58 PST, Sjoerd Visscher
Link and From with empty URI (115 bytes, text/html)
2005-01-15 14:45 PST, Sjoerd Visscher

Description Anne (:annevk) 2004-12-22 07:35:15 PST
We are currently incorrectly resolving fragment identifiers. Per RFC 2396
section 4.2:

# A URI reference that does not contain a URI is a reference to the
# current document. In other words, an empty URI reference within a
# document is interpreted as a reference to the start of that document,
# and a reference containing only a fragment identifier is a reference
# to the identified fragment of that document. Traversal of such a
# reference should not result in an additional retrieval action.

Especially the last sentence is important.
Comment 1 Anne (:annevk) 2004-12-22 07:38:03 PST
Created attachment 169372
fail
Comment 2 Anne (:annevk) 2004-12-22 07:39:32 PST
Created attachment 169373
testcase
Comment 3 Sjoerd Visscher 2004-12-22 07:58:58 PST
Created attachment 169375
testcase for page without document URI

In fact, a fragment identifier should not be resolved at all. Usually this is
the same as resolving against the document URI, except when a document URI
doesn't exist. It should be applied to the current in-memory document. This is
important for documents that have been created with script, as in this
testcase.

This makes it a bit awkward to determine what should happen to the location
(both in the DOM and in the location bar). The fragment identifier should
certainly be updated.
Comment 4 Boris Zbarsky [:bz] 2004-12-22 16:13:05 PST
That part of the URI spec doesn't make very much sense when different parts of the 
document have different base URIs, unfortunately...  The net result is that the behavior of 
fragment identifiers in various cases is rather underdefined and is often abused.  As a 
result there are compat issues involved with changing any aspect of it.

That said, please check for existing bugs on fragment identifiers when <base> is present.  
I'm pretty sure we have some.
Comment 5 Anne (:annevk) 2004-12-23 00:25:15 PST
There was bug 241981 comment 6. That bug was marked fixed however, since it was
depending on a different issue. 
Comment 6 Sjoerd Visscher 2004-12-23 05:38:45 PST
> That part of the URI spec doesn't make very much sense when different 
> parts of the document have different base URIs, unfortunately...

Why? It means that the base URI does not matter at all, so that makes a lot of
sense to me. 

Note that this only applies to fragment-id-only relative URIs. Not to URIs that
happen to be the same URI as the document.
Comment 7 Boris Zbarsky [:bz] 2004-12-23 22:28:43 PST
> Why? It means that the base URI does not matter at all, so that makes a lot of
> sense to me.

In an environment where a document is assembled from multiple pieces (one of the 
desiderata for XML), that makes no sense to me.  This is why XML allows setting the 
base URI on a per-element basis -- so elements can reference things relative to that 
base using relative URIs without having to hardcode the location of the document pieces 
in the pieces themselves (the xml:base insertion can happen at assembly time).  If the 
base URI is ignored for fragment ident resolution, how do you propose that this work?
Comment 8 Anne (:annevk) 2004-12-24 03:12:04 PST
Are there any examples of this already used on the web?

Anyway, the RFC mentioned in comment 0 is about constructs like |href="#foo"|.
If a link points to the current document, for example |href="foo#foo"| it has to
be resolved.
Comment 9 Sjoerd Visscher 2004-12-24 03:44:25 PST
It's the other way around: in an environment where a document is assembled from
multiple pieces, this bug has to be fixed!

Suppose you have wrapper.xhtml, which x-includes content.xhtml. Content.xhtml
contains an index with a link like: <a href="#section1">. The include action
makes content.xhtml the base location for the index. If I click on the link in
the index in Mozilla, the link will take me to content.xhtml#section1 which is
clearly not intended!
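
As a rough illustration of that scenario, here is a small sketch. It uses the WHATWG URL class found in current browsers and Node.js rather than Mozilla's own URI code, and the example.org addresses are invented for the example.

```javascript
// Hypothetical locations, chosen only for illustration.
const documentURI = "http://example.org/wrapper.xhtml";
// After inclusion, the index inherits content.xhtml as its base URI.
const baseURI = "http://example.org/content.xhtml";

// Resolving the fragment-only reference against the base URI
// yields a URI inside content.xhtml, not inside the wrapper.
const resolved = new URL("#section1", baseURI).href;

// Resolving it against the document URI stays within the wrapper.
const sameDocument = new URL("#section1", documentURI).href;

console.log(resolved);
console.log(sameDocument);
```

The first result is what the link currently navigates to; the second is what a reader of the wrapper document would presumably expect.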
Comment 10 Boris Zbarsky [:bz] 2004-12-24 23:48:23 PST
> Are there any examples of this already used on the web?

Yes, in any document that includes both the XUL and the XBL in a single file (and there 
are some floating around).

> If a link points to the current document, for example |href="foo#foo"|

Writing a link like that requires knowing the current document URI, though.

(In reply to comment #9)
The problem is that you want it to be possible for documents to make assumptions about 
how and from where they will be included and _im_possible for them to not make such 
assumptions...  From a general web architecture perspective, it is better if documents can 
avoid making such assumptions, so they are not limited in the ways in which they can be 
included.
Comment 11 Sjoerd Visscher 2004-12-25 02:24:23 PST
I keep seeing advantages where you see problems. Could you give an example (not
in words but in XML, just to be precise) where you think this will cause problems?

> so they are not limited in the ways in which they can be included.
Fixing this bug improves that. Same-document references are always used to
scroll to a different position. If you want to do this in a document that is
included from multiple other documents, it only works if you fix this bug.
Comment 12 Sjoerd Visscher 2004-12-25 02:30:06 PST
> > If a link points to the current document, for example |href="foo#foo"|

> Writing a link like that requires knowing the current document URI, though.

No, writing a link like that requires knowing the current *base* URI. (To be
clear: such links are not affected by fixing this bug, it's not a same-document
reference.)
Comment 13 Hixie (not reading bugmail) 2004-12-26 02:56:38 PST
According to RFC2396,

   <a href="">      - always references current document
   <form action=""> - always references base URI

(And since the fragment identifier is not part of the URI, "" and "#foo" are
always exactly equivalent as far as resolving a URI is concerned.)

Good luck implementing this without a headache.


HOWEVER, we already don't follow RFC2396 for resolving URIs. RFC2396 is not
compatible with a lot of content on the Web. We currently follow RFC1808, like
most modern browsers. I would recommend WONTFIX on the principle that we
shouldn't break compat with IE6.
Comment 15 Hixie (not reading bugmail) 2004-12-26 03:15:15 PST
Come to think of it, RFC2396 implies we'd use different URIs for:

   <link xlink:type="simple" xlink:href="">

...and

   <link xlink:type="embed" xlink:href="">

...which seems a bit silly. For example that would mean that in SVG the
following would be a link to the current document using an image from another 
document:

   <a xlink:href="#circle" xml:base="another-document">
     <image xlink:href="#circle" ...>
   </a>

...which seems a tad unintuitive. (Or would it? Does an embed count as something
that is "always intended to result in a new request"? Maybe it doesn't. Section
4.2 of RFC2396 is getting less and less clear the more I think of it.)
Comment 16 Boris Zbarsky [:bz] 2004-12-26 12:59:49 PST
(In reply to comment #11)
> I keep seeing advantages where you see problems

That's because you seem to think that fragment identifiers are only used for scrolling.  
They're used for a variety of other reasons too, including SVG resources, XBL bindings, 
etc, etc.  For many of these uses, using the current document is inappropriate -- for 
example XBL bindings are not parsed the same way as other XML documents, so putting 
the binding in the same document as your content, which we currently support and 
should continue to support absolutely requires reparsing the document....
Comment 17 Sjoerd Visscher 2004-12-27 05:25:27 PST
Re comment #13
> I would recommend WONTFIX on the principle that we shouldn't break compat with
> IE6.

Note that we are breaking compat with IE6 when the base URI is unclear, like in
the 3rd attachment of this bug. I'm guessing we'd fix more pages than we'd
break, because people don't realize that adding a base href breaks internal links.

Re comment #15
Why would embed work any differently?

Re comment #16
> including SVG resources

OK, say I include a SVG resource, and it contains a same-document reference to a
path definition. Result: The SVG resource is retrieved again, instead of using
the included path definition. This bug should be fixed if you want to be flexible
in the ways SVG can be included.

> absolutely requires reparsing the document

... using the local source of the document! There's no need to resolve the URI
and use the cache or something like that. Again: not fixing this bug would limit
the use of embedded XBL bindings; you can not use it when there is a different
base URI.

And there have already been a few cases in practice where this bug (also in
other rendering engines) made things complicated for me.

#1 HTML E-mail: I wanted to use the same XSLT template for online content and
E-mail content (so it is easy to mail a page to someone). To make all the links
work, I added a base href. But some pages also had an index of the page.
Clicking on a link in the index opened the browser, instead of scrolling to the
right location in the mail.
#2 Locally saved HTML. Same story actually.

To conclude: I can't think of any use case for same-document references being
resolved using the base URI.
Comment 18 Hixie (not reading bugmail) 2004-12-27 14:24:42 PST
> Note that we are breaking compat with IE6 when the base URI is unclear, like 
> in the 3rd attachment of this bug. I'm guessing we'd fix more pages than we'd
> break, because people don't realize that adding a base href breaks internal 
> links.

That's another bug, already filed separately.


> Why would embed work any differently?

See the spec. It explicitly says so.


> OK, say I include a SVG resource, and it contains a same-document reference to 
> a path definition. Result: The SVG resource is retrieved again, instead of 
> using the included path definition.

Not if the base URI is the same as the current document (which it is unless the
author changes it), since then the document is found to already be loaded and is
therefore simply reused.


> #1 HTML E-mail: I wanted to use the same XSLT template for online content and
> E-mail content (so it is easy to mail a page to someone).

Good lord. XSLT in HTML e-mail. What a heinous idea. This use case on its own is
the strongest argument so far for a WONTFIX.


> #2 Locally saved HTML. Same story actually.
> 
> To conclude: I can't think of any use case for same-document references being
> resolved using the base URI.

I agree with that, the problem is that the way the spec says to do it is simply
screwed up and wouldn't work in the real world. Handling the empty URI
differently based on context is just asking for trouble. If you can find a
better way to handle this, then I'm happy to listen, but the current proposal is
IMHO unworkable in the long run.
Comment 19 Anne (:annevk) 2004-12-27 14:27:41 PST
I believe Sjoerd talks about generating an HTML e-mail /with/ XSLT.
Comment 20 Sjoerd Visscher 2004-12-28 02:27:02 PST
(In reply to comment #18)
> That's another bug, already filed separately.

At least that one is fixed too if this is fixed. 
> 
> > Why would embed work any differently?
> 
> See the spec. It explicitly says so.

Do you mean this part: "However, if the URI reference occurs in a context that
is always intended to result in a new request, as in the case of HTML's FORM
element, then an empty URI reference represents the base URI of the current
document and should be replaced by that URI when transformed into a request."

I don't think that applies to embedding. FORM isn't even a real exception: You
either POST, or when you do a GET, the relative path automatically gets a query
part, so it's not a same-document reference anymore.

> 
> > OK, say I include a SVG resource, and it contains a same-document reference to 
> > a path definition. Result: The SVG resource is retrieved again, instead of 
> > using the included path definition.
> 
> Not if the base URI is the same as the current document (which it is unless the
> author changes it), since then the document is found to already be loaded and is
> therefore simply reused.

When the base URI is the same as the current document there's not a bug in any
of the cases mentioned here.

> > #1 HTML E-mail: I wanted to use the same XSLT template for online content and
> > E-mail content (so it is easy to mail a page to someone).
> 
> Good lord. XSLT in HTML e-mail. What a heinous idea. This use case on its own is
> the strongest argument so far for a WONTFIX.

:)
I meant server-side XSLT. This applies to any server-side method that tries to
generate html mail with the same code and templates as the online pages.
 
> > #2 Locally saved HTML. Same story actually.
> > 
> > To conclude: I can't think of any use case for same-document references being
> > resolved using the base URI.
> 
> I agree with that, the problem is that the way the spec says to do it is simply
> screwed up and wouldn't work in the real world. Handling the empty URI
> differently based on context is just asking for trouble. If you can find a
> better way to handle this, then I'm happy to listen, but the current proposal is
> IMHO unworkable in the long run.

My suggestion is to not look at the context at all. The empty URI is always the
current document, and a fragment-id-only URI is always a fragment of the current
document. FORM may look special, but it isn't, as explained above.
Comment 21 Boris Zbarsky [:bz] 2004-12-28 10:23:17 PST
(In reply to comment #17)

> ... using the local source of the document!

There is no such beastie; there is only cache (and network if the data is no
longer in cache).

Comment 22 Anne (:annevk) 2004-12-30 03:12:25 PST
I accidentally found this example today:
 <http://www.stadtaus.com/docu/gallery_script/index_en.html>

It shows exactly what people expect to happen. It is similar to attachment
169373, only this one is from the real world. (They use the BASE element only
for referring to IMG elements, but it breaks the rest of the document, in
contradiction with RFC 2396.)
Comment 23 Hixie (not reading bugmail) 2004-12-30 11:34:21 PST
And the fact that it doesn't work in IE either suggests to me that we shouldn't
change our behaviour here.
Comment 24 Anne (:annevk) 2004-12-30 11:40:51 PST
So there is finally a case where we can be more standard compliant and support
something better than IE does and we do not, because we want to follow IE here?
Comment 25 Sjoerd Visscher 2004-12-30 12:18:27 PST
I have absolutely no problems with ignoring specs to save compatibility with IE.
But in this case I don't see the point. We really are fixing more existing pages
than we are breaking (...if any. Try to find a good use for the current
behavior, I couldn't find it.)

I couldn't care less if there was a work-around, but there isn't. If this stays
broken, there's simply *no way* to combine a base href or xml:base and
same-document references. In more detail: you could add the location of the
current document to your same-document links, but the whole point of a base href
and xml:base is exactly that you don't know the location of the current document.

What also isn't really mentioned yet is that the "fix" for bug 241981 can be
reverted. Boris says: "If all of our behavior is correct, the only option I see
is to back out support for content-location (perhaps leaving a comment
explaining that no one should ever try implementing it because it is
broken-by-design)".
Well, not all of our behaviour is correct, and there's no need to tell the world
to stop adoption of content-location headers.
Boris also mentions there that "We've had similar issues with not being able to
use fragment identifiers on pages where the document URI is something like
wyciwyg, by the way...."

So this really isn't just some freak side effect. It's about the essence of
same-document references.
Comment 26 Hixie (not reading bugmail) 2004-12-31 04:15:20 PST
We're already better than IE here, as the testcases in comment 14 show.

We _are_ compliant to a spec, as mentioned in comment 13. We're _already_
non-compliant with the new spec for other reasons, and that won't change, since
that _does_ break sites.

The new spec is extremely vague as to how it should be implemented, and there
are still unanswered questions (see comment 21, comment 16, comment 15, comment
13). There are also other problems, like the effect it should have on the DOM
(when you query a link for an absolute URI) and on window.location.

Yes, it would be great if attachment 169375 worked. If someone can propose an
actual way of doing it that doesn't make matters worse, then I'd be all for it.
But we need an actual proposal on how to do it.
Comment 27 Sjoerd Visscher 2005-01-10 13:46:26 PST
Here's another example of this bug: Same-document references in Google Cache
pages don't work, because Google adds a base href to the document.

I agree that correctly solving this bug is complicated by details. Therefore I
propose we fix this bug by resolving same-document references with the document
uri. Any missing details should get their own bug. (If they haven't already been
filed, like the one Ian mentioned in comment #18.)

The wayback machine has the same problem, but oddly enough it works in Mozilla.
(Not in IE.) That may be another URI-resolving bug caused by the odd format of
the addresses, e.g.
http://web.archive.org/web/20040202074232/www.w3.org/TR/REC-html40/intro/intro.html
Comment 28 Boris Zbarsky [:bz] 2005-01-10 14:33:23 PST
> Therefore I propose we fix this bug by resolving same-document references with
> the document uri.

This isn't really a feasible approach -- it would require playing major
whack-a-mole, since the existing apis for resolving URI are frozen, so we'd have
to hardcode this at every single place where we resolve a URI.
Comment 29 Sjoerd Visscher 2005-01-10 14:43:23 PST
Is the api frozen, or also its semantics? Because this wouldn't need an
interface change, only a change in the semantics.
Comment 30 Boris Zbarsky [:bz] 2005-01-10 14:46:54 PST
Both are frozen.  And yes, it would need an interface change, since resolving a
string to a URI would require three items -- string, document URI, base URI. 
Unless every single caller ends up having to do a check for URIs starting with
'#', or something equally silly.
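
To make the three-input point concrete, here is a minimal sketch of what such a resolver could look like. The function name and signature are hypothetical and do not correspond to any existing Mozilla interface; the WHATWG URL class stands in for the real URI code, and the fragment check is just the per-caller test mentioned above, gathered into one place.

```javascript
// Hypothetical helper, not an existing Mozilla API: resolve a reference
// when the document URI and the base URI may differ.
function resolveReference(ref, documentURI, baseURI) {
  // Empty and fragment-only references stay within the current document.
  if (ref === "" || ref.startsWith("#")) {
    return new URL(ref, documentURI).href;
  }
  // Everything else is resolved against the base URI as usual.
  return new URL(ref, baseURI).href;
}

console.log(resolveReference("#foo", "http://example.org/doc.html", "http://example.org/base/"));
console.log(resolveReference("img.png", "http://example.org/doc.html", "http://example.org/base/"));
```

A two-argument resolver that only receives the string and the base URI has no way to produce the first result, which is the interface problem being described.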
Comment 31 Hixie (not reading bugmail) 2005-01-11 07:20:22 PST
If we do this at all, I would recommend doing one thing, and one thing only:

   When the user activates an <html:a href=""> link, if it starts with a "#" 
   character, just go to that location without doing a reload or anything else.

Don't fix it for XLinks, don't fix it for <link href="">, don't fix it for SVG,
definitely don't fix it for url(), don't fix it for _anything_ except <html:a>.
Comment 32 Boris Zbarsky [:bz] 2005-01-11 09:57:20 PST
Which means that we'd inconsistently resolve URIs, effectively?  That's not
happening, sorry.
Comment 33 Hixie (not reading bugmail) 2005-01-11 10:25:19 PST
I don't see any other way to do it, so I guess this is WONTFIX.

It does suck that we don't handle links well in document.written() pages,
though. (Along with the other cases raised on this bug.)
Comment 34 Sjoerd Visscher 2005-01-11 10:27:23 PST
Technically what Ian proposes is not "inconsistently resolve URIs". It does not
resolve URIs at all in a special case, i.e. bypassing the URI apis.

What about extending the api?
http://www.mozilla.org/projects/embedding/rev-interfaces.html#Extending
Comment 35 Sjoerd Visscher 2005-01-13 13:16:08 PST
How could we all have forgotten about rfc2396bis?
<a
href="http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#same-document">It
defines same-document references completely differently</a>. The effect is the
same, or even better as more references are considered to be same-document
references. I think it also fits the current implementation much better.

At least the URI api doesn't have to change, but I doubt it will make the
implementation any easier. Bis says: "When a same-document reference is
dereferenced for the purpose of a retrieval action, the target of that reference
is defined to be within the same entity (representation, document, or message)
as the reference; therefore, a dereference should not result in a new retrieval
action."

I hope this makes sense and that this is unifiable with how currently
same-document references are handled when there's no base ref, as they don't do
a new retrieval action either.
Comment 36 Sjoerd Visscher 2005-01-13 13:27:39 PST
I'm thinking the bug summary should be rephrased as something like: "The
decision to navigate to a different *document* is now made by comparing the
current document location with the link location. Instead the comparison should
be made by comparing the current *base* location with the link location."

It seems to be somewhat unintuitive, but I think this is how it should work.
Comment 37 Boris Zbarsky [:bz] 2005-01-13 14:44:28 PST
(In reply to comment #35)
What is a "retrieval action"?  Does this mean that URIs are resolved differently
depending on how you plan to use them?  Sorry, but that's no good either.

(In reply to comment #36)
What is "current base location" in a document in which every single node has a
different base URI?
Comment 38 Sjoerd Visscher 2005-01-13 15:06:24 PST
Good points. 

About "retrieval action". There's nothing you need to change here. Currently
there are times, including when clicking a link, where the reference is compared
to the document location and it is somehow decided to use the current document.
The only change is that the reference should similarly be compared to the base
location.

About which base location to use: it's the base location of the node containing
the same-document reference. (Easy answer, hard to implement probably.)
Comment 39 Boris Zbarsky [:bz] 2005-01-13 19:39:29 PST
So linking to an image is not a "retrieval action"?  What about doing a form
submission?

How is <a href="#foo" target="bar"> to be handled?

> About which base location to use: it's the base location of the node
> containing the same-document reference.

This wouldn't be too bad to do, actually, but it would mean, for example, that
one can't get the href of a link, set window.location to it, and have it do the
same thing as clicking a link... That would break some existing content, for sure.
Comment 40 Sjoerd Visscher 2005-01-15 06:05:36 PST
Just ask yourself the question: what happens if there is no base location (or
the base location is the same as the document location.)

Same-document references in combination with: 1. a base href or 2. no document
location is the main point of this bug. Changing anything else based on the new
rfc doesn't seem like a good idea to me.

Setting window.location should imho have the same semantics as just following a
link. Currently when you set window.location to a URI that is the same as the
document location (apart from the fragment identifier), only the fragment
identifier is changed. The same should happen when you set window.location to a
URI that is the same as the base location (again apart from the fragment
identifier).
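
The comparison described here can be sketched as follows. This is an illustration only: the helper name is made up, and it uses the WHATWG URL class rather than anything in Mozilla's code.

```javascript
// Hypothetical helper: true when two absolute URIs are identical
// apart from their fragment identifiers.
function differsOnlyByFragment(a, b) {
  const left = new URL(a);
  const right = new URL(b);
  // Drop the fragment from both copies before comparing.
  left.hash = "";
  right.hash = "";
  return left.href === right.href;
}

console.log(differsOnlyByFragment("http://example.org/a.html#x", "http://example.org/a.html#y"));
console.log(differsOnlyByFragment("http://example.org/a.html#x", "http://example.org/b.html#x"));
```

Today the second argument would be the document location; the suggestion is that the same test should also succeed when it is the base location.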

A way to think about what should happen in each case is to remember that if you
have a working document without a base href, you should be able to move it
somewhere else, and add a base href equal to the original document location and
still have everything working. (Take the Google cache as an example.) You should
also be able to include it into another document, and have everything working if
you set xml:base on the included part.

(Taking this to an extreme would mean that in script window.location should
actually contain the base location of the script element.)
Comment 41 Sjoerd Visscher 2005-01-15 07:08:11 PST
*** Bug 146441 has been marked as a duplicate of this bug. ***
Comment 42 Boris Zbarsky [:bz] 2005-01-15 09:30:29 PST
> Same-document references in combination with: 1. a base href or 2. no document
> location is the main point of this bug. 

The whole point is that these concepts are not particularly well-defined...  Is
the example in comment 39 a same-document reference if "bar" is the frame the
anchor is in, for example?

In brief, it sounds like we're trying to have different behavior for identical
markup based on ambiguous criteria.  This is rather hard to implement in any
sort of logical setup, for obvious reasons.

Breaking window.location based on the base href is simply not an option.

Neither is treating two different URIs (document base URI and document URI)
identically when they are loaded an option.

I'm still tempted to mark this wontfix, due to a lack of clear understanding by
anyone, as far as I can see, of what the behavior should be.
Comment 43 Boris Zbarsky [:bz] 2005-01-15 09:36:40 PST
Oh, and also of note, see bug 146441 comment 6 and bug 146441 comment 8.  Note
that those are both retrieval actions, and are implemented in the same code in
Mozilla.  I see no reason for the arbitrary distinction the RFC makes between
those two cases, past the fact that they realized that breaking URI resolution
like they did would break real-world pages that used forms.
Comment 44 Sjoerd Visscher 2005-01-15 14:45:15 PST
Created attachment 171374
Link and From with empty URI

Guess what: a href="" does follow the base href, but form action="" doesn't.
Apparently the code is not that equal.
Comment 45 Sjoerd Visscher 2005-01-15 15:33:31 PST
> How is <a href="#foo" target="bar"> to be handled?

First resolve the href like it is done now (using the base URI of the link
node). Compare it with the base URI of the link node. If they, aside from the
fragment identifier, match then this is a same-document reference. When clicked
it should set the location of the target window to the document location of the
current window, with the fragment identifier of the link. This works if the
target window happens to be the current window.

You are right that window.location shouldn't change. But that doesn't break any
existing pages. The only effect would be that pages currently broken because of
this bug will still be broken when this bug is fixed if they use script to set
window.location. (Instead, <a
href="http://annevankesteren.nl/archives/2005/01/fragment-identifiers#comment-2974">people
now have to write script</a> to just get links working correctly.)
Comment 46 Boris Zbarsky [:bz] 2005-01-15 23:44:35 PST
> Guess what: a href="" does follow the base href, but form action="" doesn't.

That's a quirk thing for HTML documents only.  See comment in the code at
http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLFormElement.cpp#1327
and code following.  Note that this does NOT apply to href="#whatever".  I'd
rather not scatter this sort of crap all over the rest of our code, yes.  ;)

> First resolve the href like it is done now (using the base URI of the link
> node). Compare it with the base URI of the link node. If they, aside from the
> fragment identifier, match then this is a same-document reference.  When
> clicked it should set the location of the target window to the document
> location of the current window, with the fragment identifier of the link.

That's not how this situation behaves in practice; such code is often used to
scroll one frame from another one (generally a table of contents).  Breaking
this is not really acceptable -- a lot of websites depend on it.

Please do try to propose solutions that are not self-contradictory and don't
break existing content, ok?
Comment 47 Sjoerd Visscher 2005-01-16 13:27:29 PST
> Please do try to propose solutions that are not self-contradictory 
> and don't break existing content, ok?

Did you mean <a href="#foo" target="bar"> together with a base href?
Comment 48 Boris Zbarsky [:bz] 2005-01-16 15:52:58 PST
I mean that whatever solution you propose needs to handle all the possible
cases, including base hrefs, targeted links, links where just an anchor name is
listed, links with relative URIs, links with absolute URIs, form submission,
etc, in a reasonably consistent way.  The solution should also not break
significant amounts of existing content.
Comment 49 Sjoerd Visscher 2005-01-17 01:57:02 PST
If <a href="#foo" target="bar"> together with a base href is abused for TOCs,
then it's obvious you can't fix this bug in quirks mode, only in standards
compliant mode.
Comment 50 Boris Zbarsky [:bz] 2005-01-17 08:34:36 PST
It's used without a base href.  Who said anything about a base href?

And it's commonly used in standards-mode pages, since all browsers implement it
interoperably.

So what's the benefit of implementing a complicated, self-contradictory spec
that is incompatible with pretty much all the existing content that would be
affected by it?

Keep in mind -- just because someone called something a spec doesn't mean it
needs to be implemented...
Comment 51 Anne (:annevk) 2005-01-26 01:06:32 PST
The document just had an update: <http://www.ietf.org/rfc/rfc3986.txt>.
Comment 52 Boris Zbarsky [:bz] 2005-01-26 08:45:44 PST
So... the text quoted in comment 0 is gone.  The "abnormal resolution" examples
in RFC 3986 clearly indicate that '#foo' is resolved relative to the base URI. 
The long discussion of fragment identifiers makes it quite clear that they are
resolved...
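
For reference, the resolution examples in section 5.4 of RFC 3986 use the base URI http://a/b/c/d;p?q. The sketch below reproduces three of them with the WHATWG URL class (an assumption of this example, not the code Mozilla uses), which agrees with the RFC for these inputs.

```javascript
// Base URI used by the reference resolution examples in RFC 3986.
const base = "http://a/b/c/d;p?q";

// A fragment-only reference, an empty reference and a query-only reference.
const results = ["#s", "", "?y"].map((ref) => new URL(ref, base).href);

console.log(results);
```

In all three cases the result is derived from the base URI, and nothing in the resolution step consults a separate document URI.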
Comment 53 Sjoerd Visscher 2005-01-29 08:23:29 PST
RFC 3986 and RFC 2396bis are the same. The text from comment 0 was already gone
in 2396bis. Let me quote a bit from the list of changes between 3986 and 2396:

   o  The determination of whether a URI reference is a same-document
      reference has been decoupled from the URI parser, simplifying the
      URI processing interface within applications in a way consistent
      with the internal architecture of deployed URI processing
      implementations.  The determination is now based on comparison to
      the base URI after transforming a reference to absolute form,
      rather than on the format of the reference itself.  This change
      may result in more references being considered "same-document"
      under this specification than there would be under the rules given
      in RFC 2396, especially when normalization is used to reduce
      aliases.  However, it does not change the status of existing
      same-document references.

That last sentence is important!

I think we need some pseudo code to make things precise.
(I used ECMAScript syntax and the names from nsIURI and nsIURL)

We have:
- relURI, the (possibly relative) URI of the link
- baseURI, an absolute URI, the base URI of the part of the document that
contains the link
- docURI, an absolute URI, the URI of the current document
- resultURI, an absolute URI, that is used for the final retrieval

/**
 * When a URI reference refers to a URI that is, aside from its fragment
 * component (if any), identical to the base URI (Section 5.1), that
 * reference is called a "same-document" reference.
 */
absURI = baseURI.resolve(relURI);
absURINoFragment = absURI.clone();
absURINoFragment.ref = "";
baseURINoFragment = baseURI.clone();
baseURINoFragment.ref = "";

if (absURINoFragment.equals(baseURINoFragment))
{
  /** 
   * This is a same-document reference!
   * When a same-document reference is dereferenced for a retrieval
   * action, the target of that reference is defined to be within the same
   * entity (representation, document, or message) as the reference;
   */
  resultURI = docURI.resolve(relURI);
}
else
{
  resultURI = absURI;
}

That's it. Now the last part from RFC 3986: "therefore, a dereference should not
result in a new retrieval action." is the most confusing. But I believe that
code is already in place; same-document references work fine when there is no
base href.
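
For anyone who wants to experiment, the pseudo code above can be approximated in runnable form. This is only a sketch: it uses the standard URL class rather than nsIURI/nsIURL, the function name resolveForRetrieval is hypothetical, and string comparison of the serialized URIs stands in for nsIURI.equals.

```javascript
// Return the serialized URI without its fragment component.
function stripFragment(uri) {
  const copy = new URL(uri);
  copy.hash = "";
  return copy.href;
}

// relURI:  the (possibly relative) URI of the link
// baseURI: absolute base URI of the part of the document containing the link
// docURI:  absolute URI of the current document
// Returns the absolute URI that would be used for the final retrieval.
function resolveForRetrieval(relURI, baseURI, docURI) {
  const absURI = new URL(relURI, baseURI);
  if (stripFragment(absURI) === stripFragment(baseURI)) {
    // Same-document reference: the target lies within the current
    // document, so resolve against the document URI instead.
    return new URL(relURI, docURI).href;
  }
  return absURI.href;
}
```

For example, with baseURI http://example.org/base/ and docURI http://example.com/doc.html, the reference "#foo" yields http://example.com/doc.html#foo, while "other.html" still yields http://example.org/base/other.html.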

This code should run at the moment of dereference (i.e. when the link is
clicked, etc.). But when, e.g., the href property of a link is read from
JavaScript, the relURI can simply be resolved using the baseURI, as it is now.
(This also means that this code must also run on values that are assigned to
window.location, and no, that doesn't break existing code.)

(Note: this may not be completely compatible with comments I made before. Please
use this pseudo code as a basis for further discussion.)
Comment 54 Boris Zbarsky [:bz] 2005-01-29 10:33:50 PST
> However, it does not change the status of existing same-document references.

I don't see anything in RFC 3986 defining said same-document references, though.
 Could you please point to that part of the RFC?

> - docURI, an absolute URI, the URI of the current document

Which document is the "current document"?

>  resultURI = docURI.resolve(relURI);

Where is this code being run?

> (This also means that this code must also run on values that are assigned to
> window.location

It's not clear to me how that works, since at least one of the 4 objects your
code depends on is missing in that case.
Comment 55 Sjoerd Visscher 2005-01-29 13:10:14 PST
(In reply to comment #54)
> I don't see anything in RFC 3986 defining said same-document references,
> though. Could you please point to that part of the RFC?

Section 4.4. http://asg.web.cmu.edu/rfc/rfc3986.html#sec-4.4
 
> Which document is the "current document"?

The document the link is in.

> >  resultURI = docURI.resolve(relURI);
> 
> Where is this code being run?

"Where" is up to the person fixing this bug. "When" is at the time of
dereferencing. 
 
> > (This also means that this code must also run on values that are assigned to
> > window.location
> 
> It's not clear to me how that works, since at least one of the 4 objects your
> code depends on is missing in that case.

There are 3 objects, the 4th is the result. But you are right, the baseURI is
missing. But - thinking about it a little more - that doesn't really matter. It
means that nothing at all changes as far as Javascript is concerned. So the
worst that could happen is the rare case that pages that are currently broken
and use Javascript in a weird way will remain broken.
Comment 56 Boris Zbarsky [:bz] 2005-01-29 13:21:13 PST
> > Which document is the "current document"?
> The document the link is in.

That breaks a use case that is commonly used, as I pointed out in comment 46.

> "Where" is up to the person fixing this bug. "When" is at the time of
> dereferencing. 

There is no "time of dereferencing".  There is a "time when the link URI is
resolved to an absolute URI" and a "time when the absolute URI is loaded".  The
two happen at different times; at the latter time the base URI of the link may
be different from what it was at the former time.

> It means that nothing at all changes as far as Javascript is concerned.

Sure something changes.  The basic invariant that clicking a link with no
"target" attribute set and setting window.location.href to the .href of the link
have the same result is violated.
Comment 57 Sjoerd Visscher 2005-01-29 16:24:35 PST
(In reply to comment #56)
> > > Which document is the "current document"?
> > The document the link is in.
> 
> That breaks a use case that is commonly used, as I pointed out in comment 46.

Then I don't understand your use case. Could you elaborate?
 
> > "Where" is up to the person fixing this bug. "When" is at the time of
> > dereferencing. 
> 
> There is no "time of dereferencing".  There is a "time when the link URI is
> resolved to an absolute URI" and a "time when the absolute URI is loaded".  The
> two happen at different times; at the latter time the base URI of the link may
> be different from what it was at the former time.

When a base URI changes, all relative links should be reresolved. If that
doesn't happen those links are broken already and that doesn't change whether
this bug is fixed or not.

> > It means that nothing at all changes as far as Javascript is concerned.
> 
> Sure something changes.  The basic invariant that clicking a link with no
> "target" attribute set and setting window.location.href to the .href of the link
> have the same result is violated.

The invariant is only violated in pages which are already broken
because of this bug. What's the problem?
Comment 58 Sjoerd Visscher 2005-01-29 16:28:44 PST
I reread your use case and I think you mean there's a link <a href="#foo"
target="bar"> in, say, a.html, and there's a document b.html in the bar frame. And
then clicking that link should change the URL of the bar frame to b.html#foo.

But I created a testpage and that doesn't happen. The url becomes a.html#foo.
Comment 59 Boris Zbarsky [:bz] 2005-01-30 08:53:58 PST
> Then I don't understand your use case. Could you elaborate?

Hmm...  I had a use case this broke; I'll try to recreate it.

> When a base URI changes, all relative links should be reresolved.

They are.  But I don't think you understood.  When a link is clicked, we resolve
the URI to load, then post an event to load it.  When the event fires, we load
it.  Your system assumes that the "resolve" and "load" stage happen at the same
time, but they do not.
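
A toy model of the two stages may help here. It is not Mozilla code: the queue, the page object and both function names are invented for illustration, and the standard URL class stands in for the real URI resolution.

```javascript
// Hypothetical model of "resolve now, load later"; not Mozilla APIs.
const pendingLoads = [];                         // stands in for the event queue
const page = { baseURI: "http://example.org/a/" };

function onLinkClick(href) {
  // Stage 1: resolve against the base URI as it is at click time.
  const resolved = new URL(href, page.baseURI).href;
  pendingLoads.push(resolved);                   // stage 2 is deferred
}

function runPendingLoads() {
  // Stage 2: only the already-resolved absolute URI is available here;
  // page.baseURI may have changed in the meantime.
  return pendingLoads.splice(0);
}

onLinkClick("#foo");
page.baseURI = "http://example.org/b/";          // base changes between stages
const loaded = runPendingLoads();
```

In this model the load stage sees the URI resolved against the old base, which is why a scheme that needs the base URI at load time has to say which value it means.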

> The invariant is only violated in pages which are already broken
> because of this bug.

The point is, the invariant is violated.  Any time that happens, pages that
depend on the invariant will break.  Maybe that's acceptable in this case
because there are very few such pages; I'd have to see numbers to make that call.
Comment 60 Sjoerd Visscher 2005-01-31 09:16:17 PST
> > When a base URI changes, all relative links should be reresolved.
> 
> They are.  But I don't think you understood.  When a link is clicked, we resolve
> the URI to load, then post an event to load it.  When the event fires, we load
> it.  Your system assumes that the "resolve" and "load" stage happen at the same
> time, but they do not.

Maybe you can point to the implementations of these stages (for hyperlinks as an
example)? (I can't write C, but the Mozilla code is usually very readable.)
 
> > The invariant is only violated in pages which are already broken
> > because of this bug.
> 
> The point is, the invariant is violated.  Any time that happens, pages that
> depend on the invariant will break.  Maybe that's acceptable in this case
> because there are very few such pages; I'd have to see numbers to make that call.

It is acceptable because, of the pages that depend on the invariant, only the
ones that *are already broken* will break.
Comment 61 Boris Zbarsky [:bz] 2005-01-31 09:23:56 PST
> Maybe you can point to the implementations of these stages

Start at nsGenericHTMLElement::HandleDOMEventForAnchors at:
http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsGenericHTMLElement.cpp#1437

Go through the NS_UI_ACTIVATE case to nsGenericElement::TriggerLink at
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsGenericElement.cpp#3151

This posts an event in nsWebShell::OnLinkClick.

When the event fires, we end up in nsWebShell::OnLinkClickSync which proceeds to
nsDocShell::InternalLoad
Comment 62 Sjoerd Visscher 2005-01-31 16:11:28 PST
(In reply to comment #61)
> Go through the NS_UI_ACTIVATE case to nsGenericElement::TriggerLink at
> http://lxr.mozilla.org/seamonkey/source/content/base/src/nsGenericElement.cpp#3151

Thanks, this helps a lot.

The NS_UI_ACTIVATE case seems to be the right spot. All the needed data is
available. TriggerLink does some security stuff, so there the URI needs to be
the real one. And most of the code before it is DOM related, so there the URI
still needs to be the one resolved to the base href.

The first line of the pseudo code can be skipped, because hrefURI already is
resolved to the baseURI. (Although it doesn't have to be skipped if that might
be convenient.)

It probably needs to be a utility function, as it is going to be called from
several different places, perhaps something like:

nsCOMPtr<nsIURI> retrievalURI = nsContentUtils::convertURIIfSameDocumentRef(
                                  hrefURI, GetOwnerDoc(), baseURI);

(OT: I see that the baseURI is passed as aOriginURI to
securityManager->CheckLoadURI. This sounds like it would be possible to
circumvent certain security restrictions by simply setting the baseURI to a more
trusted domain.)
Comment 63 Boris Zbarsky [:bz] 2005-01-31 16:24:39 PST
Note that not all nsIURIs support fragments.... This is an ongoing issue with
Mozilla code, unfortunately.

Again, this is just the code for clicked-on HTML links.  This doesn't handle
form submissions (how should those behave?), XLinks, etc.  Those are elsewhere
in the code, as you probably noticed.  So if we decide we want to do this, we'd
need an exhaustive list of places where URIs should be treated in this weird way.

As for baseURI, that URI is already security-checked.  See
nsDocument::SetBaseURI and nsGenericElement::GetBaseURI.
Comment 64 :Gavin Sharp [email: gavin@gavinsharp.com] 2005-02-07 20:05:08 PST
*** Bug 281463 has been marked as a duplicate of this bug. ***
Comment 65 :Gavin Sharp [email: gavin@gavinsharp.com] 2005-02-07 20:05:48 PST
*** Bug 270606 has been marked as a duplicate of this bug. ***
Comment 66 Arthur 2005-03-22 06:05:59 PST
I've just come across a real-world example where this is a problem. The page
(http://m-portal.migros.ch/de/desktopdefault.aspx/tabid-138/) has a base URI
different to the URL it was loaded from. At the end of the document it says
<script language="javascript" type="text/javascript">
	location.href="#p";
</script>
with the intention to jump to #p in the already loaded document when having
finished loading. Konqueror and Mozilla go to BASE#p, IE 6 just jumps to #p in
the already loaded document. From the RFCs and the discussions here I'm not sure
what the "de jure" behaviour would be. Is <script .../> similar to "action" or
is this an internal link which shouldn't result in a refetch?
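
The two outcomes can be reproduced with the standard URL class. Note that the base URI below is an assumption made for illustration; the only thing known from the page is that its base differs from the URL it was loaded from.

```javascript
// URL the page was loaded from (taken from this comment).
const loadedFrom = "http://m-portal.migros.ch/de/desktopdefault.aspx/tabid-138/";
// Assumed base URI, for illustration only.
const assumedBase = "http://m-portal.migros.ch/de/";

// Konqueror/Mozilla behaviour: "#p" is resolved against the base URI.
const viaBase = new URL("#p", assumedBase).href;

// IE 6 behaviour: "#p" stays within the already loaded document.
const viaDocument = new URL("#p", loadedFrom).href;

console.log(viaBase);
console.log(viaDocument);
```

The first result points at a different resource than the loaded page, so following it means a new fetch; the second differs from the loaded URL only in its fragment.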
Comment 67 Arthur 2005-03-23 01:06:53 PST
The URL given in the last comment now works as intended by the designers. They
have added
<script type="text/javascript">var
baseurl='http://m-portal.migros.ch/de/';tabid=138;</script>
Comment 68 Darin Fisher 2006-06-21 16:11:18 PDT
-> default owner
Comment 69 Hixie (not reading bugmail) 2007-02-28 14:41:02 PST
It's not clear to me what the specs say, what browsers do, or what authors need. It's also not clear to me what the specs should say, browsers should do, and authors should write.

If anyone wants to go ahead and fix this mess -- either in specs or by fixing browsers -- I encourage them to do so.
Comment 70 Hixie (not reading bugmail) 2008-10-28 17:55:33 PDT
HTML5 handles the action="" case and navigation to frag IDs specially now. The rest is expected to work per RFC3986. Please let me know if this causes new problems that I can help fix.
Comment 71 Mardeg 2011-05-24 07:08:42 PDT
*** Bug 659298 has been marked as a duplicate of this bug. ***
Comment 72 Patrick McManus [:mcmanus] 2016-01-26 12:08:16 PST
anne, I'm going to wontfix this based on webcompat.. if there should be a concrete alternative action plan please reopen and ni valentin.
Comment 73 Anne (:annevk) 2016-01-26 13:19:13 PST
I agree that we should not do this. Thanks for closing.
