275689 - same-document references should work according to RFC 3986

Reporter

Description

•

21 years ago

We are currently resolving fragment identifiers incorrectly. Per RFC 2396 section 4.2:

# A URI reference that does not contain a URI is a reference to the
# current document. In other words, an empty URI reference within a
# document is interpreted as a reference to the start of that document,
# and a reference containing only a fragment identifier is a reference
# to the identified fragment of that document. Traversal of such a
# reference should not result in an additional retrieval action.

The last sentence is especially important.
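As a rough illustration of the quoted rule, the sketch below sorts a reference string into the three cases section 4.2 distinguishes. The function name and the result labels are hypothetical and exist only for this example; nothing here reflects Mozilla's actual code.

```javascript
// Sketch only: classify a reference string along the lines of
// RFC 2396 section 4.2. Name and labels are illustrative assumptions.
function classifyReference(ref) {
  if (ref === "") {
    // Empty reference: the start of the current document.
    return "start-of-current-document";
  }
  if (ref.startsWith("#")) {
    // Fragment-only reference: a fragment of the current document.
    return "fragment-of-current-document";
  }
  // Anything else has to be resolved against the base URI.
  return "needs-resolution";
}
```

Under this reading, |href="#foo"| never involves the base URI, while |href="foo#foo"| does.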

Anne (:annevk)

Reporter

Comment 1

•

21 years ago

Attached file fail — Details

Anne (:annevk)

Reporter

Comment 2

•

21 years ago

Attached file testcase — Details

Anne (:annevk)

Reporter

Updated

•

21 years ago

Keywords: testcase

Sjoerd Visscher

Comment 3

•

21 years ago

Attached file testcase for page without document URI — Details

In fact, a fragment identifier should not be resolved at all. Usually this is the same as resolving against the document URI, except when a document URI doesn't exist. It should be applied to the current in-memory document. This is important for documents that have been created with script, as in this testcase. This makes it a bit awkward to determine what should happen to the location (both in the DOM and in the location bar). The fragment identifier should certainly be updated.

Anne (:annevk)

Reporter

Updated

•

21 years ago

Summary: fragment identifiers should be resolved against the document URI → fragment identifiers should not be resolved at all

Boris Zbarsky [:bzbarsky]

Comment 4

•

21 years ago

That part of the URI spec doesn't make very much sense when different parts of the document have different base URIs, unfortunately... The net result is that the behavior of fragment identifiers in various cases is rather underdefined and is often abused. As a result there are compat issues involved with changing any aspect of it. That said, please check for existing bugs on fragment identifiers when <base> is present. I'm pretty sure we have some.

Whiteboard: DUPEME

Anne (:annevk)

Reporter

Comment 5

•

21 years ago

There was bug 241981 comment 6. That bug was marked fixed however, since it was depending on a different issue.

Sjoerd Visscher

Comment 6

•

21 years ago

> That part of the URI spec doesn't make very much sense when different
> parts of the document have different base URIs, unfortunately...

Why? It means that the base URI does not matter at all, so that makes a lot of sense to me. Note that this only applies to fragment-id-only relative URIs, not to URIs that happen to be the same URI as the document.

Boris Zbarsky [:bzbarsky]

Comment 7

•

21 years ago

> Why? It means that the base URI does not matter at all, so that makes a lot of
> sense to me.

In an environment where a document is assembled from multiple pieces (one of the desiderata for XML), that makes no sense to me. This is why XML allows setting the base URI on a per-element basis -- so elements can reference things relative to that base using relative URIs without having to hardcode the location of the document pieces in the pieces themselves (the xml:base insertion can happen at assembly time). If the base URI is ignored for fragment ident resolution, how do you propose that this work?

Anne (:annevk)

Reporter

Comment 8

•

21 years ago

Are there any examples of this already used on the web? Anyway, the RFC mentioned in comment 0 is about constructs like |href="#foo"|. If a link points to the current document, for example |href="foo#foo"| it has to be resolved.

Sjoerd Visscher

Comment 9

•

21 years ago

It's the other way around: in an environment where a document is assembled from multiple pieces, this bug has to be fixed! Suppose you have wrapper.xhtml, which x-includes content.xhtml. Content.xhtml contains an index with a link like: <a href="#section1">. The include action makes content.xhtml the base location for the index. If I click on the link in the index in Mozilla, the link will take me to content.xhtml#section1 which is clearly not intended!
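To make the difference concrete, here is a small sketch using the standard URL constructor. The example.org addresses are made up; only the file names come from the scenario above.

```javascript
// Hypothetical locations for the wrapper and the included content.
const wrapperURI = "http://example.org/wrapper.xhtml";
const includedBaseURI = "http://example.org/content.xhtml";

// Resolving against the base URI of the included index leaves the
// document the user is actually viewing.
const resolvedAgainstBase = new URL("#section1", includedBaseURI).href;

// Resolving against the document URI stays inside wrapper.xhtml.
const resolvedAgainstDocument = new URL("#section1", wrapperURI).href;

console.log(resolvedAgainstBase);     // http://example.org/content.xhtml#section1
console.log(resolvedAgainstDocument); // http://example.org/wrapper.xhtml#section1
```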

Boris Zbarsky [:bzbarsky]

Comment 10

•

21 years ago

> Are there any examples of this already used on the web?

Yes, in any document that includes both the XUL and the XBL in a single file (and there are some floating around).

> If a link points to the current document, for example |href="foo#foo"|

Writing a link like that requires knowing the current document URI, though.

(In reply to comment #9)

The problem is that you want it to be possible for documents to make assumptions about how and from where they will be included and _im_possible for them to not make such assumptions... From a general web architecture perspective, it is better if documents can avoid making such assumptions, so they are not limited in the ways in which they can be included.

Sjoerd Visscher

Comment 11

•

21 years ago

I keep seeing advantages where you see problems. Could you give an example (not in words but in XML, just to be precise) where you think this will cause problems?

> so they are not limited in the ways in which they can be included.

Fixing this bug improves that. Same-document references are always used to scroll to a different position. If you want to do this in a document that is included from multiple other documents, it only works if you fix this bug.

Sjoerd Visscher

Comment 12

•

21 years ago

> > If a link points to the current document, for example |href="foo#foo"|
> Writing a link like that requires knowing the current document URI, though.

No, writing a link like that requires knowing the current *base* URI. (To be clear: such links are not affected by fixing this bug; they are not same-document references.)

Summary: fragment identifiers should not be resolved at all → same-document references should not be resolved at all

Hixie (not reading bugmail)

Comment 13

•

21 years ago

According to RFC2396,

<a href=""> - always references current document
<form action=""> - always references base URI

(And since the fragment identifier is not part of the URI, "" and "#foo" are always exactly equivalent as far as resolving a URI is concerned.)

Good luck implementing this without a headache.

HOWEVER, we already don't follow RFC2396 for resolving URIs. RFC2396 is not compatible with a lot of content on the Web. We currently follow RFC1808, like most modern browsers.

I would recommend WONTFIX on the principle that we shouldn't break compat with IE6.
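The context dependence described here can be sketched as follows. This is only an illustration of the RFC 2396 reading given in this comment; the function name and the "link"/"form" labels are hypothetical, and the URLs are invented.

```javascript
// Sketch: under the RFC 2396 reading above, the empty reference means
// different things depending on where it occurs.
// "link" stands for <a href="">, "form" for <form action="">.
function resolveEmptyReference(context, documentURI, baseURI) {
  if (context === "form") {
    // Always intended to result in a new request: use the base URI.
    return new URL("", baseURI).href;
  }
  // Otherwise the empty reference means the current document.
  return new URL("", documentURI).href;
}

const documentURI = "http://example.org/doc.html";
const baseURI = "http://example.org/other/";
console.log(resolveEmptyReference("link", documentURI, baseURI)); // http://example.org/doc.html
console.log(resolveEmptyReference("form", documentURI, baseURI)); // http://example.org/other/
```

The same input string yields two different URIs, which is the headache referred to above.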

Hixie (not reading bugmail)

Comment 14

•

21 years ago

TESTCASES http://www.hixie.ch/tests/adhoc/uri/001.html http://www.hixie.ch/tests/adhoc/uri/002.html

Hixie (not reading bugmail)

Comment 15

•

21 years ago

Come to think of it, RFC2396 implies we'd use different URIs for:

<link xlink:type="simple" xlink:href="">

...and

<link xlink:type="embed" xlink:href="">

...which seems a bit silly. For example that would mean that in SVG the following would be a link to the current document using an image from another document:

<a xlink:href="#circle" xml:base="another-document">
<image xlink:href="#circle" ...>
</a>

...which seems a tad unintuitive. (Or would it? Does an embed count as something that is "always intended to result in a new request"? Maybe it doesn't. Section 4.2 of RFC2396 is getting less and less clear the more I think of it.)

Boris Zbarsky [:bzbarsky]

Comment 16

•

21 years ago

(In reply to comment #11)
> I keep seeing advantages where you see problems

That's because you seem to think that fragment identifiers are only used for scrolling. They're used for a variety of other reasons too, including SVG resources, XBL bindings, etc, etc. For many of these uses, using the current document is inappropriate -- for example XBL bindings are not parsed the same way as other XML documents, so putting the binding in the same document as your content, which we currently support and should continue to support, absolutely requires reparsing the document....

Sjoerd Visscher

Comment 17

•

21 years ago

Re comment #13

> I would recommend WONTFIX on the principle that we shouldn't break compat with IE6.

Note that we are breaking compat with IE6 when the base URI is unclear, like in the 3rd attachment of this bug. I'm guessing we'd fix more pages than we'd break, because people don't realize that adding a base href breaks internal links.

Re comment #15

Why would embed work any differently?

Re comment #16

> including SVG resources

OK, say I include a SVG resource, and it contains a same-document reference to a path definition. Result: The SVG resource is retrieved again, instead of using the included path definition. This bug should be fixed if you want to be flexible in ways SVG can be included.

> absolutely requires reparsing the document

... using the local source of the document! There's no need to resolve the URI and use the cache or something like that. Again: not fixing this bug would limit the use of embedded XBL bindings; you cannot use them when there is a different base URI.

And there have already been a few cases in practice where this bug (also in other rendering engines) made things complicated for me.

#1 HTML E-mail: I wanted to use the same XSLT template for online content and E-mail content (so it is easy to mail a page to someone). To make all the links work, I added a base href. But some pages also had an index of the page. Clicking on a link in the index opened the browser, instead of scrolling to the right location in the mail.

#2 Locally saved HTML. Same story actually.

To conclude: I can't think of any use case for same-document references being resolved using the base URI.

Hixie (not reading bugmail)

Comment 18

•

21 years ago

> Note that we are breaking compat with IE6 when the base URI is unclear, like
> in the 3rd attachment of this bug. I'm guessing we'd fix more pages than we'd
> break, because people don't realize that adding a base href breaks internal
> links.

That's another bug, already filed separately.

> Why would embed work any differently?

See the spec. It explicitly says so.

> OK, say I include a SVG resource, and it contains a same-document reference to
> a path definition. Result: The SVG resource is retrieved again, instead of
> using the included path definition.

Not if the base URI is the same as the current document (which it is unless the author changes it), since then the document is found to already be loaded and is therefore simply reused.

> #1 HTML E-mail: I wanted to use the same XSLT template for online content and
> E-mail content (so it is easy to mail a page to someone).

Good lord. XSLT in HTML e-mail. What a heinous idea. This use case on its own is the strongest argument so far for a WONTFIX.

> #2 Locally saved HTML. Same story actually.
>
> To conclude: I can't think of any use case for same-document references being
> resolved using the base URI.

I agree with that, the problem is that the way the spec says to do it is simply screwed up and wouldn't work in the real world. Handling the empty URI differently based on context is just asking for trouble. If you can find a better way to handle this, then I'm happy to listen, but the current proposal is IMHO unworkable on the long run.

Anne (:annevk)

Reporter

Comment 19

•

21 years ago

I believe Sjoerd talks about generating a HTML e-mail /with/ XSLT.

Sjoerd Visscher

Comment 20

•

21 years ago

(In reply to comment #18)
> That's another bug, already filed separately.

At least that one is fixed too if this is fixed.

> > Why would embed work any differently?
>
> See the spec. It explicitly says so.

Do you mean this part: "However, if the URI reference occurs in a context that is always intended to result in a new request, as in the case of HTML's FORM element, then an empty URI reference represents the base URI of the current document and should be replaced by that URI when transformed into a request." I don't think that applies to embedding. FORM isn't even a real exception: You either POST, or when you do a GET, the relative path automatically gets a query part, so it's not a same-document reference anymore.

> > OK, say I include a SVG resource, and it contains a same-document reference to
> > a path definition. Result: The SVG resource is retrieved again, instead of
> > using the included path definition.
>
> Not if the base URI is the same as the current document (which it is unless the
> author changes it), since then the document is found to already be loaded and is
> therefore simply reused.

When the base URI is the same as the current document there's not a bug in any of the cases mentioned here.

> > #1 HTML E-mail: I wanted to use the same XSLT template for online content and
> > E-mail content (so it is easy to mail a page to someone).
>
> Good lord. XSLT in HTML e-mail. What a heinous idea. This use case on its own is
> the strongest argument so far for a WONTFIX.

:) I meant server-side XSLT. This applies to any server-side method that tries to generate html mail with the same code and templates as the online pages.

> > #2 Locally saved HTML. Same story actually.
> >
> > To conclude: I can't think of any use case for same-document references being
> > resolved using the base URI.
>
> I agree with that, the problem is that the way the spec says to do it is simply
> screwed up and wouldn't work in the real world. Handling the empty URI
> differently based on context is just asking for trouble. If you can find a
> better way to handle this, then I'm happy to listen, but the current proposal is
> IMHO unworkable on the long run.

My suggestion is to not look at the context at all. The empty URI is always the current document, and a fragment-id-only URI is always a fragment of the current document. FORM may look special, but it isn't, as explained above.

Boris Zbarsky [:bzbarsky]

Comment 21

•

21 years ago

(In reply to comment #17)
> ... using the local source of the document!

There is no such beastie; there is only cache (and network if the data is no longer in cache).

Anne (:annevk)

Reporter

Comment 22

•

21 years ago

I accidentally found this example today: <http://www.stadtaus.com/docu/gallery_script/index_en.html> It shows exactly what people expect to happen. It is similar to attachment 169373, only this one is from the real world. (They use the BASE element only for referring to IMG elements, but it breaks the rest of the document, in contradiction with RFC 2396.)

Hixie (not reading bugmail)

Comment 23

•

21 years ago

And the fact that it doesn't work in IE either suggests to me that we shouldn't change our behaviour here.

Anne (:annevk)

Reporter

Comment 24

•

21 years ago

So there is finally a case where we can be more standards-compliant and support something better than IE does, and we do not, because we want to follow IE here?

Sjoerd Visscher

Comment 25

•

21 years ago

I have absolutely no problems with ignoring specs to save compatibility with IE. But in this case I don't see the point. We really are fixing more existing pages than we are breaking (...if any. Try to find a good use for the current behavior, I couldn't find it.) I couldn't care less if there was a work-around, but there isn't. If this stays broken, there's simply *no way* to combine a base href or xml:base and same-document references. In more detail: you could add the location of the current document to your same-document links, but the whole point of a base href and xml:base is exactly that you don't know the location of the current document.

What also isn't really mentioned yet is that the "fix" for bug 241981 can be reverted. Boris says: "If all of our behavior is correct, the only option I see is to back out support for content-location (perhaps leaving a comment explaining that no one should ever try implementing it because it is broken-by-design)". Well, not all of our behaviour is correct, and there's no need to tell the world to stop adoption of content-location headers. Boris also mentions there that "We've had similar issues with not being able to use fragment identifiers on pages where the document URI is something like wyciwyg, by the way...." So this really isn't just some freak side effect. It's about the essence of same-document references.

Hixie (not reading bugmail)

Comment 26

•

21 years ago

We're already better than IE here, as the testcases in comment 14 show. We _are_ compliant to a spec, as mentioned in comment 13. We're _already_ non-compliant with the new spec for other reasons, and that won't change, since that _does_ break sites. The new spec is extremely vague as to how it should be implemented, and there are still unanswered questions (see comment 21, comment 16, comment 15, comment 13). There are also other problems, like the effect it should have on the DOM (when you query a link for an absolute URI) and on window.location. Yes, it would be great if attachment 169375 worked. If someone can propose an actual way of doing it that doesn't make matters worse, then I'd be all for it. But we need an actual proposal on how to do it.

Sjoerd Visscher

Comment 27

•

21 years ago

Here's another example of this bug: Same-document references in Google Cache pages don't work, because Google adds a base href to the document. I agree that correctly solving this bug is complicated by details. Therefore I propose we fix this bug by resolving same-document references with the document uri. Any missing details should get their own bug. (If they haven't already been filed, like the one Ian mentioned in comment #18.) The wayback machine has the same problem, but oddly enough it works in Mozilla. (Not in IE.) That may be another URI-resolving bug caused by the odd format of the addresses, e.g. http://web.archive.org/web/20040202074232/www.w3.org/TR/REC-html40/intro/intro.html

Boris Zbarsky [:bzbarsky]

Comment 28

•

21 years ago

> Therefore I propose we fix this bug by resolving same-document references with
> the document uri.

This isn't really a feasible approach -- it would require playing major whack-a-mole, since the existing apis for resolving URI are frozen, so we'd have to hardcode this at every single place where we resolve a URI.

Sjoerd Visscher

Comment 29

•

21 years ago

Is the api frozen, or also its semantics? Because this wouldn't need an interface change, only a change in the semantics.

Boris Zbarsky [:bzbarsky]

Comment 30

•

21 years ago

Both are frozen. And yes, it would need an interface change, since resolving a string to a URI would require three items -- string, document URI, base URI. Unless every single caller ends up having to do a check for URIs starting with '#', or something equally silly.
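For illustration, a resolver that takes all three items might look like the sketch below. It is not the frozen Mozilla interface; the function name is hypothetical, the URLs are invented, and the check for a leading '#' is exactly the kind of special case described above, just moved into one place.

```javascript
// Sketch of a three-input resolver: reference string, document URI and
// base URI. Fragment-only and empty references use the document URI;
// everything else uses the base URI.
function resolveWithDocument(ref, documentURI, baseURI) {
  const fragmentOnlyOrEmpty = ref === "" || ref.startsWith("#");
  return new URL(ref, fragmentOnlyOrEmpty ? documentURI : baseURI).href;
}

const documentURI = "http://example.org/doc.html";
const baseURI = "http://cdn.example.org/assets/";
console.log(resolveWithDocument("#foo", documentURI, baseURI));      // http://example.org/doc.html#foo
console.log(resolveWithDocument("page.html", documentURI, baseURI)); // http://cdn.example.org/assets/page.html
```

A two-argument resolve(string, base) cannot express this, because the caller passes only one of the two URIs.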

Hixie (not reading bugmail)

Comment 31

•

21 years ago

If we do this at all, I would recommend doing one thing, and one thing only: When the user activates an <html:a href=""> link, if it starts with a "#" character, just go to that location without doing a reload or anything else. Don't fix it for XLinks, don't fix it for <link href="">, don't fix it for SVG, definitely don't fix it for url(), don't fix it for _anything_ except <html:a>.

Boris Zbarsky [:bzbarsky]

Comment 32

•

21 years ago

Which means that we'd inconsistently resolve URIs, effectively? That's not happening, sorry.

Hixie (not reading bugmail)

Comment 33

•

21 years ago

I don't see any other way to do it, so I guess this is WONTFIX. It does suck that we don't handle links well in document.written() pages, though. (Along with the other cases raised on this bug.)

Sjoerd Visscher

Comment 34

•

21 years ago

Technically what Ian proposes is not "inconsistently resolve URIs". It does not resolve URIs at all in a special case, i.e. bypassing the URI apis. What about extending the api? http://www.mozilla.org/projects/embedding/rev-interfaces.html#Extending

Sjoerd Visscher

Comment 35

•

21 years ago

How could we all have forgotten about rfc2396bis? <a href="http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#same-document">It defines same-document references completely differently</a>. The effect is the same, or even better as more references are considered to be same-document references. I think it also fits the current implementation much better. At least the URI api doesn't have to change, but I doubt it will make the implementation any easier. Bis says: "When a same-document reference is dereferenced for the purpose of a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference; therefore, a dereference should not result in a new retrieval action." I hope this makes sense and that this is unifiable with how same-document references are currently handled when there's no base href, as they don't do a new retrieval action either.

Sjoerd Visscher

Comment 36

•

21 years ago

I'm thinking the bug summary should be rephrased as something like: "The decision to navigate to a different *document* is now made by comparing the current document location with the link location. Instead the comparison should be made by comparing the current *base* location with the link location." It seems to be somewhat unintuitive, but I think this is how it should work.

Boris Zbarsky [:bzbarsky]

Comment 37

•

21 years ago

(In reply to comment #35)
What is a "retrieval action"? Does this mean that URIs are resolved differently depending on how you plan to use them? Sorry, but that's no good either.

(In reply to comment #36)
What is "current base location" in a document in which every single node has a different base URI?

Sjoerd Visscher

Comment 38

•

21 years ago

Good points. About "retrieval action". There's nothing you need to change here. Currently there are times, including when clicking a link, where the reference is compared to the document location and it is somehow decided to use the current document. The only change is that the reference should similarly be compared to the base location. About which base location to use: it's the base location of the node containing the same-document reference. (Easy answer, hard to implement probably.)

Boris Zbarsky [:bzbarsky]

Comment 39

•

21 years ago

So linking to an image is not a "retrieval action"? What about doing a form submission? How is <a href="#foo" target="bar"> to be handled?

> About which base location to use: it's the base location of the node
> containing the same-document reference.

This wouldn't be too bad to do, actually, but it would mean, for example, that one can't get the href of a link, set window.location to it, and have it do the same thing as clicking a link... That would break some existing content, for sure.

Sjoerd Visscher

Comment 40

•

21 years ago

Just ask yourself the question: what happens if there is no base location (or the base location is the same as the document location.) Same-document references in combination with: 1. a base href or 2. no document location is the main point of this bug. Changing anything else based on the new rfc doesn't seem like a good idea to me. Setting window.location should imho have the same semantics as just following a link. Currently when you set window.location to a URI that is the same as the document location (apart from the fragment identifier), only the fragment identifier is changed. The same should happen when you set window.location to a URI that is the same as the base location (again apart from the fragment identifier). A way to think about what should happen in each case is to remember that if you have a working document without a base href, you should be able to move it somewhere else, and add a base href equal to the original document location and still have everything working. (Take the Google cache as an example.) You should also be able to include it into another document, and have everything working if you set xml:base on the included part. (Taking this to an extreme would mean that in script window.location should actually contain the base location of the script element.)

Sjoerd Visscher

Comment 41

•

21 years ago

*** Bug 146441 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Comment 42

•

21 years ago

> Same-document references in combination with: 1. a base href or 2. no document
> location is the main point of this bug.

The whole point is that these concepts are not particularly well-defined... Is the example in comment 39 a same-document reference if "bar" is the frame the anchor is in, for example? In brief, it sounds like we're trying to have different behavior for identical markup based on ambiguous criteria. This is rather hard to implement in any sort of logical setup, for obvious reasons. Breaking window.location based on the base href is simply not an option. Neither is treating two different URIs (document base URI and document URI) identically when they are loaded an option. I'm still tempted to mark this wontfix, due to a lack of clear understanding by anyone, as far as I can see, of what the behavior should be.

Boris Zbarsky [:bzbarsky]

Comment 43

•

21 years ago

Oh, and also of note, see bug 146441 comment 6 and bug 146441 comment 8. Note that those are both retrieval actions, and are implemented in the same code in Mozilla. I see no reason for the arbitrary distinction the RFC makes between those two cases, past the fact that they realized that breaking URI resolution like they did would break real-world pages that used forms.

Sjoerd Visscher

Comment 44

•

21 years ago

Attached file Link and From with empty URI — Details

Guess what: a href="" does follow the base href, but form action="" doesn't. Apparently the code is not that equal.

Sjoerd Visscher

Comment 45

•

21 years ago

> How is <a href="#foo" target="bar"> to be handled?

First resolve the href like it is done now (using the base URI of the link node). Compare it with the base URI of the link node. If they, aside from the fragment identifier, match then this is a same-document reference. When clicked it should set the location of the target window to the document location of the current window, with the fragment identifier of the link. This works if the target window happens to be the current window.

You are right that window.location shouldn't change. But that doesn't break any existing pages. The only effect would be that pages currently broken because of this bug will still be broken when this bug is fixed if they use script to set window.location. (Instead, <a href="http://annevankesteren.nl/archives/2005/01/fragment-identifiers#comment-2974">people now have to write script</a> to just get links working correctly.)

Boris Zbarsky [:bzbarsky]

Comment 46

•

21 years ago

> Guess what: a href="" does follow the base href, but form action="" doesn't.

That's a quirk thing for HTML documents only. See comment in the code at http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLFormElement.cpp#1327 and code following. Note that this does NOT apply to href="#whatever". I'd rather not scatter this sort of crap all over the rest of our code, yes. ;)

> First resolve the href like it is done now (using the base URI of the link
> node). Compare it with the base URI of the link node. If they, aside from the
> fragment identifier, match then this is a same-document reference. When
> clicked it should set the location of the target window to the document
> location of the current window, with the fragment identifier of the link.

That's not how this situation behaves in practice; such code is often used to scroll one frame from another one (generally a table of contents). Breaking this is not really acceptable -- a lot of websites depend on it.

Please do try to propose solutions that are not self-contradictory and don't break existing content, ok?

Sjoerd Visscher

Comment 47

•

21 years ago

> Please do try to propose solutions that are not self-contradictory
> and don't break existing content, ok?

Did you mean <a href="#foo" target="bar"> together with a base href?

Boris Zbarsky [:bzbarsky]

Comment 48

•

21 years ago

I mean that whatever solution you propose needs to handle all the possible cases, including base hrefs, targeted links, links where just an anchor name is listed, links with relative URIs, links with absolute URIs, form submission, etc, in a reasonably consistent way. The solution should also not break significant amounts of existing content.

Sjoerd Visscher

Comment 49

•

21 years ago

If <a href="#foo" target="bar"> together with a base href is abused for TOCs, then it's obvious you can't fix this bug in quirks mode, only in standards compliant mode.

Boris Zbarsky [:bzbarsky]

Comment 50

•

21 years ago

It's used without a base href. Who said anything about a base href? And it's commonly used in standards-mode pages, since all browsers implement it interoperably. So what's the benefit of implementing a complicated, self-contradictory spec that is incompatible with pretty much all the existing content that would be affected by it? Keep in mind -- just because someone called something a spec doesn't mean it needs to be implemented...

Anne (:annevk)

Reporter

Comment 51

•

21 years ago

The document just had an update: <http://www.ietf.org/rfc/rfc3986.txt>.

Whiteboard: DUPEME

Boris Zbarsky [:bzbarsky]

Comment 52

•

21 years ago

So... the text quoted in comment 0 is gone. The "abnormal resolution" examples in RFC 3986 clearly indicate that '#foo' is resolved relative to the base URI. The long discussion of fragment identifiers makes it quite clear that they are resolved...

Sjoerd Visscher

Comment 53

•

21 years ago

RFC 3986 and RFC 2396bis are the same. The text from comment 0 was already gone in 2396bis. Let me quote a bit from the list of changes between 3986 and 2396:

o The determination of whether a URI reference is a same-document reference has been decoupled from the URI parser, simplifying the URI processing interface within applications in a way consistent with the internal architecture of deployed URI processing implementations. The determination is now based on comparison to the base URI after transforming a reference to absolute form, rather than on the format of the reference itself. This change may result in more references being considered "same-document" under this specification than there would be under the rules given in RFC 2396, especially when normalization is used to reduce aliases. However, it does not change the status of existing same-document references.

That last sentence is important!

I think we need some pseudo code to make things precise. (I used ECMAScript syntax and the names from nsIURI and nsIURL)

We have:
- relURI, the (possibly relative) URI of the link
- baseURI, an absolute URI, the base URI of the part of the document that contains the link
- docURI, an absolute URI, the URI of the current document
- resultURI, an absolute URI, that is used for the final retrieval

    /**
     * When a URI reference refers to a URI that is, aside from its fragment
     * component (if any), identical to the base URI (Section 5.1), that
     * reference is called a "same-document" reference.
     */
    absURI = baseURI.resolve(relURI);
    absURINoFragment = absURI.clone();
    absURINoFragment.ref = "";
    baseURINoFragment = baseURI.clone();
    baseURINoFragment.ref = "";
    if (absURINoFragment.equals(baseURINoFragment)) {
      /**
       * This is a same-document reference!
       * When a same-document reference is dereferenced for a retrieval
       * action, the target of that reference is defined to be within the same
       * entity (representation, document, or message) as the reference;
       */
      resultURI = docURI.resolve(relURI);
    } else {
      resultURI = absURI;
    }

That's it. Now the last part from RFC 3986: "therefore, a dereference should not result in a new retrieval action." is the most confusing. But I believe that code is already in place; same-document references work fine when there is no base href.

This code should run at the moment of dereference (i.e. when the link is clicked etc.). But e.g. when the href property of a link is read in Javascript, the relURI can simply be resolved using the baseURI, like it is now. (This also means that this code must also run on values that are assigned to window.location, and no, that doesn't break existing code.)

(Note: this may not be completely compatible with comments I made before. Please use this pseudo code as a basis for further discussion.)
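A runnable approximation of the pseudo code above, using the standard URL class instead of nsIURI, could look like this. It is a sketch, not Mozilla code: the function names are hypothetical and the URLs in the usage lines are invented. One deliberate difference is stated up front: where the pseudo code calls docURI.resolve(relURI), this version copies the fragment of the resolved reference onto docURI, so that a reference such as "content.xhtml#s" (same-document by the RFC 3986 test, but not fragment-only) also stays in the current document.

```javascript
// Return the URI string without its fragment component.
function stripFragment(uri) {
  const copy = new URL(uri);
  copy.hash = "";
  return copy.href;
}

// relURI: the (possibly relative) reference of the link.
// baseURI: absolute base URI of the node containing the link.
// docURI: absolute URI of the document containing the link.
function dereferenceTarget(relURI, baseURI, docURI) {
  const absURI = new URL(relURI, baseURI);
  if (stripFragment(absURI.href) === stripFragment(baseURI)) {
    // Same-document reference (RFC 3986 section 4.4): stay in the
    // current document and only change the fragment.
    const result = new URL(docURI);
    result.hash = absURI.hash;
    return result.href;
  }
  // Anything else is an ordinary reference resolved against the base.
  return absURI.href;
}

const baseURI = "http://example.org/content.xhtml";
const docURI = "http://example.org/wrapper.xhtml";
console.log(dereferenceTarget("#section1", baseURI, docURI));    // http://example.org/wrapper.xhtml#section1
console.log(dereferenceTarget("other.html#x", baseURI, docURI)); // http://example.org/other.html#x
```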

Summary: same-document references should not be resolved at all → same-document references should work according to RFC 3986

Boris Zbarsky [:bzbarsky]

Comment 54

•

21 years ago

> However, it does not change the status of existing same-document references.
I don't see anything in RFC 3986 defining said same-document references, though. Could you please point to that part of the RFC?

> - docURI, an absolute URI, the URI of the current document
Which document is the "current document"?

> resultURI = docURI.resolve(relURI);
Where is this code being run?

> (This also means that this code must also run on values that are assigned to
> window.location
It's not clear to me how that works, since at least one of the 4 objects your code depends on is missing in that case.

Sjoerd Visscher

Comment 55

•

21 years ago

(In reply to comment #54)
> I don't see anything in RFC 3986 defining said same-document references,
> though. Could you please point to that part of the RFC?
Section 4.4. http://asg.web.cmu.edu/rfc/rfc3986.html#sec-4.4

> Which document is the "current document"?
The document the link is in.

> > resultURI = docURI.resolve(relURI);
>
> Where is this code being run?
"Where" is up to the person fixing this bug. "When" is at the time of dereferencing.

> > (This also means that this code must also run on values that are assigned to
> > window.location
>
> It's not clear to me how that works, since at least one of the 4 objects your
> code depends on is missing in that case.
There are 3 objects, the 4th is the result. But you are right, the baseURI is missing. But - thinking about it a little more - that doesn't really matter. It means that nothing at all changes as far as Javascript is concerned. So the worst that could happen is the rare case that pages that are currently broken and use Javascript in a weird way will remain broken.

Boris Zbarsky [:bzbarsky]

Comment 56

•

21 years ago

> > Which document is the "current document"?
> The document the link is in.
That breaks a use case that is commonly used, as I pointed out in comment 46.

> "Where" is up to the person fixing this bug. "When" is at the time of
> dereferencing.
There is no "time of dereferencing". There is a "time when the link URI is resolved to an absolute URI" and a "time when the absolute URI is loaded". The two happen at different times; at the latter time the base URI of the link may be different from what it was at the former time.

> It means that nothing at all changes as far as Javascript is concerned.
Sure something changes. The basic invariant that clicking a link with no "target" attribute set and setting window.location.href to the .href of the link have the same result is violated.
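The invariant mentioned here can be illustrated with a small model. This is only a sketch under stated assumptions: clickTarget models a click under the proposed rule, linkHref models what script reads from the link's .href (resolved against the base URI, as it is today), and assignTarget models an assignment of an absolute value to window.location.href. All three function names and the example.org URLs are hypothetical.

```javascript
// Hypothetical document whose <base href> differs from its own URI.
const docURI = "http://example.org/docs/page.html";
const baseURI = "http://example.org/base/";

// Return the URI as a string with its fragment removed.
function stripFragment(uri) {
  const copy = new URL(uri);
  copy.hash = "";
  return copy.href;
}

// Model of a link click under the proposal: same-document references are
// resolved against the document URI, everything else against the base URI.
function clickTarget(rel) {
  const abs = new URL(rel, baseURI);
  const sameDocument = stripFragment(abs.href) === stripFragment(baseURI);
  return sameDocument ? new URL(rel, docURI).href : abs.href;
}

// Model of reading link.href from script: always resolved against the base.
function linkHref(rel) {
  return new URL(rel, baseURI).href;
}

// Model of window.location.href = value for an already absolute value.
function assignTarget(value) {
  return new URL(value).href;
}

console.log(clickTarget("#foo"));            // "http://example.org/docs/page.html#foo"
console.log(assignTarget(linkHref("#foo"))); // "http://example.org/base/#foo"
```

In this model the two paths agree for ordinary links such as "other.html" and differ only for same-document references in a document with a different base URI, which is the case under dispute.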

Sjoerd Visscher

Comment 57

•

21 years ago

(In reply to comment #56)
> > > Which document is the "current document"?
> > The document the link is in.
>
> That breaks a use case that is commonly used, as I pointed out in comment 46.
Then I don't understand your use case. Could you elaborate?

> > "Where" is up to the person fixing this bug. "When" is at the time of
> > dereferencing.
>
> There is no "time of dereferencing". There is a "time when the link URI is
> resolved to an absolute URI" and a "time when the absolute URI is loaded". The
> two happen at different times; at the latter time the base URI of the link may
> be different from what it was at the former time.
When a base URI changes, all relative links should be re-resolved. If that doesn't happen, those links are broken already, and that doesn't change whether this bug is fixed or not.

> > It means that nothing at all changes as far as Javascript is concerned.
>
> Sure something changes. The basic invariant that clicking a link with no
> "target" attribute set and setting window.location.href to the .href of the link
> have the same result is violated.
The invariant is only violated in pages which are currently broken already because of this bug. What's the problem?

Sjoerd Visscher

Comment 58

•

21 years ago

I reread your use case and I think you mean that there is a link <a href="#foo" target="bar"> in, say, a.html, and there's a document b.html in the bar frame. Clicking that link should then change the URL of the bar frame to b.html#foo. But I created a test page and that doesn't happen. The URL becomes a.html#foo.

Boris Zbarsky [:bzbarsky]

Comment 59

•

21 years ago

> Then I don't understand your use case. Could you elaborate?
Hmm... I had a use case this broke; I'll try to recreate it.

> When a base URI changes, all relative links should be reresolved.
They are. But I don't think you understood. When a link is clicked, we resolve the URI to load, then post an event to load it. When the event fires, we load it. Your system assumes that the "resolve" and "load" stages happen at the same time, but they do not.

> The invariant is only violated in pages which are currently broken already
> because of this bug.
The point is, the invariant is violated. Any time that happens, pages that depend on the invariant will break. Maybe that's acceptable in this case because there are very few such pages; I'd have to see numbers to make that call.
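The two-stage behaviour described here can be modelled in a few lines. This is a simplified sketch, not the actual Mozilla code path: the page object, the pendingLoads queue and the function names onLinkClick and runPendingLoads are all hypothetical.

```javascript
// Hypothetical model of "resolve now, load later".
const page = { baseURI: "http://example.org/a/" };
const pendingLoads = []; // stands in for the posted load events
const loaded = [];       // records what was eventually loaded

// Stage 1: on click, resolve against the base URI as it is right now,
// then post an event. Only the absolute URI travels with the event.
function onLinkClick(href) {
  const absolute = new URL(href, page.baseURI).href;
  pendingLoads.push(() => loaded.push(absolute));
}

// Stage 2: when the events fire, load whatever was resolved earlier.
function runPendingLoads() {
  while (pendingLoads.length > 0) {
    pendingLoads.shift()();
  }
}

onLinkClick("#foo");
page.baseURI = "http://example.org/b/"; // base URI changes before the event fires
runPendingLoads();

console.log(loaded[0]);                          // "http://example.org/a/#foo"
console.log(new URL("#foo", page.baseURI).href); // "http://example.org/b/#foo"
```

By the time the load runs, the relative reference and the base URI it was resolved against are no longer available, so a same-document check would have to happen in the first stage or carry extra state into the second.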

Sjoerd Visscher

Comment 60

•

21 years ago

> > When a base URI changes, all relative links should be reresolved.
>
> They are. But I don't think you understood. When a link is clicked, we resolve
> the URI to load, then post an event to load it. When the event fires, we load
> it. Your system assumes that the "resolve" and "load" stage happen at the same
> time, but they do not.
Maybe you can point to the implementations of these stages (for hyperlinks as an example)? (I can't write C, but the Mozilla code is usually very readable.)

> > The invariant is only violated in pages which are currently broken already
> > because of this bug.
>
> The point is, the invariant is violated. Any time that happens, pages that
> depend on the invariant will break. Maybe that's acceptable in this case
> because there are very few such pages; I'd have to see numbers to make that call.
It is acceptable because, of the pages that depend on the invariant, only the ones that *are already broken* will break.

Boris Zbarsky [:bzbarsky]

Comment 61

•

21 years ago

> Maybe you can point to the implementations of these stages
Start at nsGenericHTMLElement::HandleDOMEventForAnchors at:
http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsGenericHTMLElement.cpp#1437

Go through the NS_UI_ACTIVATE case to nsGenericElement::TriggerLink at
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsGenericElement.cpp#3151

This posts an event in nsWebShell::OnLinkClick. When the event fires, we end up in nsWebShell::OnLinkClickSync, which proceeds to nsDocShell::InternalLoad.

Sjoerd Visscher

Comment 62

•

21 years ago

(In reply to comment #61)
> Go through the NS_UI_ACTIVATE case to nsGenericElement::TriggerLink at
> http://lxr.mozilla.org/seamonkey/source/content/base/src/nsGenericElement.cpp#3151
Thanks, this helps a lot. The NS_UI_ACTIVATE case seems to be the right spot. All the needed data is available. TriggerLink does some security stuff, so there the URI needs to be the real one. And most of the code before it is DOM related, so there the URI still needs to be the one resolved to the base href.

The first line of the pseudo code can be skipped, because hrefURI already is resolved to the baseURI. (Although it doesn't have to be skipped if that might be convenient.) It probably needs to be a utility function, as it is going to be called from several different places, perhaps something like:

  nsCOMPtr<nsIURI> retrievalURI =
    nsContentUtils::convertURIIfSameDocumentRef(hrefURI, GetOwnerDoc(), baseURI);

(OT: I see that the baseURI is passed as aOriginURI to securityManager->CheckLoadURI. This sounds like it would be possible to circumvent certain security restrictions by simply setting the baseURI to a more trusted domain.)

Boris Zbarsky [:bzbarsky]

Comment 63

•

21 years ago

Note that not all nsIURIs support fragments.... This is an ongoing issue with Mozilla code, unfortunately. Again, this is just the code for clicked-on HTML links. This doesn't handle form submissions (how should those behave?), XLinks, etc. Those are elsewhere in the code, as you probably noticed. So if we decide we want to do this, we'd need an exhaustive list of places where URIs should be treated in this weird way. As for baseURI, that URI is already security-checked. See nsDocument::SetBaseURI and nsGenericElement::GetBaseURI.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 64

•

21 years ago

*** Bug 281463 has been marked as a duplicate of this bug. ***

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 65

•

21 years ago

*** Bug 270606 has been marked as a duplicate of this bug. ***

Arthur

Comment 66

•

21 years ago

I've just come across a real-world example where this is a problem. The page (http://m-portal.migros.ch/de/desktopdefault.aspx/tabid-138/) has a base URI different from the URL it was loaded from. At the end of the document it says

  <script language="javascript" type="text/javascript">
  location.href="#p";
  </script>

with the intention of jumping to #p in the already loaded document once loading has finished. Konqueror and Mozilla go to BASE#p; IE 6 just jumps to #p in the already loaded document. From the RFCs and the discussions here I'm not sure what the "de jure" behaviour would be. Is <script .../> similar to "action" or is this an internal link which shouldn't result in a refetch?
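The two behaviours reported above can be compared with a short sketch. The page's actual base URI is not quoted in this bug, so both URLs below are hypothetical stand-ins, and needsFetch is an illustrative helper that treats a navigation as a refetch whenever the target differs from the current document in anything other than the fragment.

```javascript
// Hypothetical stand-ins for the URL the page was loaded from and its base URI.
const loadedFrom = "http://example.org/de/desktopdefault.aspx/tabid-138/";
const baseHref = "http://example.org/de/";

// Return the URI as a string with its fragment removed.
function withoutFragment(uri) {
  const copy = new URL(uri);
  copy.hash = "";
  return copy.href;
}

// Illustrative rule: a new retrieval is needed only if the target differs
// from the current document in something other than the fragment.
function needsFetch(target, current) {
  return withoutFragment(target) !== withoutFragment(current);
}

// location.href = "#p" resolved against the base URI
// (the behaviour reported for Konqueror and Mozilla).
const viaBase = new URL("#p", baseHref).href;

// location.href = "#p" resolved against the loaded document
// (the behaviour reported for IE 6).
const viaDoc = new URL("#p", loadedFrom).href;

console.log(viaBase, needsFetch(viaBase, loadedFrom));
// "http://example.org/de/#p" true
console.log(viaDoc, needsFetch(viaDoc, loadedFrom));
// "http://example.org/de/desktopdefault.aspx/tabid-138/#p" false
```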

Arthur

Comment 67

•

21 years ago

The URL given in the last comment now works as intended by the designers. They have added <script type="text/javascript">var baseurl='http://m-portal.migros.ch/de/';tabid=138;</script>

Darin Fisher

Comment 68

•

19 years ago

-> default owner

Assignee: darin → nobody

Component: Networking: HTTP → Networking

QA Contact: networking.http → networking

Hixie (not reading bugmail)

Comment 69

•

19 years ago

It's not clear to me what the specs say, what browsers do, or what authors need. It's also not clear to me what the specs should say, browsers should do, and authors should write. If anyone wants to go ahead and fix this mess -- either in specs or by fixing browsers -- I encourage them to do so.

Hixie (not reading bugmail)

Comment 70

•

17 years ago

HTML5 handles the action="" case and navigation to frag IDs specially now. The rest is expected to work per RFC3986. Please let me know if this causes new problems that I can help fix.

Boris Zbarsky [:bzbarsky]

Updated

•

17 years ago

Blocks: 359190

Patrick McManus [:mcmanus]

Comment 72

•

10 years ago

anne, I'm going to wontfix this based on webcompat.. if there should be a concrete alternative action plan please reopen and ni valentin.

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(annevk)

Resolution: --- → WONTFIX

Anne (:annevk)

Reporter

Comment 73

•

10 years ago

I agree that we should not do this. Thanks for closing.

Flags: needinfo?(annevk)

fail 21 years ago Anne (:annevk) 4 bytes, text/plain		Details
testcase 21 years ago Anne (:annevk) 185 bytes, application/xml		Details
testcase for page without document URI 21 years ago Sjoerd Visscher 223 bytes, text/html		Details
Link and From with empty URI 21 years ago Sjoerd Visscher 115 bytes, text/html		Details