Closed Bug 78206 Opened 23 years ago Closed 23 years ago

a href="?xyz" does wrong thing

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: sballard, Assigned: sballard)

Details

Attachments

(3 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.17 i686; en-US; 0.8.1) Gecko/20010326
BuildID:    2001032614

If a page at "http://www.foo.com/foo/bar.html" contains a link such as
<a href="">, Mozilla (and all other browsers) correctly interpret this as a link
to "http://www.foo.com/foo/". However, if the page contains a link such as
<a href="?foo=bar">, Mozilla interprets this as
"http://www.foo.com/foo/bar.html?foo=bar". Other browsers (including 4.x and IE)
correctly interpret it as "http://www.foo.com/foo/?foo=bar". Mozilla's behavior
is not just inconsistent with other browsers, it is also inconsistent with its
own behavior for href="".

Reproducible: Always
Steps to Reproduce:
1. Visit a page whose URL does not end in /, containing <a href="?foo=bar">
2. Hover over the link and view statusbar.
3. Click the link and observe resulting URL.

Actual Results:  Both the statusbar while hovering over the link and the result
from clicking it are incorrect: the ?foo=bar is appended to the current URL.

Expected Results:  The URL should be truncated at the last / before appending
the ?foo=bar.

I've only tested this on Linux x86, 0.8.1 but I'm setting All/All because it
seems unlikely to be a platform-specific issue. If it doesn't happen on Windows
or Mac, please feel free to change it back to PC/Linux. I couldn't find any bugs
indicating that this has been fixed since 0.8.1, but when 0.9 comes out I'll
post back to this bug to confirm that it is still valid.
4.x gets this right ==> 4xp
nominating for mozilla1.0 since it's a definite bug and the fix is almost
certainly easy.

This breaks a site I am developing. Sorry, it's password-protected so I can't
give the URL, but I use the query string for session tracking and I frequently
provide a "back" link that is <a href=""> plus an appropriate querystring for
the session tracking.
Keywords: 4xp, mozilla1.0
Target Milestone: --- → mozilla1.0
Just tested on 0.9: This bug is still present (just following up on my promise
to confirm this).
The following patch fixes this, and looks like the Right Thing To Do to me. A
URL beginning with a '?' is just an empty relative path with a query string, so
it doesn't need any special processing. Can I have r= and suggestions on who I
should ask for sr= from? (This is my first ever patch, so I only know the theory
of the review process, not the practice).

Adding patch keyword. Should I also assign this bug to me and set its target
milestone?

Anyway, here's the patch. (Gotta love when you can fix bugs by *removing* code...)

Index: netwerk/base/src/nsStdURL.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/base/src/nsStdURL.cpp,v
retrieving revision 1.75
diff -u -r1.75 nsStdURL.cpp
--- netwerk/base/src/nsStdURL.cpp	2001/05/17 06:48:20	1.75
+++ netwerk/base/src/nsStdURL.cpp	2001/05/17 14:34:36
@@ -750,19 +750,6 @@
                                 ESCAPED);
             finalSpec += (char*)start;
             break;
-        case '?': 
-            rv = AppendString(finalSpec,mDirectory,ESCAPED,
-                              nsIIOService::url_Directory);
-            rv = AppendFileName(finalSpec,mFileBaseName,mFileExtension,
-                                ESCAPED);
-            if (mParam)
-            {
-                finalSpec += ';';
-                rv = AppendString(finalSpec,mParam,ESCAPED,
-                                  nsIIOService::url_Param);
-            }
-            finalSpec += (char*)start;
-            break;
         case '#':
             rv = AppendString(finalSpec,mDirectory,ESCAPED,
                               nsIIOService::url_Directory);
Keywords: patch
Keywords: review
So, the html specs reference rfc1808, which is updated by rfc2396.

The <a href="?x"> example is different in both - mozilla appears to follow
rfc1808, while ns4/ie follow rfc2396. The relevent examples section in both rfcs
give different results. Sigh.

I suppose, for both compatability and correctness issues, this patch should be
accepted. dougt?

as for <a href="">, rfc2396 changes rfc1808 (we follow rfc1808):

"G.4. Modifications from RFC 1808

   RFC 1808 (Section 4) defined an empty URL reference (a reference
   containing nothing aside from the fragment identifier) as being a
   reference to the base URL.  Unfortunately, that definition could be
   interpreted, upon selection of such a reference, as a new retrieval
   action on that resource.  Since the normal intent of such references
   is for the user agent to change its view of the current document to
   the beginning of the specified fragment within that document, not to
   make an additional request of the resource, a description of how to
   correctly interpret an empty reference has been added in Section 4.
"

so our behaviour is incorrect.

dougt - are you the standards person for this sort of issue?
And having now reread section 4, the behaviour depends on whether or not "the
URI reference occurs in a context that is always intended to result in a new
request".

_EXPLICITLY_, (as in, the spec says we must), we are required to handle this
example differently for HTML forms than we are for hrefs (the "new view" stuff
in section 4 has to be read in conjunction with section G4, I suppose). Hmm...
Well, I'm glad the spec agrees with me for one case, anyway :)

FWIW, here are my arguments for keeping <a href=""> as it is:

1) I have pages that used this feature that were last modified in Oct 1997.
This significantly predates RFC2396, and at the time the browsers I developed
these pages on were IE3 and NS3. These pages have continued to work in every
browser I've tried since, up to and including Mozilla. RFC2396 preserves the
semantics from RFC1808 on the behavior of href="", without any comment about
incorrect implementations in then-current browsers. This strongly suggests, to
me, that nobody at the IETF cared enough about the issue of how href="" should
behave to have actually tested it on any browser. Section G.4, as cited by
bbaetz, claims to refer to href="" but its actual contents are entirely
concerned with href="#foo". This seems to be the limit of the thought they gave
to this issue. Admittedly, I'm arguing for backwards compatibility over strict
standards compliance, but only in an obscure feature that is not only never
used (because nobody supports it) but that is actually less useful than the
current behavior (see 3 below).

2) By a suitably tortuous reading of the spec, I can twist the meaning to cover
our current behavior *anyway*. (I'll admit that this is seriously stretching,
ifnot breaking, the language of the spec, but still...)
Our current behavior is supposedly legal if "the URI reference occurs in a 
context that is always intended to result in a new request". First I have to
stretch "request" to include possibly re-fetching from cache, but if I do this,
*almost* all <a href>s *are* supposed to result in a new request. The only time
they shouldn't result in a new request is if the URI consists *only* of a
fragment identifier (that is, even the URI "currentpage.html#foo" should still
result in a fresh request for currentpage.html, just as
<a href="currentpage.html"> should. It's only <a href="#foo"> that should
scroll the current document without re-fetching.

3) The current behavior is useful: I can get to http://foo/bar/ from
http://foo/bar/baz.html without having to know what directory I'm in (resulting
in code that can be moved from one location to another without breaking). The
alternatives are -
- link to index.html: this breaks the visited-ness of the link because the
  browser cannot tell that /index.html is the same as /
- link to "../bar/" or to "/" if in the root - the code is now only portable if
  the parent directory keeps the same name.
- link to "./" - this is rfc-compliant but I don't think it's historically
  supported. Users of noncompliant browsers will get chains of "./" appended to
  their URLs as they traverse lots of these links, and will ALSO suffer the
  visited-ness breakage as a result of not knowing that the URLs are the same.
None of these alternatives are satisfactory. Further, the proposed behavior is
not very useful:
- You almost always know the page you are currently on. You could make
  rename-proof pages with the proposed behavior, but you still have to change
  any links to the page from anywhere else, so why not change the ones in the
  page too?
- You can't *actually* use this behavior anyway, because no other browser
  supports it.
Removing a useful feature for an unusable and un-useful feature doesn't seem
worthwhile.

4) To my knowledge, nobody has EVER reported this behavior as a bug in any
browser. The only reason this issue even came up is due to careful scrutinizing
of the RFCs as a result of my submission of this bug. So it doesn't seem that
anybody wants this, except for a standards committee that doesn't particularly
care about this feature anyway (see 1).

5) It's completely illogical for <a href=""> and <a href="?foo"> to differ by
anything other than the query string. Now, you can argue on the other hand that
it's illogical for <a href=""> to differ from <a href="#foo">, but *one* of
themis going to have to differ if we are to meet the standards for "#foo" and
"?foo". "#foo" is already significantly different from any other URL, as
mentioned in #2 above, because it doesn't cause a new request. I believe you
canget the "desired" behavior of "" anyway by using <a href="#">.

Now, clearly all of these are value judgements, and they're all going up
againstthe clear wording of the spec. So it's not a clear-cut decision even with
all
of these arguments in favor of the status quo. But I hope these arguments at
at least suggest that it's not a clear-cut decision the other way either.
Just thought of this alternative interpretation:

From rfc2396, G.4: "an empty URL reference (a reference containing nothing aside
from the fragment identifier)"

This "nothing aside from a fragment identifier" actually strongly implies that a
fragment identifier (#foo) MUST be present for this section to apply. You can
therefore plausibly disregard at least this section entirely when the URL is
*completely* empty. I'm not sure if any other parts of rfc2396 give explicit
instructions for the null URL, but if not, we may be off the hook.
You're nitpicking :) It may be possible to argue that, but since one of the
examples is the empty string, thats the one that the rfc specifies. However,
there is the backwards compatibility issue.

I really don't know what the correct behaviour mozilla should follow is. I'd opt
for 4.x/ie compatibility - ie leave the "" behaviour as is, and change the ?
behaviour, taking your patch as is.

What version of ie did you test? (I haven't tested either ie or ns4)

I'll let one of the networking people decide on the correct behaviour. dougt?
Well, section 5.2, step 2 rules out the possibility that there's a loophole. The
standards clearly say that <a href=""> is a reference to the current document
without a refresh. I still maintain, though, that the standard is useless and
wrong (except when a fragment identifier is given, which was clearly the intent
in the first place).

Cc'ing gagan who was mentioned on irc as another person with a possible interest
or insight into this bug, and valeski who shows up on cvsblame for all the lines
affected by this bug. If any of you guys feel like giving my patch r= or sr=,
I'd much appreciate it.

Should I assign this bug to myself while waiting for r/sr?
->dougt
Assignee: neeti → dougt
Yeah, I'm nitpicking - I just hate arguing *against* standards compliance, even
when the standard in question is screwy :)

How about this: split off the <a href=""> into a separate bug which we can
WONTFIX if the consensus eventually agrees with me. Then we can check this patch
in and leave the other bug for the flamewars, if there are any.

For href="?foo" I've tested NS4.7 and IE5.5. For href="" I've tested at least:
NS3.0x, NS4.0x, NS4.5-7, NS6.0x, Mozilla (up to now), IE3.x, IE4.x, IE5.0,
IE5.5, Konqueror and lynx. I've been using href="" in my pages since 1997 with
no problems. I'd say that qualifies as a de-facto standard :)

Exactly. Leave the bug assigned to dougt, since hes going to end up looking at
it, probably. I just mentioned that "" in passing, really, since I wanted to
check what the spec was (when you asked on IRC).

Anyway, if dougt agrees that this is the way to go, r=bbaetz for the patch - its
doing what its intended to do.
Changing my milestone nomination for this bug to 0.9.2 for the following
reasons:
1) It's not a regression or a serious bug, and although I can't imagine it
causing any regressions, it has only had a little testing ==> Not 0.9.1
2) The patch is here and works, and only needs checking in, it's a definite
standards-compliance bug, and the fix seems very safe and unlikely to
destabilize anything. So there's no reason to put it off any further.
Keywords: mozilla1.0mozilla0.9.2
mass move, v2.
qa to me.
QA Contact: tever → benc
It sounds like the right right thing to do, but the patch doesn't looks right.  
Shouldn't we just remove the AppendFileName()?  Doing this results in the same 
behavior as 4x.  Maybe I am missing something.
Whiteboard: reviewing submitted patch.
Target Milestone: mozilla1.0 → mozilla0.9.2
Dougt, sure, do that - then compare the resulting code with what happens in the
default case of the switch...

However, your comment did make me realize that I'm not sure exactly what should
happen to mParam - I haven't really grokked the behavior of a ; in a URL. I also
haven't time right now to pore over RF2396 again :) So you could be right that
it needs to be preserved.

The motivation behind my patch was that href="" takes the default case of the
switch, and href="?foo" is just href="" plus a querystring. So my gut instinct
was that the behavior should be the same.
Yeah, but the default case is not appending the mParam.

andreas, any thoughts on this bug.
What do you make of my reasoning above: that href="?foo" is just the same as
href="" but with a querystring? By that logic, ?foo should take the default
because href="" does so. When I started using href="?foo" in my sites, that was
the thought that led me to try it.
The ; bit is basically deprecated in the updated RFC, IIRC.
Shame on me, although Judson checked it in, this is my code. I have to admit, I
have not read rfc2396 carefully enough, at least when it comes to relative urls. 

mozillas handling of relative urls is (mostly) based on the examples given in
rfc1808 (see urltest.cpp).

After rereading 2396 and 1808 I think Stuart Ballards patch is correct,
furthermore also the ; can be removed. 

As for the href = "" debate, the behaviour should stay as it is I think,
although technically it is not conforming to the written standard.
urltest.cpp should also be changed to fit rfc2396. I have done that in my tree,
but my version of urltest differs in many other ways, some rewriting done with
tever that never made it into the tree. 
lxr shows two urltest.cpp files - one in netwerk/test and one in
netwerk/base/tests. Looking at the content, I'm guessing you mean the former?

If so, I can supply the (obvious) patch that would just change the array of test
URLs and the comment that goes with it, but if there's anything else to change,
I don't know what it is (I'm not at all familiar with this code).
The one in netwerk/base/tests is really dead meat. Should be removed. I will
attach my patch to urltest, but it does much more than just move from rfc 1808
to 2396, for example change the file format of the urlparse.dat file and supply
a specific version for Windows / OS/2. This was done for special tests with file
urls which differ based on the operating system.
These urlparse.dat files also assume that the patch for bug 61269 is applied.
re: urltest

I'm in the process of cleaning up the test directory (and writing automated
regression tests) So don't worry about all the out of date tests. Most of them
don't work, unfortunately.
So in that case we're good to go with this patch, right? bbaetz, feel like
updating your r= to cover andreas' latest patch (id 37831)?

Do we also need r= from gagan as module owner?

(sorry, I'm learning mozilla rules as I go along here...)
Just found out on IRC that an sr from a module owner or peer would be
appropriate here, so cc'ing darin (dougt is already on the bug)

darin/dougt - sr? (the patch is *deleting* lines, so most of the stuff normally
covered by sr doesn't even apply right? Could an rs= be appropriate here?)
r/sr.

over to gagan.  gagan, can you see that this gets approved and checked in. 
Assignee: dougt → gagan
->dougt
Assignee: gagan → dougt
darin can you r/sr this?
I thought it already had r=bbaetz and r/sr=dougt - that would make it just
waiting on a=, which I have a mail ready to send to drivers about...

(I also need to check that bbaetz's r= is still good for Andreas' patch, but it
should be - but bbaetz is idle on irc right now ;) )
r=bbaetz for the updated patch
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
Blocks: 83989
Taking bug because I intend to get this checked in one way or another...
Assignee: dougt → sballard
No longer blocks: 83989
fix checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Keywords: review
Resolution: --- → FIXED
Whiteboard: reviewing submitted patch.
verified
Status: RESOLVED → VERIFIED
+making test: Can you provide a brief two or three URL test file?

Keywords: makingtest
It is so disheartening to me that this "bug" was "fixed".  We spent sooooo long 
making NS6 work.  Then between minor releases, you change a feature from 
working the same as IE5 to working differently.  It's driving me to drink.

(not that it matters bug)
The assertion that one always knows the file one is using is only obvious for 
static web sites.  It is very convient to be able to jump from say record 1 to 
record 2 without knowing which "view" of the record you are using.
<a href="?record=2"> does (did) that nicely.

Make mine a Scotch (neat).
No need to get drunk ... it was changed back to it's old behaviour a few days
ago, expect it to be working as usual in 0.9.4.

Is this really a dupe of some other http URL parsing bug?
See bug 90439 for the discussion on reverting this.

Should we reopen this bug and then WONTFIX or INVALID it, since the fix has been
reverted?
if we update the summary to "?xxx" -> something specific (rather than "wrong"),
then we can set the status to FIXED or WONT based on the summary and the current
implementation.

I'll defer to the judgement of the crowd.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: