Closed
Bug 87996
Opened 24 years ago
Closed 24 years ago
Anchors with special characters do not work properly
Categories
(Core :: DOM: Navigation, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.9.7
People
(Reporter: eh, Assigned: adamlock)
References
()
Details
Attachments
(6 files)
1.02 KB,
text/html
|
Details | |
1.16 KB,
patch
|
Details | Diff | Splinter Review | |
1.41 KB,
patch
|
Details | Diff | Splinter Review | |
2.13 KB,
patch
|
Details | Diff | Splinter Review | |
2.34 KB,
patch
|
john
:
review+
rpotts
:
superreview+
|
Details | Diff | Splinter Review |
2.41 KB,
patch
|
Details | Diff | Splinter Review |
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.16-22 i586; en-US; rv:0.9.1)
Gecko/20010608
BuildID: 2001060810
The standards (at least HTML 4.01) have limitations on what can be included in a
NAME tag ("ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods (".")."
(http://www.w3.org/TR/html4/types.html#type-cdata)) and my example doesn't quite
fit to those limitations, but it's kind of strange that if open a link with
special characters in the fragment identifier in a new window, it works just
fine (goes where it's supposed to), but if I try to use a link like this in the
page, it doesn't do anything at all.
So either these special characters in the fragment identifier (#foo) should be
supported or then they shouldn't, but the way it works (and doesn't work) now is
completely illogical.
Reproducible: Always
Steps to Reproduce:
1. Go to a page with an anchor with special characters (and a link to that anchor)
2. Try to press the link (doesn't do anything)
3. Try to open the link in a new window (opens the page and scrolls straight to
the anchor)
Actual Results: Described above
Expected Results: Either it should ignore anchors with special characters all
the time or then it should fully support them (better idea IMHO).
Comment 1•24 years ago
|
||
-> HTML Element
Assignee: asa → clayton
Status: UNCONFIRMED → NEW
Component: Browser-General → HTML Element
Ever confirmed: true
QA Contact: doronr → bsharma
![]() |
||
Comment 2•24 years ago
|
||
Linux build 2001-06-26-06 I see this with the link to
http://www.ifsociety.com/herodishonest/lyrics.php#Don%27t+you+fucking+get+it%3F
Now, the name attribute of the target anchor is invalid (apart from using weird
chars). It should not be HTML-escaped. Replacing %27 with ' and %3F with ? in
the name attribute (but not the href attribute, which _should_ be escaped) makes
this link work fine in Mozilla in both cases...
We should still behave consistently, however. Over to docshell.
Assignee: clayton → adamlock
Component: HTML Element → Embedding: Docshell
QA Contact: bsharma → adamlock
Comment 3•24 years ago
|
||
This works in things like Sun's javadoc
http://java.sun.com/j2se/1.4/docs/api/index.html (where there are wierd chars
(parens, spaces) but not URL-encoded) and completely breaks in, for example, the
New Hacker Dictionary where anchors often have spaces. Wierd.
Comment 4•24 years ago
|
||
The Perl manpages all show this problem. What's happening there is, the link
says http://www.perldoc.com/perl5.6/lib/ExtUtils/MakeMaker.html#make%20install
and the anchor says <a NAME="make%20install">. (I checked the source.)
Not being able to jump around in the Perl manpages makes this a larger problem,
IMO, since I anything *anywhere* generated by Perldoc will exhibit this
problem--and that means all Perl packages.
While these are identical, I suspect Mozilla is translating one to "make
install" and not the other, and thus is unable to find it.
It seems like there would be zero negative impact and 100% positive if we
translated %20 to space (among others), since right now it's *impossible* to
find these links with Mozilla (if I understand the problem right). The only
case where it could affect browser behavior in an unexpected way is if someone
had *both* <A NAME="make install"> and <A NAME="make%20install"> in their page.
IMO, the way to fix this problem is to politely shoot the HTML developer in
question.
Comment 5•24 years ago
|
||
I just made a test page that includes all three variants of link
("#make%20install", "#make+install", "#make install") and all three variants of
<A NAME=...> (same three variants). None of the three links does anything at all.
Specifying them on the browser address bar doesn't work either.
So now the question is: how do you put a space into <A NAME> and have it work?
My answer is: read A NAME the same way you read URIs. I think the spec intends
the same, but I can't even find the place in the spec where it says %20 is legal
in a URI--maybe you can. It does seem to be implied by the excerpted text
below, however.
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars , which
admittedly describes handling an illegal case, seems to indicate that both URIs
and NAME attribute of anchors should be escaped in the same way.
1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
2. Escape these bytes with the URI escaping mechanism (i.e., by converting each
byte to %HH, where HH is the hexadecimal notation of the byte value).
They even include a special note: "The same conversion based on UTF-8 should be
applied to values of the name attribute for the A element."
If they were being escaped in the same way in Mozilla, I doubt this bug would be
here:
Comment 6•24 years ago
|
||
Comment 8•24 years ago
|
||
Urlencoded anchors often apear in texinfo documents converted to html, as stated
by John Keiser (I currenty have this problem). A node like "Some node" will
become #some%20node in a html document; these nodes properly work in netscape
4x, I'd recomment adding 4xp keyword.
Comment 9•24 years ago
|
||
Check out your own last two examples and you will see that it still does not work:
<li><a href="lyrics.php#We+give%2C+they+take">We give, they take</a>
<li><a href="lyrics.php#Won%27t+be+defeated">Won't be defeated</a>
suomi
Comment 10•24 years ago
|
||
*** Bug 96218 has been marked as a duplicate of this bug. ***
Comment 11•24 years ago
|
||
*** Bug 100531 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 12•24 years ago
|
||
Attachment #53271 -
Attachment mime type: text/plain → text/html
Assignee | ||
Comment 13•24 years ago
|
||
Run the updated test case and you should see the second anchor works but the
others do not.
The reasons for failure/success in each case:
1. The anchor is unescaped "make install1", but the element name remains escaped
in the DOM as "make%20install1" so they do not match.
2. Anchor and element name match as "make install2" so scrolling happens.
3. Anchor is "make+install3" whereas element name is "make install3" so they do
not match.
The code that attempts to create a list of matching named elements is in
nsHTMLDocument::GetElementsByName() which is called by PresShell::GoToAnchor().
Therefore the proper fix is to probably to modify GetElementsByName to unescape
the element name before doing the comparison. I'll work on a patch and CC the
DOM people in the meantime for their opinion.
Assignee | ||
Comment 14•24 years ago
|
||
Assignee | ||
Comment 15•24 years ago
|
||
The attached patch fixes the problem with the first anchor, but probably
degrades performance in this oft-called function.
It works by unescaping the element name before comparing it in
nsHTMLDocument::GetElementsByName().
Do the DOM folks know a better way to do this? Where is the element name set in
the first place? Ideally the name should be unescaped when the element is built
and that way we avoid having to do this every time GetElementsByName is called.
Comment 16•24 years ago
|
||
Comment 17•24 years ago
|
||
The patch I just attached makes us store the unescaped name in the DOM, that
should IMO fix case 1 too, I don't have a build handy where I can test this now,
can someone else test the patch and report back?
![]() |
||
Comment 18•24 years ago
|
||
Yep. jst's patch fixes case 1 as well.
Assignee | ||
Comment 19•24 years ago
|
||
I like the content sink patch, it works and was what I was getting at. r=adamlock
Test 3 still doesn't work however. That would require unescaping the anchor text
too. We could justify doing nothing about this because it's an illegal anchor
name anyway.
For that matter, so is an anchor containing space characters. Perhaps we should
be strict about this and simply toss non-conformant name/id attributes away?
Comment 20•24 years ago
|
||
Assignee | ||
Comment 21•24 years ago
|
||
Ack, it just occured to me, you'd have to do this for ID too.
Comment 22•24 years ago
|
||
Assignee | ||
Comment 23•24 years ago
|
||
I think it has to be this way around since the id attribute can be on any type
of element:
if (keyAtom == nsHTMLAtoms::id ||
(nodeType == eHTMLTag_a && keyAtom == nsHTMLAtoms::name)) {
Comment 24•24 years ago
|
||
But is #foo type URI's supposed to scroll to any element with the ID "foo"? I
don't like doing this for all ID attributes, does IE do that?
Comment 25•24 years ago
|
||
IDs have more far-ranging consequences than names ... and the
spec only talks about converting NAME. I don't think we should do it.
r=jkeiser,jst-review.pl on either patch, depending on whether it's decided ID
should be converted or not.
Assignee | ||
Comment 26•24 years ago
|
||
See:
http://www.w3.org/TR/html4/struct/links.html#h-12.1.1
"Destination anchors in HTML documents may be specified either by the A element
(naming it with the name attribute), or by any other element (naming with the id
attribute)."
Comment 27•24 years ago
|
||
Agreed, but again, we're dealing with a weird case here. IDs are (IMO) more
meant to be programmatically accessed, and thus need to retain their original
values. Additionally, the spec only calls for converting NAME.
In IE, what seems to happen is, the NAME or ID is converted on-the-fly when the
anchor is actually compared. When you do myElement.id, it shows "x%20y", not "x
y". Because of the consequences here, I think we should leave ID the same.
Assignee | ||
Comment 28•24 years ago
|
||
John Keiser, do you have a link pointing to where it says name values should be
unescaped?
It is my understanding that neither names nor ids should even contain spaces
and we would be perfectly entitled to just throw away such values. This
behaviour of this patch is not that strict but if we really followed the spec
then none of these anchors should work.
Er... we need to scroll to all IDs in a document, regardless of element. HTML4
deprecated name, and gave id to just about all the elements.
Also, since space in URL works and space in name works NOW (there are also a
number of websites abusing this behavior), I think we need to
support space in HTML ids as well (not in XML though, validated parsers would
flag such documents invalid).
By the way, would some of this be needed for XML, or is it taken care of by default?
Comment 30•24 years ago
|
||
If we need to do this for ID's we should IMO do this conversion in the code that
searches for the element, not store the converted ID in the DOM.
Comment 31•24 years ago
|
||
Adam, see http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars (second
note). And my comment on this bug at 2001-08-07 12:08.
Heikki, no worries, spaces will still work with NAMEs and IDs, it's %20 that
doesn't work right now. If I have my way, %20 will continue to not work with ID
(and we will be able to safely have IDs with % in them) but will work with <A NAME>.
JST, I agree.
Assignee | ||
Comment 32•24 years ago
|
||
jst, a space char is illegal in an id or name anyway so one could argue that it
doesn't matter that you unescape it when you create the element. The reason I
suggested doing it by putting the unescaping into the parsing is because I was
uneasy about slowing down every call to the GetElementsByName function for the
sake of an edge case.
John Keiser, we're not talking about the href here. Yes this can contain escaped
chars, but AFAIK the id and name shouldn't and don't need to be escaped.
Basically the only legal form for these attrs is [a-zA-Z][a-zA-z0-9:.-]*. This
bug deals with the fact that when they are escaped, they are not matched against
the unescaped anchor in the scrolling code.
Assignee | ||
Comment 33•24 years ago
|
||
I think we need to decide:
1. If illegal chars are allowed in these elements and what to do if they're not
2. If illegal char means any illegal char or just a select few such as spaces
3. If chars may be escaped and where unescaping occurs
Comment 34•24 years ago
|
||
Adam, the second note at that URL specifically says, "The same conversion based
on UTF-8 should be applied to values of the name attribute for the A element."
In fact, that entire link says that these things are illegal but shows how to
implement them if you do allow--and we are talking about allowing them because,
as Heikki points out, many websites are currently "abusing" the behavior
(including the Perl docs).
I think that section of the spec adequately answers to the questions you
asked--it says to escape non-URI characters using the URI encoding scheme. In
actuality it recommends the opposite of what we are doing (unescaping them), but
I think the effect is pretty much the same. What else do you think needs to be
said?
All I can see that's left to decide is whether to make ID escape, and I don't
think we should do that unless we do something like what JST proposes (escape
only when looking for it as an anchor). I don't think we should ever change the
actual ID attribute. For example, some authors might want to use the % char in
ID as something similar to . or -> in C++.
But I have a funny feeling that fix is not going to be as quick as this one, so
we may want to put in the fix for NAME and leave ID for another bug.
Comment 35•24 years ago
|
||
Do we know of existing sites that rely on being able to use %xx fu in element ID's?
Updated•24 years ago
|
Attachment #53613 -
Flags: review+
Comment 36•24 years ago
|
||
So does anyone have a problem with committing the A NAME= patch now and dealing
with ID in a separate bug (that fix would be larger)? This will improve
usability in Mozilla directly.
Assignee | ||
Comment 37•24 years ago
|
||
I don't have a problem with the NAME patch for the sake of compatibility though
I think ID should strictly conform to the spec.
Target Milestone: --- → mozilla0.9.6
Comment 39•24 years ago
|
||
Comment on attachment 53613 [details] [diff] [review]
Same as above, but only unescape name for anchors, as jkeiser pointed out to me.
sr=rpotts@netscape.com
Attachment #53613 -
Flags: superreview+
Comment 40•24 years ago
|
||
Comment on attachment 53613 [details] [diff] [review]
Same as above, but only unescape name for anchors, as jkeiser pointed out to me.
sr=rpotts@netscape.com
Comment 41•24 years ago
|
||
Fix checked in.
Assignee | ||
Comment 42•24 years ago
|
||
Marking this bug fixed. ID attributes are untouched, NAME attributes are unescaped.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 43•15 years ago
|
||
For the record, for HTML5-compliance, the HTML5 parser doesn't have special treatment for the name attribute on <a>. Any evidence of that part of the patch still being necessary?
You need to log in
before you can comment on or make changes to this bug.
Description
•