Closed
Bug 58819
Opened 24 years ago
Closed 24 years ago
Non-Latin 1 anchors don't work
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
VERIFIED
FIXED
People
(Reporter: dbaron, Assigned: nhottanscp)
References
Details
(Keywords: intl, Whiteboard: [rtm-] [r=erik])
Attachments
(9 files)
2.55 KB,
patch
|
Details | Diff | Splinter Review | |
3.08 KB,
patch
|
Details | Diff | Splinter Review | |
3.15 KB,
patch
|
Details | Diff | Splinter Review | |
3.71 KB,
patch
|
Details | Diff | Splinter Review | |
3.70 KB,
patch
|
Details | Diff | Splinter Review | |
3.32 KB,
patch
|
Details | Diff | Splinter Review | |
3.87 KB,
patch
|
Details | Diff | Splinter Review | |
1.32 KB,
text/html
|
Details | |
3.91 KB,
patch
|
Details | Diff | Splinter Review |
DESCRIPTION: Named anchors with cyrillic text aren't working. I suspect the
problem has something to do with URL escaping -- perhaps we need to unescape the
URL when we go to the anchor? We should also be careful to check the spec for
correct behavior.
This was pointed out on bug 56285 by Eugene Savitsky.
STEPS TO REPRODUCE (from bug 56285):
1. Load http://www.fcenter.ru/wn/hn02082000.htm
2. Select the second link (the one below 3dfx..., which works fine)
ACTUAL RESULTS
Nothing happens, only the URLbar updates to
http://www.fcenter.ru/wn/hn02082000.htm#%E1%E5%EB%E0%FF
EXPECTED RESULTS:
* page scrolls down
DOES NOT WORK CORRECTLY ON:
* Linux, mozilla, opt build of 2000-11-01 tree closure
Assignee | ||
Comment 1•24 years ago
|
||
I can reproduce this when I set character coding to windows-1251.
But it works if character coding is ISO-8859-1.
It might be that charset conversion (document charset to unicode) is missing
after url escape. That could be the reason it works for only ISO-8859-1.
But I have no idea where the escaping is happening for anchors, I mean what
source file, I need help.
Cc to i18n people, this might be more generic issue (e.g. Japaense anchor name).
Comment 2•24 years ago
|
||
As far as I can tell, this is a generic issue.
Currently, no anchors are working unless it is interpresed
as ISO-8859-1. Japanese anchors don't work, either.
I changed the summary to reflect the general nature of the
problem. This is a big problem and it will break many
web pages.
Summary: problems going to cyrillic anchors → Non-Latin 1 anchors don't work
Comment 4•24 years ago
|
||
I'm trying to quantify what percentage of web sites are using
non-ASCII anchors. My initial rough estimate is 20% of all non-ASCII
sites and 10% of all web sites. Please correct me with solid info
if anyone has it.
Comment 5•24 years ago
|
||
CC'ing a bunch of people whose names appear on the other bug.
Comment 6•24 years ago
|
||
Bobj/Msanz - Whatare your thoughts on this bug? Would you consider it a
stop-ship bug? Remember, the CORE team is at a point of RC1, right now.
Assignee | ||
Comment 7•24 years ago
|
||
In the code, Latin1 is assumed(line 3753).
We should convert the enescaped string from document charset to unicode.
nsDocShell.cpp
nsDocShell::ScrollIfAnchor
3748 char *str = sNewRef.ToNewCString();
3749
3750 // nsUnescape modifies the string that is passed into it.
3751 nsUnescape(str);
3752
3753 sNewRef.AssignWithConversion(str);
3754
3755 nsMemory::Free(str);
3756
3757 rv = shell->GoToAnchor(sNewRef);
Assignee | ||
Comment 8•24 years ago
|
||
I don't know how to get a document charset in docShell.
Cata, do you know anything about that?
Comment 10•24 years ago
|
||
We believe non-latin1 anchors are widely used and this would break many pages.
Whiteboard: [rtm need info]
Assignee | ||
Comment 11•24 years ago
|
||
Assignee | ||
Comment 12•24 years ago
|
||
The patch works with the Russian page.
Is there Japanese data that I can test?
OS: Linux → All
Hardware: PC → All
Assignee | ||
Comment 13•24 years ago
|
||
There is a description in the HTML spec.
B.2.1 Non-ASCII characters in URI attribute values
http://www.w3.org/TR/html4/appendix/notes.html#idx-anchor
Assignee | ||
Comment 14•24 years ago
|
||
Corrent link:
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
Assignee | ||
Comment 15•24 years ago
|
||
Comment 16•24 years ago
|
||
Thanks for adding the UTF-8 conversion code.
There are many places in that file where they check mContentViewer before using
it. Please do the same in your patch.
Assignee | ||
Comment 17•24 years ago
|
||
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•24 years ago
|
||
The patch #3 convers the anchors in the same document.
There are other cases also not working for non ASCII names. Those are the url
bar, open web location and referenced from othter documents.
nsHTMLContentSink.cpp assumes the string as Latin1 by using AssignWithConversion
for the input url string (line 3514).
3507 rv = mDocumentURI->QueryInterface(NS_GET_IID(nsIURL), (void**)&url);
3508 if (NS_SUCCEEDED(rv)) {
3509 rv = url->GetRef(&ref);
3510 NS_RELEASE(url);
3511 }
3512 if (rv == NS_OK) {
3513 mRef = new nsString;
3514 mRef->AssignWithConversion(ref);
3515 nsCRT::free(ref);
3516 }
Assignee | ||
Comment 20•24 years ago
|
||
Assignee | ||
Comment 21•24 years ago
|
||
*** Bug 58980 has been marked as a duplicate of this bug. ***
Comment 22•24 years ago
|
||
I quickly glanced at the patch and noticed that there's "new nsString" in there
without a null check after the new call, that needs to be fixed. Also, do we
need two string members for the ref in the content sink, couldn't mRef be
replaced with mRefInDocCharset?
Comment 23•24 years ago
|
||
sr=waterson for patch#3.
Assignee | ||
Comment 24•24 years ago
|
||
Johnny, I tried to minimized changes for the existing code (to make ASCII case
unchanged). I think mRefInDocCharset can replace mRef but I don't do that change
now. I am going to add the null check for "new nsString".
Assignee | ||
Comment 25•24 years ago
|
||
Assignee | ||
Comment 26•24 years ago
|
||
Comment 27•24 years ago
|
||
PDT marking [rtm-] because this bug doesn't crash or lose data, and we're out of
time to shake out a big patch.
Whiteboard: [rtm need info] [r=erik] → [rtm-] [r=erik]
Assignee | ||
Updated•24 years ago
|
Target Milestone: --- → M22
Comment 28•24 years ago
|
||
Are we fixing this for the trunk?
Assignee | ||
Comment 29•24 years ago
|
||
Assignee | ||
Comment 30•24 years ago
|
||
Erik, please review patch #6 & #7.
Patch #6, almost same as patch #4, I just added an empty string change.
Patch #7, this is for the case reference outside the document. Basically doing
the same things as nsDocShell.cpp patch (try UTF-8 first then document charset).
Comment 31•24 years ago
|
||
Netscape Keyword List page at Netcenter (at the above URL) or
from the drop down list of a Japanese NS6 build uses Japanese
anchors and they are failing. (Added Lynn W.)
Comment 32•24 years ago
|
||
Yes. I reported this some time ago to Chau in Bugsplat. I am still waiting for
her to fix the page.
Assignee | ||
Comment 33•24 years ago
|
||
I talked to erik about patch #6. Since the additional change assumes Latin1 as
an input for special case, it is not following the spec. So we decided to use
patch #4 (got r and sr already).
About patch #7, for external anchor, I will make sure this is consistent with
related bug fix (bug 10373) before I get reviews and check in.
Comment 34•24 years ago
|
||
Please remove the redundant mRef... member from the sink before checking in if
you're using patch #4...
Assignee | ||
Comment 35•24 years ago
|
||
>So we decided to use patch #4 (got r and sr already).
My mistake, I meant patch #3.
Patch #4 will not be used (I rewrote it as patch #7).
Assignee | ||
Comment 36•24 years ago
|
||
Checked in patch #3 (fix for internal document case) to the trunk.
I tested patch #7 (fix for external anchor) with Japanese path name (used test
data of bug 10373), like this.
<a href="http://mugen/url/ÆüËܸì/ja_anchors.html#´Á»ú">ÆüËܸì</a>
It opens the correct file and scrolls to the anchor correctly.
Comment 37•24 years ago
|
||
What is the problem here? http://www.alion.ru/tech/slovar_cdrw.shtml
The page is on russian, but the anchors are latin. When pressing "? ??????????"
it does not takes tou to the top...
Assignee | ||
Comment 38•24 years ago
|
||
In that page, the name is "#top",
<h6><a name="#top">[</a>
It works after I changed href to "##top",
<a href="##top">? ??????????</a>
The spec say NAME must begin with a letter ([A-Za-z])
http://www.w3.org/TR/html4/types.html#type-name
But the link work with 4.x. I am not sure we want to support this in mozilla.
Assignee | ||
Comment 39•24 years ago
|
||
Sorry, I mentioned a wrong place in the spec.
http://www.w3.org/TR/html4/struct/links.html#adef-name-A
Comment 40•24 years ago
|
||
I have removed the live Netcenter test case -- to my chagrin,
they fixed the problem by changing Japanese anchors to
ASCII ones.
Assignee | ||
Comment 41•24 years ago
|
||
Assignee | ||
Comment 42•24 years ago
|
||
Comment 43•24 years ago
|
||
r=jst for patch #8
Assignee | ||
Comment 44•24 years ago
|
||
Fixed in the trunk.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 45•24 years ago
|
||
I verified this in 2000-12-20-10 Mac Mtrunk, 2000-12-19-04 Win32 Mtrunk, and 2000-12-19-08 Linux Mtrunk builds.
Assignee | ||
Comment 46•24 years ago
|
||
*** Bug 52844 has been marked as a duplicate of this bug. ***
Comment 47•24 years ago
|
||
*** Bug 62933 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•