Closed
Bug 58819
Opened 24 years ago
Closed 24 years ago
Non-Latin 1 anchors don't work
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
VERIFIED
FIXED
People
(Reporter: dbaron, Assigned: nhottanscp)
References
Details
(Keywords: intl, Whiteboard: [rtm-] [r=erik])
Attachments
(9 files)
2.55 KB,
patch
|
Details | Diff | Splinter Review | |
3.08 KB,
patch
|
Details | Diff | Splinter Review | |
3.15 KB,
patch
|
Details | Diff | Splinter Review | |
3.71 KB,
patch
|
Details | Diff | Splinter Review | |
3.70 KB,
patch
|
Details | Diff | Splinter Review | |
3.32 KB,
patch
|
Details | Diff | Splinter Review | |
3.87 KB,
patch
|
Details | Diff | Splinter Review | |
1.32 KB,
text/html
|
Details | |
3.91 KB,
patch
|
Details | Diff | Splinter Review |
DESCRIPTION: Named anchors with cyrillic text aren't working. I suspect the problem has something to do with URL escaping -- perhaps we need to unescape the URL when we go to the anchor? We should also be careful to check the spec for correct behavior. This was pointed out on bug 56285 by Eugene Savitsky. STEPS TO REPRODUCE (from bug 56285): 1. Load http://www.fcenter.ru/wn/hn02082000.htm 2. Select the second link (the one below 3dfx..., which works fine) ACTUAL RESULTS Nothing happens, only the URLbar updates to http://www.fcenter.ru/wn/hn02082000.htm#%E1%E5%EB%E0%FF EXPECTED RESULTS: * page scrolls down DOES NOT WORK CORRECTLY ON: * Linux, mozilla, opt build of 2000-11-01 tree closure
Assignee | ||
Comment 1•24 years ago
|
||
I can reproduce this when I set character coding to windows-1251. But it works if character coding is ISO-8859-1. It might be that charset conversion (document charset to unicode) is missing after url escape. That could be the reason it works for only ISO-8859-1. But I have no idea where the escaping is happening for anchors, I mean what source file, I need help. Cc to i18n people, this might be more generic issue (e.g. Japaense anchor name).
Comment 2•24 years ago
|
||
As far as I can tell, this is a generic issue. Currently, no anchors are working unless it is interpresed as ISO-8859-1. Japanese anchors don't work, either. I changed the summary to reflect the general nature of the problem. This is a big problem and it will break many web pages.
Summary: problems going to cyrillic anchors → Non-Latin 1 anchors don't work
Comment 4•24 years ago
|
||
I'm trying to quantify what percentage of web sites are using non-ASCII anchors. My initial rough estimate is 20% of all non-ASCII sites and 10% of all web sites. Please correct me with solid info if anyone has it.
Comment 5•24 years ago
|
||
CC'ing a bunch of people whose names appear on the other bug.
Comment 6•24 years ago
|
||
Bobj/Msanz - Whatare your thoughts on this bug? Would you consider it a stop-ship bug? Remember, the CORE team is at a point of RC1, right now.
Assignee | ||
Comment 7•24 years ago
|
||
In the code, Latin1 is assumed(line 3753). We should convert the enescaped string from document charset to unicode. nsDocShell.cpp nsDocShell::ScrollIfAnchor 3748 char *str = sNewRef.ToNewCString(); 3749 3750 // nsUnescape modifies the string that is passed into it. 3751 nsUnescape(str); 3752 3753 sNewRef.AssignWithConversion(str); 3754 3755 nsMemory::Free(str); 3756 3757 rv = shell->GoToAnchor(sNewRef);
Assignee | ||
Comment 8•24 years ago
|
||
I don't know how to get a document charset in docShell. Cata, do you know anything about that?
Comment 10•24 years ago
|
||
We believe non-latin1 anchors are widely used and this would break many pages.
Whiteboard: [rtm need info]
Assignee | ||
Comment 11•24 years ago
|
||
Assignee | ||
Comment 12•24 years ago
|
||
The patch works with the Russian page. Is there Japanese data that I can test?
OS: Linux → All
Hardware: PC → All
Assignee | ||
Comment 13•24 years ago
|
||
There is a description in the HTML spec. B.2.1 Non-ASCII characters in URI attribute values http://www.w3.org/TR/html4/appendix/notes.html#idx-anchor
Assignee | ||
Comment 14•24 years ago
|
||
Corrent link: http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
Assignee | ||
Comment 15•24 years ago
|
||
Comment 16•24 years ago
|
||
Thanks for adding the UTF-8 conversion code. There are many places in that file where they check mContentViewer before using it. Please do the same in your patch.
Assignee | ||
Comment 17•24 years ago
|
||
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•24 years ago
|
||
The patch #3 convers the anchors in the same document. There are other cases also not working for non ASCII names. Those are the url bar, open web location and referenced from othter documents. nsHTMLContentSink.cpp assumes the string as Latin1 by using AssignWithConversion for the input url string (line 3514). 3507 rv = mDocumentURI->QueryInterface(NS_GET_IID(nsIURL), (void**)&url); 3508 if (NS_SUCCEEDED(rv)) { 3509 rv = url->GetRef(&ref); 3510 NS_RELEASE(url); 3511 } 3512 if (rv == NS_OK) { 3513 mRef = new nsString; 3514 mRef->AssignWithConversion(ref); 3515 nsCRT::free(ref); 3516 }
Assignee | ||
Comment 20•24 years ago
|
||
Assignee | ||
Comment 21•24 years ago
|
||
*** Bug 58980 has been marked as a duplicate of this bug. ***
Comment 22•24 years ago
|
||
I quickly glanced at the patch and noticed that there's "new nsString" in there without a null check after the new call, that needs to be fixed. Also, do we need two string members for the ref in the content sink, couldn't mRef be replaced with mRefInDocCharset?
Comment 23•24 years ago
|
||
sr=waterson for patch#3.
Assignee | ||
Comment 24•24 years ago
|
||
Johnny, I tried to minimized changes for the existing code (to make ASCII case unchanged). I think mRefInDocCharset can replace mRef but I don't do that change now. I am going to add the null check for "new nsString".
Assignee | ||
Comment 25•24 years ago
|
||
Assignee | ||
Comment 26•24 years ago
|
||
Comment 27•24 years ago
|
||
PDT marking [rtm-] because this bug doesn't crash or lose data, and we're out of time to shake out a big patch.
Whiteboard: [rtm need info] [r=erik] → [rtm-] [r=erik]
Assignee | ||
Updated•24 years ago
|
Target Milestone: --- → M22
Comment 28•24 years ago
|
||
Are we fixing this for the trunk?
Assignee | ||
Comment 29•24 years ago
|
||
Assignee | ||
Comment 30•24 years ago
|
||
Erik, please review patch #6 & #7. Patch #6, almost same as patch #4, I just added an empty string change. Patch #7, this is for the case reference outside the document. Basically doing the same things as nsDocShell.cpp patch (try UTF-8 first then document charset).
Comment 31•24 years ago
|
||
Netscape Keyword List page at Netcenter (at the above URL) or from the drop down list of a Japanese NS6 build uses Japanese anchors and they are failing. (Added Lynn W.)
Comment 32•24 years ago
|
||
Yes. I reported this some time ago to Chau in Bugsplat. I am still waiting for her to fix the page.
Assignee | ||
Comment 33•24 years ago
|
||
I talked to erik about patch #6. Since the additional change assumes Latin1 as an input for special case, it is not following the spec. So we decided to use patch #4 (got r and sr already). About patch #7, for external anchor, I will make sure this is consistent with related bug fix (bug 10373) before I get reviews and check in.
Comment 34•24 years ago
|
||
Please remove the redundant mRef... member from the sink before checking in if you're using patch #4...
Assignee | ||
Comment 35•24 years ago
|
||
>So we decided to use patch #4 (got r and sr already).
My mistake, I meant patch #3.
Patch #4 will not be used (I rewrote it as patch #7).
Assignee | ||
Comment 36•24 years ago
|
||
Checked in patch #3 (fix for internal document case) to the trunk. I tested patch #7 (fix for external anchor) with Japanese path name (used test data of bug 10373), like this. <a href="http://mugen/url/ÆüËܸì/ja_anchors.html#´Á»ú">ÆüËܸì</a> It opens the correct file and scrolls to the anchor correctly.
Comment 37•24 years ago
|
||
What is the problem here? http://www.alion.ru/tech/slovar_cdrw.shtml The page is on russian, but the anchors are latin. When pressing "? ??????????" it does not takes tou to the top...
Assignee | ||
Comment 38•24 years ago
|
||
In that page, the name is "#top", <h6><a name="#top">[</a> It works after I changed href to "##top", <a href="##top">? ??????????</a> The spec say NAME must begin with a letter ([A-Za-z]) http://www.w3.org/TR/html4/types.html#type-name But the link work with 4.x. I am not sure we want to support this in mozilla.
Assignee | ||
Comment 39•24 years ago
|
||
Sorry, I mentioned a wrong place in the spec. http://www.w3.org/TR/html4/struct/links.html#adef-name-A
Comment 40•24 years ago
|
||
I have removed the live Netcenter test case -- to my chagrin, they fixed the problem by changing Japanese anchors to ASCII ones.
Assignee | ||
Comment 41•24 years ago
|
||
Assignee | ||
Comment 42•24 years ago
|
||
Comment 43•24 years ago
|
||
r=jst for patch #8
Assignee | ||
Comment 44•24 years ago
|
||
Fixed in the trunk.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 45•24 years ago
|
||
I verified this in 2000-12-20-10 Mac Mtrunk, 2000-12-19-04 Win32 Mtrunk, and 2000-12-19-08 Linux Mtrunk builds.
Assignee | ||
Comment 46•24 years ago
|
||
*** Bug 52844 has been marked as a duplicate of this bug. ***
Comment 47•23 years ago
|
||
*** Bug 62933 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•