Closed Bug 58819 Opened 25 years ago Closed 25 years ago

Non-Latin 1 anchors don't work

Categories

(Core :: Internationalization, defect, P3)

Product:

Component:

Type:

defect

Priority:

P3

Severity:

normal

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: nhottanscp)

References

Details

(Keywords: intl, Whiteboard: [rtm-] [r=erik])

Attachments

(9 files)

patch 25 years ago nhottanscp 2.55 KB, patch		Details \| Diff \| Splinter Review
patch #2, changed to assume UTF-8 instead of Latin1 25 years ago nhottanscp 3.08 KB, patch		Details \| Diff \| Splinter Review
patch #3, added to check mContentViewer before using it 25 years ago nhottanscp 3.15 KB, patch		Details \| Diff \| Splinter Review
patch #4, change for nsHTMLContentSink.cpp 25 years ago nhottanscp 3.71 KB, patch		Details \| Diff \| Splinter Review
patch#5, for nsHTMLContentSink.cpp, added more error checking 25 years ago nhottanscp 3.70 KB, patch		Details \| Diff \| Splinter Review
patch #6, for nsDocShell.cpp, added an error check 25 years ago nhottanscp 3.32 KB, patch		Details \| Diff \| Splinter Review
patch #7, for nsHTMLContentSink.cpp rewrote by jst's suggestion 25 years ago nhottanscp 3.87 KB, patch		Details \| Diff \| Splinter Review
test cases for link to external anchor name 25 years ago nhottanscp 1.32 KB, text/html		Details
patch #8, for nsHTMLContentSink.cpp, rewritten after reviewed by erik 25 years ago nhottanscp 3.91 KB, patch		Details \| Diff \| Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

25 years ago

DESCRIPTION: Named anchors with cyrillic text aren't working. I suspect the problem has something to do with URL escaping -- perhaps we need to unescape the URL when we go to the anchor? We should also be careful to check the spec for correct behavior. This was pointed out on bug 56285 by Eugene Savitsky. STEPS TO REPRODUCE (from bug 56285): 1. Load http://www.fcenter.ru/wn/hn02082000.htm 2. Select the second link (the one below 3dfx..., which works fine) ACTUAL RESULTS Nothing happens, only the URLbar updates to http://www.fcenter.ru/wn/hn02082000.htm#%E1%E5%EB%E0%FF EXPECTED RESULTS: * page scrolls down DOES NOT WORK CORRECTLY ON: * Linux, mozilla, opt build of 2000-11-01 tree closure

Assignee

Comment 1

•

25 years ago

I can reproduce this when I set character coding to windows-1251. But it works if character coding is ISO-8859-1. It might be that charset conversion (document charset to unicode) is missing after url escape. That could be the reason it works for only ISO-8859-1. But I have no idea where the escaping is happening for anchors, I mean what source file, I need help. Cc to i18n people, this might be more generic issue (e.g. Japaense anchor name).

Katsuhiko Momoi

Comment 2

•

25 years ago

As far as I can tell, this is a generic issue. Currently, no anchors are working unless it is interpresed as ISO-8859-1. Japanese anchors don't work, either. I changed the summary to reflect the general nature of the problem. This is a big problem and it will break many web pages.

Summary: problems going to cyrillic anchors → Non-Latin 1 anchors don't work

Katsuhiko Momoi

Comment 3

•

25 years ago

Nominating for rtm.

Keywords: rtm

Katsuhiko Momoi

Comment 4

•

25 years ago

I'm trying to quantify what percentage of web sites are using non-ASCII anchors. My initial rough estimate is 20% of all non-ASCII sites and 10% of all web sites. Please correct me with solid info if anyone has it.

Katsuhiko Momoi

Comment 5

•

25 years ago

CC'ing a bunch of people whose names appear on the other bug.

Jaime Rodriguez, Jr.

Comment 6

•

25 years ago

Bobj/Msanz - Whatare your thoughts on this bug? Would you consider it a stop-ship bug? Remember, the CORE team is at a point of RC1, right now.

Assignee

Comment 7

•

25 years ago

In the code, Latin1 is assumed(line 3753). We should convert the enescaped string from document charset to unicode. nsDocShell.cpp nsDocShell::ScrollIfAnchor 3748 char *str = sNewRef.ToNewCString(); 3749 3750 // nsUnescape modifies the string that is passed into it. 3751 nsUnescape(str); 3752 3753 sNewRef.AssignWithConversion(str); 3754 3755 nsMemory::Free(str); 3756 3757 rv = shell->GoToAnchor(sNewRef);

Assignee

Comment 8

•

25 years ago

I don't know how to get a document charset in docShell. Cata, do you know anything about that?

Comment 9

•

25 years ago

Bob, can you mark this [rtm need info]?

Comment 10

•

25 years ago

We believe non-latin1 anchors are widely used and this would break many pages.

Whiteboard: [rtm need info]

Assignee

Comment 11

•

25 years ago

Attached patch patch — Details — Splinter Review

Assignee

Comment 12

•

25 years ago

The patch works with the Russian page. Is there Japanese data that I can test?

OS: Linux → All

Hardware: PC → All

Assignee

Comment 13

•

25 years ago

There is a description in the HTML spec. B.2.1 Non-ASCII characters in URI attribute values http://www.w3.org/TR/html4/appendix/notes.html#idx-anchor

Assignee

Comment 14

•

25 years ago

Corrent link: http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1

Assignee

Comment 15

•

25 years ago

Attached patch patch #2, changed to assume UTF-8 instead of Latin1 — Details — Splinter Review

Erik van der Poel

Comment 16

•

25 years ago

Thanks for adding the UTF-8 conversion code. There are many places in that file where they check mContentViewer before using it. Please do the same in your patch.

Assignee

Comment 17

•

25 years ago

Attached patch patch #3, added to check mContentViewer before using it — Details — Splinter Review

Erik van der Poel

Comment 18

•

25 years ago

Added [r=erik].

Whiteboard: [rtm need info] → [rtm need info] [r=erik]

Assignee

Updated

•

25 years ago

Status: NEW → ASSIGNED

Assignee

Updated

•

25 years ago

Keywords: 4xp

Assignee

Comment 19

•

25 years ago

The patch #3 convers the anchors in the same document. There are other cases also not working for non ASCII names. Those are the url bar, open web location and referenced from othter documents. nsHTMLContentSink.cpp assumes the string as Latin1 by using AssignWithConversion for the input url string (line 3514). 3507 rv = mDocumentURI->QueryInterface(NS_GET_IID(nsIURL), (void**)&url); 3508 if (NS_SUCCEEDED(rv)) { 3509 rv = url->GetRef(&ref); 3510 NS_RELEASE(url); 3511 } 3512 if (rv == NS_OK) { 3513 mRef = new nsString; 3514 mRef->AssignWithConversion(ref); 3515 nsCRT::free(ref); 3516 }

Assignee

Comment 20

•

25 years ago

Attached patch patch #4, change for nsHTMLContentSink.cpp — Details — Splinter Review

Assignee

Comment 21

•

25 years ago

*** Bug 58980 has been marked as a duplicate of this bug. ***

Johnny Stenback (:jst)

Comment 22

•

25 years ago

I quickly glanced at the patch and noticed that there's "new nsString" in there without a null check after the new call, that needs to be fixed. Also, do we need two string members for the ref in the content sink, couldn't mRef be replaced with mRefInDocCharset?

Comment 23

•

25 years ago

sr=waterson for patch#3.

Assignee

Comment 24

•

25 years ago

Johnny, I tried to minimized changes for the existing code (to make ASCII case unchanged). I think mRefInDocCharset can replace mRef but I don't do that change now. I am going to add the null check for "new nsString".

Assignee

Comment 25

•

25 years ago

Attached patch patch#5, for nsHTMLContentSink.cpp, added more error checking — Details — Splinter Review

Assignee

Comment 26

•

25 years ago

Attached patch patch #6, for nsDocShell.cpp, added an error check — Details — Splinter Review

Comment 27

•

25 years ago

PDT marking [rtm-] because this bug doesn't crash or lose data, and we're out of time to shake out a big patch.

Whiteboard: [rtm need info] [r=erik] → [rtm-] [r=erik]

Assignee

Updated

•

25 years ago

Target Milestone: --- → M22

Comment 28

•

25 years ago

Are we fixing this for the trunk?

Assignee

Comment 29

•

25 years ago

Attached patch patch #7, for nsHTMLContentSink.cpp rewrote by jst's suggestion — Details — Splinter Review

Assignee

Comment 30

•

25 years ago

Erik, please review patch #6 & #7. Patch #6, almost same as patch #4, I just added an empty string change. Patch #7, this is for the case reference outside the document. Basically doing the same things as nsDocShell.cpp patch (try UTF-8 first then document charset).

Katsuhiko Momoi

Comment 31

•

25 years ago

Netscape Keyword List page at Netcenter (at the above URL) or from the drop down list of a Japanese NS6 build uses Japanese anchors and they are failing. (Added Lynn W.)

URL: http://home.jp.netscape.com/ja/escape...

Comment 32

•

25 years ago

Yes. I reported this some time ago to Chau in Bugsplat. I am still waiting for her to fix the page.

Assignee

Comment 33

•

25 years ago

I talked to erik about patch #6. Since the additional change assumes Latin1 as an input for special case, it is not following the spec. So we decided to use patch #4 (got r and sr already). About patch #7, for external anchor, I will make sure this is consistent with related bug fix (bug 10373) before I get reviews and check in.

Johnny Stenback (:jst)

Comment 34

•

25 years ago

Please remove the redundant mRef... member from the sink before checking in if you're using patch #4...

Assignee

Comment 35

•

25 years ago

>So we decided to use patch #4 (got r and sr already). My mistake, I meant patch #3. Patch #4 will not be used (I rewrote it as patch #7).

Assignee

Comment 36

•

25 years ago

Checked in patch #3 (fix for internal document case) to the trunk. I tested patch #7 (fix for external anchor) with Japanese path name (used test data of bug 10373), like this. <a href="http://mugen/url/ÆüËÜ¸ì/ja_anchors.html#´Á»ú">ÆüËÜ¸ì</a> It opens the correct file and scrolls to the anchor correctly.

Boris Zbarsky [:bzbarsky]

Updated

•

25 years ago

Blocks: 60075

Eugene Savitsky

Comment 37

•

25 years ago

What is the problem here? http://www.alion.ru/tech/slovar_cdrw.shtml The page is on russian, but the anchors are latin. When pressing "? ??????????" it does not takes tou to the top...

Assignee

Comment 38

•

25 years ago

In that page, the name is "#top", <h6><a name="#top">[</a> It works after I changed href to "##top", <a href="##top">? ??????????</a> The spec say NAME must begin with a letter ([A-Za-z]) http://www.w3.org/TR/html4/types.html#type-name But the link work with 4.x. I am not sure we want to support this in mozilla.

Assignee

Comment 39

•

25 years ago

Sorry, I mentioned a wrong place in the spec. http://www.w3.org/TR/html4/struct/links.html#adef-name-A

Katsuhiko Momoi

Comment 40

•

25 years ago

I have removed the live Netcenter test case -- to my chagrin, they fixed the problem by changing Japanese anchors to ASCII ones.

URL: http://home.jp.netscape.com/ja/escape...

Assignee

Comment 41

•

25 years ago

Attached file test cases for link to external anchor name — Details

Assignee

Comment 42

•

25 years ago

Attached patch patch #8, for nsHTMLContentSink.cpp, rewritten after reviewed by erik — Details — Splinter Review

Johnny Stenback (:jst)

Comment 43

•

25 years ago

r=jst for patch #8

Assignee

Comment 44

•

25 years ago

Fixed in the trunk.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

Teruko Kobayashi

Comment 45

•

25 years ago

I verified this in 2000-12-20-10 Mac Mtrunk, 2000-12-19-04 Win32 Mtrunk, and 2000-12-19-08 Linux Mtrunk builds.

Status: RESOLVED → VERIFIED

Keywords: intl, nsbeta1

Assignee

Comment 46

•

25 years ago

*** Bug 52844 has been marked as a duplicate of this bug. ***

Updated

•

24 years ago

No longer blocks: 60075

Comment 47

•

24 years ago

*** Bug 62933 has been marked as a duplicate of this bug. ***

You need to log in before you can comment on or make changes to this bug.