Closed Bug 92314 Opened 24 years ago Closed 23 years ago

Ja search engine title garbled on sidebar

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

mozilla1.0.1

People

(Reporter: yinglinxia, Assigned: nhottanscp)

References

Details

(Keywords: intl, Whiteboard: [adt2 RTM] [ETA 05/31] [Needs a=])

Attachments

(5 files, 6 obsolete files)

a test file, Netscape_Japan.src, with the title localized in Japanese 24 years ago Ying-Lin Xia 1.91 KB, text/plain		Details
a screehshot with the test src file above 24 years ago Ying-Lin Xia 84.00 KB, image/jpeg		Details
Netscape_Japan.src (same as the first attacmenet, but encoded in UTF-8) 23 years ago Frank Tang 1.92 KB, text/plain		Details
Making ftang's Netscape_Japan.src (UTF-8 encoded) to work. 23 years ago Roy Yokoyama 726 bytes, patch		Details \| Diff \| Splinter Review
Decode name, description attributes 23 years ago Roy Yokoyama 3.97 KB, patch		Details \| Diff \| Splinter Review
revised 23 years ago Roy Yokoyama 4.54 KB, patch	nhottanscp : review+	Details \| Diff \| Splinter Review
Use of queryCharset in SEARCH section and remove the null terminate 23 years ago Roy Yokoyama 4.42 KB, patch	tetsuroy : review+	Details \| Diff \| Splinter Review
FreshEye.src ship with MacOS X 23 years ago Frank Tang 786 bytes, text/plain		Details
Changed to use "sourceTextEncoding" which contains a Macintosh script code. 23 years ago nhottanscp 7.89 KB, patch	mozilla : review+	Details \| Diff \| Splinter Review
Changed formatting. 23 years ago nhottanscp 7.72 KB, patch		Details \| Diff \| Splinter Review
Changed to reduce runtime allocation and copy for strings. 23 years ago nhottanscp 8.79 KB, patch	alecf : superreview+ jesup : approval+	Details \| Diff \| Splinter Review

Ying-Lin Xia

Reporter

Description

•

24 years ago

If localize the search engine title in the searchplugins\xxx.src file, for example Japanese, the title will be garbled on the sidebar. The src file should use Shift-JIS for Japanese, but I also tested UTF-8 and unicode, none of them works.

Ying-Lin Xia

Reporter

Comment 1

•

24 years ago

Attached file a test file, Netscape_Japan.src, with the title localized in Japanese — Details

Ying-Lin Xia

Reporter

Comment 2

•

24 years ago

Attached image a screehshot with the test src file above — Details

Lynn Webb

Comment 3

•

24 years ago

Fortunately, this does not affect Latin-1 characters. I see that a Spanish sherlock file title "Netscape Search en Español" and it looks fine in browser (entered directly without any special encoding).

Status: UNCONFIRMED → NEW

Ever confirmed: true

Ying-Lin Xia

Reporter

Comment 4

•

24 years ago

Thanks for the information, Lynn. Changing summary.

Summary: Search engine title can not be localized → Ja search engine title garbled on sidebar

Peter Trudelle

Comment 5

•

24 years ago

->search, 0.9.7

Assignee: matt → sgehani

Component: Sidebar → Search

QA Contact: sujay → claudius

Target Milestone: --- → mozilla0.9.7

Samir Gehani

Comment 6

•

24 years ago

Moving to mozilla0.9.8.

Target Milestone: mozilla0.9.7 → mozilla0.9.8

Samir Gehani

Updated

•

24 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

Peter Trudelle

Comment 7

•

24 years ago

-> mozilla1.0, nsbeta1+

Keywords: nsbeta1+

Priority: -- → P3

Target Milestone: mozilla0.9.9 → mozilla1.0

Peter Trudelle

Comment 8

•

23 years ago

Reassigning 'several bugs at once' to Steve Morse, to level the load better across the team.

Assignee: sgehani → morse

Peter Trudelle

Comment 9

•

23 years ago

ADT2 per ADT triage team

Whiteboard: [adt2]

Frank Tang

Updated

•

23 years ago

Keywords: intl

Frank Tang

Comment 10

•

23 years ago

The problem is in the nsInternetSearchService.cpp look at line 4596 of function OnDataAvailable The decoder is probably null so we call context->AppendBytes instead of decoder->Convert + context->AppendUnicodeBytes to process the data. which nsIInternetSearchContext is used for "name" and "description" ? should it have a "UTF-8" converter ?

Frank Tang

Comment 11

•

23 years ago

Attached file Netscape_Japan.src (same as the first attacmenet, but encoded in UTF-8) — Details

still does not work even with UTF-8

Roy Yokoyama

Comment 12

•

23 years ago

This sounds very similar to http://bugzilla.mozilla.org/show_bug.cgi?id=118179 I'll see what I can do after fixing 118179

Assignee: morse → yokoyama

Frank Tang

Comment 13

•

23 years ago

roy: the code is very different the code should be in xpfe/components/search/src/nsInternetSearchService.cpp

Roy Yokoyama

Comment 14

•

23 years ago

Problem is at http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4176 InternetSearchDataSource::ReadFileContents() { ... sourceContents.AssignWithConversion(contents, contentsLen); <= HERE! ... } Basicall, here is what's happening 1) Read the search file content as char * and stuff it to nsString by calling AssignWithConversion() which, in this case, creates a "0x00 prefixed Shift-JIS" UCS2 disguised string. 2) By the time we read attribes (name, description, etc) http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4235, the data is already corrupted.

Status: NEW → ASSIGNED

Roy Yokoyama

Comment 15

•

23 years ago

Attached patch Making ftang's Netscape_Japan.src (UTF-8 encoded) to work. (obsolete) — Details — Splinter Review

In search of Sherlock file encoding specification http://developer.apple.com/technotes/tn/tn1141.html I failed to find an appropriate encoding for the file. Attributes are defined as : "name" This is a human-readable name for the search plug-in. Which encoding is _human-readable name_? Can we assume the file will be encoded in UTF-8?

Roy Yokoyama

Comment 16

•

23 years ago

ftang: can you comment? if it's ok, then can you give /r=?

Roy Yokoyama

Comment 17

•

23 years ago

Attached patch Decode name, description attributes (obsolete) — Details — Splinter Review

I was thinking of Decoding at InternetSearchDataSource::ReadFileContents(); but it will have more performance impact. Instead, I call InternetSearchDataSource::DecodeData() when it is asked.

Attachment #80713 - Attachment is obsolete: true

Roy Yokoyama

Comment 18

•

23 years ago

nhotta: I can't find an owner of internet-search. I found a doc in mozilla.org you wrote 2 yrs ago (http://www.mozilla.org/projects/intl/internet_search_charset_spec.html) Can you review or recommend someone else?

Roy Yokoyama

Comment 19

•

23 years ago

Humm. nhotta's doc says something about queryCharset in SEARCH section. queryCharset - a charset name string for server queries, specify in "SEARCH" section Attached test case didn't contain queryCharset http://bugzilla.mozilla.org/attachment.cgi?id=79522&action=view My patch using charset in "INTERPRET" section. Is it wrong?

nhottanscp

Assignee

Comment 20

•

23 years ago

The charset in interpret section is used for the search result from the server while the query charset is for a charset of the query by the client. So please use the query charset for now (because currently no way to specify a charset of the search .src). I am going to look at the patch.

nhottanscp

Assignee

Comment 21

•

23 years ago

* You can use const PRUnichar* to pass a charset and PRUnichar** for the converted result. See my comments below. +InternetSearchDataSource::DecodeData(const nsString &aCharset, nsString &aValue) * Please use nsICharsetConverterManager2. Because the charset is from outside, you need to resolve charset alias. nsICharsetConverterManager2::GetCharsetAtom does that work (and it takes const const PRUnichar* instead of nsString&) + nsCOMPtr<nsICharsetConverterManager> charsetConv = * Please add a comment, this is needed to restore the original multibyte but I think needs comment to explain what is going on. + char *value = ToNewCString(aValue); * Please change to use strlen. + PRInt32 srcLength = nsCRT::strlen(NS_REINTERPRET_CAST(const char *, value)); * I think the function can use PRUnichar ** and avoid to duplicate the buffer in the function. I see the caller is depending on nsString but the caller can allocate nsString from the returned buffer. + aValue.Assign(unicodeData, outUnicodeLen); + nsMemory::Free(unicodeData);

Roy Yokoyama

Comment 22

•

23 years ago

Attached patch revised (obsolete) — Details — Splinter Review

It looks like nsAWritableString has depreciated. nhotta: can you review?

Attachment #80879 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 23

•

23 years ago

Comment on attachment 81095 [details] [diff] [review] revised Please change "interpret" to the constant for query charset. This is not needed, the next line you assign with a length specified. + (unicodeData)[outUnicodeLen] = '\0'; // null terminate. Convert() doesn't do it for us Change those then r=nhotta

Attachment #81095 - Flags: review+

Roy Yokoyama

Comment 24

•

23 years ago

Attached patch Use of queryCharset in SEARCH section and remove the null terminate (obsolete) — Details — Splinter Review

This is to use the queryCharset defined in http://www.mozilla.org/projects/intl/internet_search_charset_spec.html For example: <SEARCH name="Netscape <Japanese text in Shift-JIS>" description = "<Japanese text in Shift-JIS>" queryCharset="Shift_JIS" ............................ >

Attachment #81095 - Attachment is obsolete: true

Roy Yokoyama

Comment 25

•

23 years ago

Comment on attachment 81220 [details] [diff] [review] Use of queryCharset in SEARCH section and remove the null terminate Commet #23 r=nhotta

Attachment #81220 - Flags: review+

Chris Waterson

Comment 26

•

23 years ago

Why can't we just apply the converter at the time we read the file?

Roy Yokoyama

Comment 27

•

23 years ago

It's simply the performance issue. Please see Comment #17 I could call rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue) right after reading the file and then search for "name" and "description" and decode them ; but we decided to call the converter only when the name and description attriubtes are required.

Roy Yokoyama

Comment 28

•

23 years ago

or do you mean to decode everything in the file after reading?

Alec Flett

Comment 29

•

23 years ago

Comment on attachment 81220 [details] [diff] [review] Use of queryCharset in SEARCH section and remove the null terminate ugh, I wish the file was just stored in UTF8 and then we could just use the UTF8ConverterStream already built into XPCOM, and the data would be decoded as it is read in. That said, there's no need for char *value = ToNewCString(PromiseFlatString(aValue)); you should just say char *value = ToNewCString(aValue) (i.e. the PromiseFlatString is unnecessary and might make an extra copy of the string) However, even that is lame. You should really use NS_LossyConvertUCS2toASCII value(aValue), then you won't have to call strlen(value), nor free up the string when you're done. on the trunk, we eventually need to fix nsICharsetConverter2 to start taking nsAString&/nsACString& so we can avoid all this extra copying and allocating. but in the mean time, looking at this you could clean this up to avoid the extra copying by changing DecodeData() to return a PRUnichar**, and return the buffer you allocated. Then using Adopt() to make descValue take ownership of the newly allocated string. it would look something like: >+ nsAutoString charsetValue; >+ if (NS_SUCCEEDED(rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)) && !charsetValue.IsEmpty()) >+ { >+ PRUnichar *decodedValue; >+ rv = DecodeData(charsetValue.get(), &decodedValue); >+ if (NS_SUCCEEDED(rv)) >+ descValue.Adopt(decodedValue); >+ }

Attachment #81220 - Flags: needs-work+

Alec Flett

Comment 30

•

23 years ago

I think what waterson means is to decode the file into unicode as you read it out of the file.. so you're dealing with a correctly-decoded unicode buffer, to begin with. However, I'm not quite sure how we'd accomplish this since the charset name is actually stored in the file itself. That's why I was hoping for UTF8. Roy: Are there already files like this that we need to support (i.e. that went out in previous releases), or is there any chance we can just fix our own versions of these files in CVS to be UTF8, and then use the UTF8ConverterStream, which assumes the file is in UTF8?

Roy Yokoyama

Comment 31

•

23 years ago

Ya, my first patch (04/24/02 16:46) http://bugzilla.mozilla.org/attachment.cgi?id=80713&action=view was exactly that. (assume the file is encoding in UTF-8) >Roy: Are there already files like this that we need to support I think tao or ying can answer your question; but ftang and I looked at a sherlock file in Macintosh-Ja and it was encoded in Shift-JIS.

Alec Flett

Comment 32

•

23 years ago

but it never worked before, right? I mean, if this bug has always existed, then we won't be breaking anything that used to work. All of our other data files (.properties, .xml, .dtd, .rdf, etc) are all in UTF8, it seems odd that we'd decide not to use UTF8 in this case.

tao

Comment 33

•

23 years ago

Sherlock files use native encoding :-\. However, like xml parser, the browser should use the embedded charset info to decode the data.

Frank Tang

Comment 34

•

23 years ago

Attached file FreshEye.src ship with MacOS X — Details

Notice it does not have a queryCharset but a queryEncoding="2561" Please make sure this also work. This file is ship with MacOS X. (in "Users:ftang:Library:Internet Search Sites:Internet", replace 'ftang' with your login name of course) notice the name is encoded in Shift_JIS instead of UTF-8

Alec Flett

Comment 35

•

23 years ago

ah! frank that's the best reason yet to give up on the UTF8 idea :) so this is completely outside of our control - 3rd parties are already distributing non-UTF8 files, and these files are shared between apps. That said, we should definitely do what frank said, and support queryEncoding as well.

Roy Yokoyama

Comment 36

•

23 years ago

nhotta,ftang and I did little digging how Apple handle the encoding of Sherlock. It looks like they are using sourceTextEncoding field(not queryEncoding) to identify the file encoding and if it's not defined, then it falls back to "x-mac-roman" sourceTextEncoding is using the Script Codes. I am going to add a mapScriptCode table or modify the table in ::MapEncoding() to support this field.

Roy Yokoyama

Updated

•

23 years ago

Blocks: 141008

Frank Tang

Comment 37

•

23 years ago

I need to ask yokoyama to focus on bug 141513 now. nhotta- can you take over this bug?

Assignee: yokoyama → nhotta

Status: ASSIGNED → NEW

nhottanscp

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

nhottanscp

Assignee

Comment 38

•

23 years ago

Attached patch Changed to use "sourceTextEncoding" which contains a Macintosh script code. (obsolete) — Details — Splinter Review

Attachment #81220 - Attachment is obsolete: true

Robert John Churchill

Comment 39

•

23 years ago

Comment on attachment 83593 [details] [diff] [review] Changed to use "sourceTextEncoding" which contains a Macintosh script code. r=rjc (Please be sure to follow tabbed indentation formatting used in file.)

Attachment #83593 - Flags: review+

nhottanscp

Assignee

Comment 40

•

23 years ago

Attached patch Changed formatting. (obsolete) — Details — Splinter Review

Attachment #83593 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 41

•

23 years ago

Alec, could you super review?

Alec Flett

Comment 42

•

23 years ago

Comment on attachment 83975 [details] [diff] [review] Changed formatting. Why does MapScriptCodeToCharsetName need to allocate its result? It seems like the whole function could just return a const char* - since the strings are just static strings. Also, I think scriptList[] could be "static" as well.. I also wonder if it might be even better to make MapScriptCodeToCharsetName take a const PRUnichar* as a parameter, then you could avoid the excess copying/conversion... you could declare scriptList[] like static struct scripts scriptList[] = { { { '0', '\0' }, "x-mac-roman" }, ..etc it will be annoying to manually expand all that unicode, but once you've done it once, then we never ever have to do it at runtime! :) and actually, since really what you want is a charset in PRUnichar, you should make DecodeData take a PRUnichar* as the charset, and let the consumers of that function do the conversion finally, does this compiler? the call to MapScriptCodeToCharsetName seems to be lacking a 2nd parameter, the line ends with a ","? overall this looks fine, I just think we can simplify the code a lot by doing less conversions

Attachment #83975 - Flags: needs-work+

nhottanscp

Assignee

Comment 43

•

23 years ago

Attached patch Changed to reduce runtime allocation and copy for strings. — Details — Splinter Review

Attachment #83975 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 44

•

23 years ago

Alec, please review the new patch, thanks.

Alec Flett

Comment 45

•

23 years ago

Comment on attachment 84102 [details] [diff] [review] Changed to reduce runtime allocation and copy for strings. oh, wow! thanks for going the extra mile and putting in the unicode names too. sr=alecf

Attachment #84102 - Flags: superreview+

nhottanscp

Assignee

Comment 46

•

23 years ago

checked in to the trunk

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Frank Tang

Updated

•

23 years ago

Keywords: adt1.0.0, approval

nhottanscp

Assignee

Comment 47

•

23 years ago

Yuying, could you verify this on the trunk? You can use FreshEye data attached to the bug.

Yuying Long

Comment 48

•

23 years ago

Verified it works fine on 05-20 trunk build / WinME-JA by applied the FreshEye.src in comment #34.

Status: RESOLVED → VERIFIED

Ying-Lin Xia

Reporter

Comment 49

•

23 years ago

Just want to make sure, does it also work on the location bar? You know, if you set the search engine as default, when you type something on the URL location bar, there will be a dropdown string says "Search xxxx for xxxx".

Jaime Rodriguez, Jr.

Updated

•

23 years ago

Blocks: 143047

Whiteboard: [adt2] → [adt2 RTM] [ETA 05/31] [Needs a=]

Jaime Rodriguez, Jr.

Comment 50

•

23 years ago

adt1.0.0+ (on ADT's behalf) for approval to checkin to the 1.0 branch,pending Driver's approval. After, checking in, please add the fixed1.0 keyword.

Keywords: adt1.0.0 → adt1.0.0+

Roy Yokoyama

Comment 51

•

23 years ago

>Just want to make sure, does it also work on the location bar? You know, if you >set the search engine as default, when you type something on the URL location >bar, there will be a dropdown string says "Search xxxx for xxxx" I am not sure if URL location bar will get fixed; but I know it will get Preference/Internet Search fixed as well. ylong?

nhottanscp

Assignee

Comment 52

•

23 years ago

>>Just want to make sure, does it also work on the location bar? Yes, fixed on the trunk.

scottputterman

Comment 53

•

23 years ago

Adding adt1.0.1+ for adt approval for branch checkin for the 1.0.1 milestone. Please get drivers approval before checking in.

Keywords: adt1.0.0+ → adt1.0.1+

Frank Tang

Updated

•

23 years ago

Keywords: mozilla1.0.1

Randell Jesup [:jesup] (needinfo me)

Updated

•

23 years ago

Attachment #84102 - Flags: approval+

Randell Jesup [:jesup] (needinfo me)

Comment 54

•

23 years ago

Comment on attachment 84102 [details] [diff] [review] Changed to reduce runtime allocation and copy for strings. please check into the 1.0.1 branch ASAP. once landed remove the mozilla1.0.1+ keyword and add the fixed1.0.1 keyword

Randell Jesup [:jesup] (needinfo me)

Updated

•

23 years ago

Keywords: approval, mozilla1.0.1 → mozilla1.0.1+

Target Milestone: mozilla1.0 → mozilla1.0.1

Roy Yokoyama

Comment 55

•

23 years ago

fix checked in to the 1.0 branch

Keywords: mozilla1.0.1+ → fixed1.0.1

Frank Tang

Updated

•

23 years ago

Blocks: 146292
No longer blocks: 141008

Ying-Lin Xia

Reporter

Comment 56

•

23 years ago

I'm making a new JA build now, and the Japanese search engine title is showing properly! Just want to thank you guys again to make it happens. Yeah!!!

Claudius Gayle

Comment 57

•

23 years ago

VERIFIED Fixed with 20020719 branch builds

Keywords: fixed1.0.1 → verified1.0.1

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → SeaMonkey

You need to log in before you can comment on or make changes to this bug.