Closed Bug 92314 Opened 24 years ago Closed 23 years ago

Ja search engine title garbled on sidebar

Categories

(SeaMonkey :: Search, defect, P3)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: yinglinxia, Assigned: nhottanscp)

References

Details

(Keywords: intl, Whiteboard: [adt2 RTM] [ETA 05/31] [Needs a=])

Attachments

(5 files, 6 obsolete files)

If localize the search engine title in the searchplugins\xxx.src file, for example Japanese, the title will be garbled on the sidebar. The src file should use Shift-JIS for Japanese, but I also tested UTF-8 and unicode, none of them works.
Fortunately, this does not affect Latin-1 characters. I see that a Spanish sherlock file title "Netscape Search en Español" and it looks fine in browser (entered directly without any special encoding).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Thanks for the information, Lynn. Changing summary.
Summary: Search engine title can not be localized → Ja search engine title garbled on sidebar
->search, 0.9.7
Assignee: matt → sgehani
Component: Sidebar → Search
QA Contact: sujay → claudius
Target Milestone: --- → mozilla0.9.7
Moving to mozilla0.9.8.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Target Milestone: mozilla0.9.8 → mozilla0.9.9
-> mozilla1.0, nsbeta1+
Keywords: nsbeta1+
Priority: -- → P3
Target Milestone: mozilla0.9.9 → mozilla1.0
Reassigning 'several bugs at once' to Steve Morse, to level the load better across the team.
Assignee: sgehani → morse
ADT2 per ADT triage team
Whiteboard: [adt2]
Keywords: intl
The problem is in the nsInternetSearchService.cpp look at line 4596 of function OnDataAvailable The decoder is probably null so we call context->AppendBytes instead of decoder->Convert + context->AppendUnicodeBytes to process the data. which nsIInternetSearchContext is used for "name" and "description" ? should it have a "UTF-8" converter ?
still does not work even with UTF-8
This sounds very similar to http://bugzilla.mozilla.org/show_bug.cgi?id=118179 I'll see what I can do after fixing 118179
Assignee: morse → yokoyama
roy: the code is very different the code should be in xpfe/components/search/src/nsInternetSearchService.cpp
Problem is at http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4176 InternetSearchDataSource::ReadFileContents() { ... sourceContents.AssignWithConversion(contents, contentsLen); <= HERE! ... } Basicall, here is what's happening 1) Read the search file content as char * and stuff it to nsString by calling AssignWithConversion() which, in this case, creates a "0x00 prefixed Shift-JIS" UCS2 disguised string. 2) By the time we read attribes (name, description, etc) http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4235, the data is already corrupted.
Status: NEW → ASSIGNED
In search of Sherlock file encoding specification http://developer.apple.com/technotes/tn/tn1141.html I failed to find an appropriate encoding for the file. Attributes are defined as : "name" This is a human-readable name for the search plug-in. Which encoding is _human-readable name_? Can we assume the file will be encoded in UTF-8?
ftang: can you comment? if it's ok, then can you give /r=?
I was thinking of Decoding at InternetSearchDataSource::ReadFileContents(); but it will have more performance impact. Instead, I call InternetSearchDataSource::DecodeData() when it is asked.
Attachment #80713 - Attachment is obsolete: true
nhotta: I can't find an owner of internet-search. I found a doc in mozilla.org you wrote 2 yrs ago (http://www.mozilla.org/projects/intl/internet_search_charset_spec.html) Can you review or recommend someone else?
Humm. nhotta's doc says something about queryCharset in SEARCH section. queryCharset - a charset name string for server queries, specify in "SEARCH" section Attached test case didn't contain queryCharset http://bugzilla.mozilla.org/attachment.cgi?id=79522&action=view My patch using charset in "INTERPRET" section. Is it wrong?
The charset in interpret section is used for the search result from the server while the query charset is for a charset of the query by the client. So please use the query charset for now (because currently no way to specify a charset of the search .src). I am going to look at the patch.
* You can use const PRUnichar* to pass a charset and PRUnichar** for the converted result. See my comments below. +InternetSearchDataSource::DecodeData(const nsString &aCharset, nsString &aValue) * Please use nsICharsetConverterManager2. Because the charset is from outside, you need to resolve charset alias. nsICharsetConverterManager2::GetCharsetAtom does that work (and it takes const const PRUnichar* instead of nsString&) + nsCOMPtr<nsICharsetConverterManager> charsetConv = * Please add a comment, this is needed to restore the original multibyte but I think needs comment to explain what is going on. + char *value = ToNewCString(aValue); * Please change to use strlen. + PRInt32 srcLength = nsCRT::strlen(NS_REINTERPRET_CAST(const char *, value)); * I think the function can use PRUnichar ** and avoid to duplicate the buffer in the function. I see the caller is depending on nsString but the caller can allocate nsString from the returned buffer. + aValue.Assign(unicodeData, outUnicodeLen); + nsMemory::Free(unicodeData);
Attached patch revised (obsolete) — Splinter Review
It looks like nsAWritableString has depreciated. nhotta: can you review?
Attachment #80879 - Attachment is obsolete: true
Comment on attachment 81095 [details] [diff] [review] revised Please change "interpret" to the constant for query charset. This is not needed, the next line you assign with a length specified. + (unicodeData)[outUnicodeLen] = '\0'; // null terminate. Convert() doesn't do it for us Change those then r=nhotta
Attachment #81095 - Flags: review+
This is to use the queryCharset defined in http://www.mozilla.org/projects/intl/internet_search_charset_spec.html For example: <SEARCH name="Netscape <Japanese text in Shift-JIS>" description = "<Japanese text in Shift-JIS>" queryCharset="Shift_JIS" ............................ >
Attachment #81095 - Attachment is obsolete: true
Comment on attachment 81220 [details] [diff] [review] Use of queryCharset in SEARCH section and remove the null terminate Commet #23 r=nhotta
Attachment #81220 - Flags: review+
Why can't we just apply the converter at the time we read the file?
It's simply the performance issue. Please see Comment #17 I could call rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue) right after reading the file and then search for "name" and "description" and decode them ; but we decided to call the converter only when the name and description attriubtes are required.
or do you mean to decode everything in the file after reading?
Comment on attachment 81220 [details] [diff] [review] Use of queryCharset in SEARCH section and remove the null terminate ugh, I wish the file was just stored in UTF8 and then we could just use the UTF8ConverterStream already built into XPCOM, and the data would be decoded as it is read in. That said, there's no need for char *value = ToNewCString(PromiseFlatString(aValue)); you should just say char *value = ToNewCString(aValue) (i.e. the PromiseFlatString is unnecessary and might make an extra copy of the string) However, even that is lame. You should really use NS_LossyConvertUCS2toASCII value(aValue), then you won't have to call strlen(value), nor free up the string when you're done. on the trunk, we eventually need to fix nsICharsetConverter2 to start taking nsAString&/nsACString& so we can avoid all this extra copying and allocating. but in the mean time, looking at this you could clean this up to avoid the extra copying by changing DecodeData() to return a PRUnichar**, and return the buffer you allocated. Then using Adopt() to make descValue take ownership of the newly allocated string. it would look something like: >+ nsAutoString charsetValue; >+ if (NS_SUCCEEDED(rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)) && !charsetValue.IsEmpty()) >+ { >+ PRUnichar *decodedValue; >+ rv = DecodeData(charsetValue.get(), &decodedValue); >+ if (NS_SUCCEEDED(rv)) >+ descValue.Adopt(decodedValue); >+ }
Attachment #81220 - Flags: needs-work+
I think what waterson means is to decode the file into unicode as you read it out of the file.. so you're dealing with a correctly-decoded unicode buffer, to begin with. However, I'm not quite sure how we'd accomplish this since the charset name is actually stored in the file itself. That's why I was hoping for UTF8. Roy: Are there already files like this that we need to support (i.e. that went out in previous releases), or is there any chance we can just fix our own versions of these files in CVS to be UTF8, and then use the UTF8ConverterStream, which assumes the file is in UTF8?
Ya, my first patch (04/24/02 16:46) http://bugzilla.mozilla.org/attachment.cgi?id=80713&action=view was exactly that. (assume the file is encoding in UTF-8) >Roy: Are there already files like this that we need to support I think tao or ying can answer your question; but ftang and I looked at a sherlock file in Macintosh-Ja and it was encoded in Shift-JIS.
but it never worked before, right? I mean, if this bug has always existed, then we won't be breaking anything that used to work. All of our other data files (.properties, .xml, .dtd, .rdf, etc) are all in UTF8, it seems odd that we'd decide not to use UTF8 in this case.
Sherlock files use native encoding :-\. However, like xml parser, the browser should use the embedded charset info to decode the data.
Notice it does not have a queryCharset but a queryEncoding="2561" Please make sure this also work. This file is ship with MacOS X. (in "Users:ftang:Library:Internet Search Sites:Internet", replace 'ftang' with your login name of course) notice the name is encoded in Shift_JIS instead of UTF-8
ah! frank that's the best reason yet to give up on the UTF8 idea :) so this is completely outside of our control - 3rd parties are already distributing non-UTF8 files, and these files are shared between apps. That said, we should definitely do what frank said, and support queryEncoding as well.
nhotta,ftang and I did little digging how Apple handle the encoding of Sherlock. It looks like they are using sourceTextEncoding field(not queryEncoding) to identify the file encoding and if it's not defined, then it falls back to "x-mac-roman" sourceTextEncoding is using the Script Codes. I am going to add a mapScriptCode table or modify the table in ::MapEncoding() to support this field.
Blocks: 141008
I need to ask yokoyama to focus on bug 141513 now. nhotta- can you take over this bug?
Assignee: yokoyama → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Attachment #81220 - Attachment is obsolete: true
Comment on attachment 83593 [details] [diff] [review] Changed to use "sourceTextEncoding" which contains a Macintosh script code. r=rjc (Please be sure to follow tabbed indentation formatting used in file.)
Attachment #83593 - Flags: review+
Attached patch Changed formatting. (obsolete) — Splinter Review
Attachment #83593 - Attachment is obsolete: true
Alec, could you super review?
Comment on attachment 83975 [details] [diff] [review] Changed formatting. Why does MapScriptCodeToCharsetName need to allocate its result? It seems like the whole function could just return a const char* - since the strings are just static strings. Also, I think scriptList[] could be "static" as well.. I also wonder if it might be even better to make MapScriptCodeToCharsetName take a const PRUnichar* as a parameter, then you could avoid the excess copying/conversion... you could declare scriptList[] like static struct scripts scriptList[] = { { { '0', '\0' }, "x-mac-roman" }, ..etc it will be annoying to manually expand all that unicode, but once you've done it once, then we never ever have to do it at runtime! :) and actually, since really what you want is a charset in PRUnichar, you should make DecodeData take a PRUnichar* as the charset, and let the consumers of that function do the conversion finally, does this compiler? the call to MapScriptCodeToCharsetName seems to be lacking a 2nd parameter, the line ends with a ","? overall this looks fine, I just think we can simplify the code a lot by doing less conversions
Attachment #83975 - Flags: needs-work+
Attachment #83975 - Attachment is obsolete: true
Alec, please review the new patch, thanks.
Comment on attachment 84102 [details] [diff] [review] Changed to reduce runtime allocation and copy for strings. oh, wow! thanks for going the extra mile and putting in the unicode names too. sr=alecf
Attachment #84102 - Flags: superreview+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Keywords: adt1.0.0, approval
Yuying, could you verify this on the trunk? You can use FreshEye data attached to the bug.
Verified it works fine on 05-20 trunk build / WinME-JA by applied the FreshEye.src in comment #34.
Status: RESOLVED → VERIFIED
Just want to make sure, does it also work on the location bar? You know, if you set the search engine as default, when you type something on the URL location bar, there will be a dropdown string says "Search xxxx for xxxx".
Blocks: 143047
Whiteboard: [adt2] → [adt2 RTM] [ETA 05/31] [Needs a=]
adt1.0.0+ (on ADT's behalf) for approval to checkin to the 1.0 branch,pending Driver's approval. After, checking in, please add the fixed1.0 keyword.
Keywords: adt1.0.0adt1.0.0+
>Just want to make sure, does it also work on the location bar? You know, if you >set the search engine as default, when you type something on the URL location >bar, there will be a dropdown string says "Search xxxx for xxxx" I am not sure if URL location bar will get fixed; but I know it will get Preference/Internet Search fixed as well. ylong?
>>Just want to make sure, does it also work on the location bar? Yes, fixed on the trunk.
Adding adt1.0.1+ for adt approval for branch checkin for the 1.0.1 milestone. Please get drivers approval before checking in.
Keywords: adt1.0.0+adt1.0.1+
Keywords: mozilla1.0.1
Comment on attachment 84102 [details] [diff] [review] Changed to reduce runtime allocation and copy for strings. please check into the 1.0.1 branch ASAP. once landed remove the mozilla1.0.1+ keyword and add the fixed1.0.1 keyword
Target Milestone: mozilla1.0 → mozilla1.0.1
fix checked in to the 1.0 branch
Blocks: 146292
No longer blocks: 141008
I'm making a new JA build now, and the Japanese search engine title is showing properly! Just want to thank you guys again to make it happens. Yeah!!!
VERIFIED Fixed with 20020719 branch builds
Product: Core → SeaMonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: