Closed
Bug 92314
Opened 24 years ago
Closed 23 years ago
Ja search engine title garbled on sidebar
Categories
(SeaMonkey :: Search, defect, P3)
SeaMonkey
Search
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.0.1
People
(Reporter: yinglinxia, Assigned: nhottanscp)
References
Details
(Keywords: intl, Whiteboard: [adt2 RTM] [ETA 05/31] [Needs a=])
Attachments
(5 files, 6 obsolete files)
1.91 KB,
text/plain
|
Details | |
84.00 KB,
image/jpeg
|
Details | |
1.92 KB,
text/plain
|
Details | |
786 bytes,
text/plain
|
Details | |
8.79 KB,
patch
|
alecf
:
superreview+
jesup
:
approval+
|
Details | Diff | Splinter Review |
If localize the search engine title in the searchplugins\xxx.src file, for
example Japanese, the title will be garbled on the sidebar. The src file should
use Shift-JIS for Japanese, but I also tested UTF-8 and unicode, none of them
works.
Reporter | ||
Comment 1•24 years ago
|
||
Reporter | ||
Comment 2•24 years ago
|
||
Fortunately, this does not affect Latin-1 characters. I see that a Spanish
sherlock file title "Netscape Search en Español" and it looks fine in browser
(entered directly without any special encoding).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 4•24 years ago
|
||
Thanks for the information, Lynn. Changing summary.
Summary: Search engine title can not be localized → Ja search engine title garbled on sidebar
Comment 5•24 years ago
|
||
->search, 0.9.7
Assignee: matt → sgehani
Component: Sidebar → Search
QA Contact: sujay → claudius
Target Milestone: --- → mozilla0.9.7
Updated•24 years ago
|
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment 7•24 years ago
|
||
-> mozilla1.0, nsbeta1+
Comment 8•23 years ago
|
||
Reassigning 'several bugs at once' to Steve Morse, to level the load better
across the team.
Assignee: sgehani → morse
Comment 10•23 years ago
|
||
The problem is in the nsInternetSearchService.cpp
look at line 4596 of function OnDataAvailable
The decoder is probably null so we call context->AppendBytes instead of
decoder->Convert + context->AppendUnicodeBytes to process the data.
which nsIInternetSearchContext is used for "name" and "description" ? should it
have a "UTF-8" converter ?
Comment 11•23 years ago
|
||
still does not work even with UTF-8
Comment 12•23 years ago
|
||
This sounds very similar to
http://bugzilla.mozilla.org/show_bug.cgi?id=118179
I'll see what I can do after fixing 118179
Assignee: morse → yokoyama
Comment 13•23 years ago
|
||
roy: the code is very different
the code should be in
xpfe/components/search/src/nsInternetSearchService.cpp
Comment 14•23 years ago
|
||
Problem is at
http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4176
InternetSearchDataSource::ReadFileContents()
{
...
sourceContents.AssignWithConversion(contents, contentsLen); <= HERE!
...
}
Basicall, here is what's happening
1) Read the search file content as char *
and stuff it to nsString by calling AssignWithConversion()
which, in this case, creates a "0x00 prefixed Shift-JIS" UCS2
disguised string. 2) By the time we read attribes (name, description, etc)
http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4235,
the data is already corrupted.
Status: NEW → ASSIGNED
Comment 15•23 years ago
|
||
In search of Sherlock file encoding specification
http://developer.apple.com/technotes/tn/tn1141.html
I failed to find an appropriate encoding for the file.
Attributes are defined as :
"name"
This is a human-readable name for the search plug-in.
Which encoding is _human-readable name_?
Can we assume the file will be encoded in UTF-8?
Comment 16•23 years ago
|
||
ftang: can you comment? if it's ok, then can you give /r=?
Comment 17•23 years ago
|
||
I was thinking of Decoding at InternetSearchDataSource::ReadFileContents();
but it will have more performance impact.
Instead, I call InternetSearchDataSource::DecodeData() when it is asked.
Attachment #80713 -
Attachment is obsolete: true
Comment 18•23 years ago
|
||
nhotta: I can't find an owner of internet-search. I found a doc in mozilla.org
you wrote 2 yrs ago
(http://www.mozilla.org/projects/intl/internet_search_charset_spec.html)
Can you review or recommend someone else?
Comment 19•23 years ago
|
||
Humm. nhotta's doc says something about queryCharset in SEARCH section.
queryCharset - a charset name string for server queries, specify in "SEARCH"
section
Attached test case didn't contain queryCharset
http://bugzilla.mozilla.org/attachment.cgi?id=79522&action=view
My patch using charset in "INTERPRET" section. Is it wrong?
Assignee | ||
Comment 20•23 years ago
|
||
The charset in interpret section is used for the search result from the server
while the query charset is for a charset of the query by the client. So please
use the query charset for now (because currently no way to specify a charset of
the search .src).
I am going to look at the patch.
Assignee | ||
Comment 21•23 years ago
|
||
* You can use const PRUnichar* to pass a charset and PRUnichar** for the
converted result. See my comments below.
+InternetSearchDataSource::DecodeData(const nsString &aCharset, nsString &aValue)
* Please use nsICharsetConverterManager2. Because the charset is from outside,
you need to resolve charset alias. nsICharsetConverterManager2::GetCharsetAtom
does that work (and it takes const const PRUnichar* instead of nsString&)
+ nsCOMPtr<nsICharsetConverterManager> charsetConv =
* Please add a comment, this is needed to restore the original multibyte but I
think needs comment to explain what is going on.
+ char *value = ToNewCString(aValue);
* Please change to use strlen.
+ PRInt32 srcLength = nsCRT::strlen(NS_REINTERPRET_CAST(const char *,
value));
* I think the function can use PRUnichar ** and avoid to duplicate the buffer in
the function. I see the caller is depending on nsString but the caller can
allocate nsString from the returned buffer.
+ aValue.Assign(unicodeData, outUnicodeLen);
+ nsMemory::Free(unicodeData);
Comment 22•23 years ago
|
||
It looks like nsAWritableString has depreciated.
nhotta: can you review?
Attachment #80879 -
Attachment is obsolete: true
Assignee | ||
Comment 23•23 years ago
|
||
Comment on attachment 81095 [details] [diff] [review]
revised
Please change "interpret" to the constant for query charset.
This is not needed, the next line you assign with a length specified.
+ (unicodeData)[outUnicodeLen] = '\0'; // null terminate.
Convert() doesn't do it for us
Change those then r=nhotta
Attachment #81095 -
Flags: review+
Comment 24•23 years ago
|
||
This is to use the queryCharset defined in
http://www.mozilla.org/projects/intl/internet_search_charset_spec.html
For example:
<SEARCH
name="Netscape <Japanese text in Shift-JIS>"
description = "<Japanese text in Shift-JIS>"
queryCharset="Shift_JIS"
............................
>
Attachment #81095 -
Attachment is obsolete: true
Comment 25•23 years ago
|
||
Comment on attachment 81220 [details] [diff] [review]
Use of queryCharset in SEARCH section and remove the null terminate
Commet #23
r=nhotta
Attachment #81220 -
Flags: review+
Comment 26•23 years ago
|
||
Why can't we just apply the converter at the time we read the file?
Comment 27•23 years ago
|
||
It's simply the performance issue.
Please see Comment #17
I could call rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)
right after reading the file and then search for "name" and "description"
and decode them ;
but we decided to call the converter only when the name and description
attriubtes are required.
Comment 28•23 years ago
|
||
or do you mean to decode everything in the file after reading?
Comment 29•23 years ago
|
||
Comment on attachment 81220 [details] [diff] [review]
Use of queryCharset in SEARCH section and remove the null terminate
ugh, I wish the file was just stored in UTF8
and then we could just use the UTF8ConverterStream already built into XPCOM,
and the data
would be decoded as it is read in.
That said, there's no need for
char *value = ToNewCString(PromiseFlatString(aValue));
you should just say
char *value = ToNewCString(aValue)
(i.e. the PromiseFlatString is unnecessary and might make an extra copy of the
string)
However, even that is lame. You should really use
NS_LossyConvertUCS2toASCII value(aValue),
then you won't have to call strlen(value), nor free up the string when you're
done.
on the trunk, we eventually need to fix nsICharsetConverter2 to start taking
nsAString&/nsACString& so we can avoid all this extra copying and allocating.
but in the mean time, looking at this you could clean this up to avoid the
extra copying
by changing DecodeData() to return a PRUnichar**, and return the buffer you
allocated.
Then using Adopt() to make descValue take ownership of the newly allocated
string.
it would look something like:
>+ nsAutoString charsetValue;
>+ if (NS_SUCCEEDED(rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)) && !charsetValue.IsEmpty())
>+ {
>+ PRUnichar *decodedValue;
>+ rv = DecodeData(charsetValue.get(), &decodedValue);
>+ if (NS_SUCCEEDED(rv))
>+ descValue.Adopt(decodedValue);
>+ }
Attachment #81220 -
Flags: needs-work+
Comment 30•23 years ago
|
||
I think what waterson means is to decode the file into unicode as you read it
out of the file.. so you're dealing with a correctly-decoded unicode buffer, to
begin with.
However, I'm not quite sure how we'd accomplish this since the charset name is
actually stored in the file itself. That's why I was hoping for UTF8.
Roy: Are there already files like this that we need to support (i.e. that went
out in previous releases), or is there any chance we can just fix our own
versions of these files in CVS to be UTF8, and then use the UTF8ConverterStream,
which assumes the file is in UTF8?
Comment 31•23 years ago
|
||
Ya, my first patch (04/24/02 16:46)
http://bugzilla.mozilla.org/attachment.cgi?id=80713&action=view
was exactly that. (assume the file is encoding in UTF-8)
>Roy: Are there already files like this that we need to support
I think tao or ying can answer your question; but ftang and I
looked at a sherlock file in Macintosh-Ja and it
was encoded in Shift-JIS.
Comment 32•23 years ago
|
||
but it never worked before, right? I mean, if this bug has always existed, then
we won't be breaking anything that used to work.
All of our other data files (.properties, .xml, .dtd, .rdf, etc) are all in
UTF8, it seems odd that we'd decide not to use UTF8 in this case.
Comment 33•23 years ago
|
||
Sherlock files use native encoding :-\. However, like xml parser, the browser
should use the embedded charset info to decode the data.
Comment 34•23 years ago
|
||
Notice it does not have a queryCharset but a queryEncoding="2561"
Please make sure this also work.
This file is ship with MacOS X. (in "Users:ftang:Library:Internet Search
Sites:Internet", replace 'ftang' with your login name of course)
notice the name is encoded in Shift_JIS instead of UTF-8
Comment 35•23 years ago
|
||
ah! frank that's the best reason yet to give up on the UTF8 idea :)
so this is completely outside of our control - 3rd parties are already
distributing non-UTF8 files, and these files are shared between apps.
That said, we should definitely do what frank said, and support queryEncoding as
well.
Comment 36•23 years ago
|
||
nhotta,ftang and I did little digging how Apple handle the encoding of Sherlock.
It looks like they are using sourceTextEncoding field(not queryEncoding)
to identify the file encoding and if it's not defined, then it
falls back to "x-mac-roman"
sourceTextEncoding is using the Script Codes. I am going to
add a mapScriptCode table or modify the table in ::MapEncoding()
to support this field.
Comment 37•23 years ago
|
||
I need to ask yokoyama to focus on bug 141513 now. nhotta- can you take over
this bug?
Assignee: yokoyama → nhotta
Status: ASSIGNED → NEW
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 38•23 years ago
|
||
Attachment #81220 -
Attachment is obsolete: true
Comment 39•23 years ago
|
||
Comment on attachment 83593 [details] [diff] [review]
Changed to use "sourceTextEncoding" which contains a Macintosh script code.
r=rjc (Please be sure to follow tabbed indentation formatting used in file.)
Attachment #83593 -
Flags: review+
Assignee | ||
Comment 40•23 years ago
|
||
Attachment #83593 -
Attachment is obsolete: true
Assignee | ||
Comment 41•23 years ago
|
||
Alec, could you super review?
Comment 42•23 years ago
|
||
Comment on attachment 83975 [details] [diff] [review]
Changed formatting.
Why does MapScriptCodeToCharsetName need to allocate its result? It seems like
the whole function could just return a const char* - since the strings are just
static strings. Also, I think scriptList[] could be "static" as well..
I also wonder if it might be even better to make MapScriptCodeToCharsetName
take a const PRUnichar* as a parameter, then you could avoid the excess
copying/conversion... you could declare scriptList[] like
static struct scripts scriptList[] =
{
{ { '0', '\0' }, "x-mac-roman" },
..etc
it will be annoying to manually expand all that unicode, but once you've done
it once, then we never ever have to do it at runtime! :)
and actually, since really what you want is a charset in PRUnichar, you should
make DecodeData take a PRUnichar* as the charset, and let the consumers of that
function do the conversion
finally, does this compiler? the call to MapScriptCodeToCharsetName seems to
be lacking a 2nd parameter, the line ends with a ","?
overall this looks fine, I just think we can simplify the code a lot by doing
less conversions
Attachment #83975 -
Flags: needs-work+
Assignee | ||
Comment 43•23 years ago
|
||
Attachment #83975 -
Attachment is obsolete: true
Assignee | ||
Comment 44•23 years ago
|
||
Alec, please review the new patch, thanks.
Comment 45•23 years ago
|
||
Comment on attachment 84102 [details] [diff] [review]
Changed to reduce runtime allocation and copy for strings.
oh, wow! thanks for going the extra mile and putting in the unicode names too.
sr=alecf
Attachment #84102 -
Flags: superreview+
Assignee | ||
Comment 46•23 years ago
|
||
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•23 years ago
|
Assignee | ||
Comment 47•23 years ago
|
||
Yuying, could you verify this on the trunk? You can use FreshEye data attached
to the bug.
Comment 48•23 years ago
|
||
Verified it works fine on 05-20 trunk build / WinME-JA by applied the
FreshEye.src in comment #34.
Status: RESOLVED → VERIFIED
Reporter | ||
Comment 49•23 years ago
|
||
Just want to make sure, does it also work on the location bar? You know, if you
set the search engine as default, when you type something on the URL location
bar, there will be a dropdown string says "Search xxxx for xxxx".
Comment 50•23 years ago
|
||
adt1.0.0+ (on ADT's behalf) for approval to checkin to the 1.0 branch,pending
Driver's approval. After, checking in, please add the fixed1.0 keyword.
Comment 51•23 years ago
|
||
>Just want to make sure, does it also work on the location bar? You know, if you
>set the search engine as default, when you type something on the URL location
>bar, there will be a dropdown string says "Search xxxx for xxxx"
I am not sure if URL location bar will get fixed; but I know it will
get Preference/Internet Search fixed as well.
ylong?
Assignee | ||
Comment 52•23 years ago
|
||
>>Just want to make sure, does it also work on the location bar?
Yes, fixed on the trunk.
Comment 53•23 years ago
|
||
Adding adt1.0.1+ for adt approval for branch checkin for the 1.0.1 milestone.
Please get drivers approval before checking in.
Updated•23 years ago
|
Keywords: mozilla1.0.1
Updated•23 years ago
|
Attachment #84102 -
Flags: approval+
Comment 54•23 years ago
|
||
Comment on attachment 84102 [details] [diff] [review]
Changed to reduce runtime allocation and copy for strings.
please check into the 1.0.1 branch ASAP. once landed remove the
mozilla1.0.1+ keyword and add the fixed1.0.1 keyword
Updated•23 years ago
|
Target Milestone: mozilla1.0 → mozilla1.0.1
Updated•23 years ago
|
Reporter | ||
Comment 56•23 years ago
|
||
I'm making a new JA build now, and the Japanese search engine title is showing
properly! Just want to thank you guys again to make it happens. Yeah!!!
Comment 57•23 years ago
|
||
VERIFIED Fixed with 20020719 branch builds
Keywords: fixed1.0.1 → verified1.0.1
Updated•17 years ago
|
Product: Core → SeaMonkey
You need to log in
before you can comment on or make changes to this bug.
Description
•