If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Ja search engine title garbled on sidebar

VERIFIED FIXED in mozilla1.0.1

Status

SeaMonkey
Search
P3
normal
VERIFIED FIXED
16 years ago
9 years ago

People

(Reporter: Ying-Lin Xia, Assigned: nhottanscp)

Tracking

({intl})

Trunk
mozilla1.0.1
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt2 RTM] [ETA 05/31] [Needs a=])

Attachments

(5 attachments, 6 obsolete attachments)

(Reporter)

Description

16 years ago
If localize the search engine title in the searchplugins\xxx.src file, for 
example Japanese, the title will be garbled on the sidebar. The src file should 
use Shift-JIS for Japanese, but I also tested UTF-8 and unicode, none of them 
works.
(Reporter)

Comment 1

16 years ago
Created attachment 43566 [details]
a test file, Netscape_Japan.src, with the title localized in Japanese
(Reporter)

Comment 2

16 years ago
Created attachment 43568 [details]
a screehshot with the test src file above

Comment 3

16 years ago
Fortunately, this does not affect Latin-1 characters. I see that a Spanish 
sherlock file title "Netscape Search en EspaƱol" and it looks fine in browser 
(entered directly without any special encoding).
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Comment 4

16 years ago
Thanks for the information, Lynn. Changing summary.
Summary: Search engine title can not be localized → Ja search engine title garbled on sidebar

Comment 5

16 years ago
->search, 0.9.7
Assignee: matt → sgehani
Component: Sidebar → Search
QA Contact: sujay → claudius
Target Milestone: --- → mozilla0.9.7

Comment 6

16 years ago
Moving to mozilla0.9.8.
Target Milestone: mozilla0.9.7 → mozilla0.9.8

Updated

16 years ago
Target Milestone: mozilla0.9.8 → mozilla0.9.9

Comment 7

16 years ago
-> mozilla1.0, nsbeta1+
Keywords: nsbeta1+
Priority: -- → P3
Target Milestone: mozilla0.9.9 → mozilla1.0

Comment 8

16 years ago
Reassigning 'several bugs at once' to Steve Morse, to level the load better
across the team.
Assignee: sgehani → morse

Comment 9

16 years ago
ADT2 per ADT triage team
Whiteboard: [adt2]

Updated

16 years ago
Keywords: intl

Comment 10

16 years ago
The problem is in the nsInternetSearchService.cpp 
look at line 4596 of function OnDataAvailable
The decoder is probably null so we call context->AppendBytes instead of
decoder->Convert + context->AppendUnicodeBytes to process the data. 
which nsIInternetSearchContext is used for "name" and "description" ? should it
have a "UTF-8" converter ?

Comment 11

16 years ago
Created attachment 79522 [details]
Netscape_Japan.src (same as the first attacmenet, but encoded in UTF-8)

still does not work even with UTF-8

Comment 12

16 years ago
This sounds very similar to 
http://bugzilla.mozilla.org/show_bug.cgi?id=118179

I'll see what I can do after fixing 118179
Assignee: morse → yokoyama

Comment 13

16 years ago
roy: the code is very different 
the code should be in 
xpfe/components/search/src/nsInternetSearchService.cpp

Comment 14

16 years ago
Problem is at
http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4176
InternetSearchDataSource::ReadFileContents()
{
...
  sourceContents.AssignWithConversion(contents, contentsLen); <= HERE!
...
}

Basicall, here is what's happening
1) Read the search file content as char * 
   and stuff it to nsString by calling AssignWithConversion() 
   which, in this case, creates a "0x00 prefixed Shift-JIS" UCS2 
   disguised string.    2) By the time we read attribes (name, description, etc)
 
http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/src/nsInternetSearchService.cpp#4235,
 the data is already corrupted.
Status: NEW → ASSIGNED

Comment 15

16 years ago
Created attachment 80713 [details] [diff] [review]
Making ftang's Netscape_Japan.src (UTF-8 encoded) to work.

In search of Sherlock file encoding specification
http://developer.apple.com/technotes/tn/tn1141.html
I failed to find an appropriate encoding for the file.

Attributes are defined as :
 "name"  
  This is a human-readable name for the search plug-in.

Which encoding is _human-readable name_?
Can we assume the file will be encoded in UTF-8?

Comment 16

16 years ago
ftang: can you comment?  if it's ok, then can you give /r=?

Comment 17

16 years ago
Created attachment 80879 [details] [diff] [review]
Decode name, description attributes

I was thinking of Decoding at InternetSearchDataSource::ReadFileContents();
but it will have more performance impact.
Instead, I call InternetSearchDataSource::DecodeData() when it is asked.
Attachment #80713 - Attachment is obsolete: true

Comment 18

16 years ago
nhotta: I can't find an owner of internet-search.  I found a doc in mozilla.org
you wrote 2 yrs ago
(http://www.mozilla.org/projects/intl/internet_search_charset_spec.html)
 
Can you review or recommend someone else?

Comment 19

16 years ago
Humm.  nhotta's doc says something about queryCharset in SEARCH section.
queryCharset - a charset name string for server queries, specify in "SEARCH"
section 

Attached test case didn't contain queryCharset
http://bugzilla.mozilla.org/attachment.cgi?id=79522&action=view

My patch using charset in "INTERPRET" section. Is it wrong?
(Assignee)

Comment 20

16 years ago
The charset in interpret section is used for the search result from the server
while the query charset is for a charset of the query by the client. So please
use the query charset for now (because currently no way to specify a charset of
the search .src).
I am going to look at the patch.
(Assignee)

Comment 21

16 years ago
* You can use const PRUnichar* to pass a charset and PRUnichar** for the
converted result. See my comments below.
+InternetSearchDataSource::DecodeData(const nsString &aCharset, nsString &aValue)

* Please use nsICharsetConverterManager2. Because the charset is from outside,
you need to resolve charset alias. nsICharsetConverterManager2::GetCharsetAtom
does that work (and it takes const const PRUnichar* instead of nsString&)
+    nsCOMPtr<nsICharsetConverterManager> charsetConv = 

* Please add a comment, this is needed to restore the original multibyte but I
think needs comment to explain what is going on.
+        char *value = ToNewCString(aValue);

* Please change to use strlen.
+        PRInt32 srcLength = nsCRT::strlen(NS_REINTERPRET_CAST(const char *,
value));

* I think the function can use PRUnichar ** and avoid to duplicate the buffer in
the function. I see the caller is depending on nsString but the caller can
allocate nsString from the returned buffer.
+                aValue.Assign(unicodeData, outUnicodeLen);
+                nsMemory::Free(unicodeData);

Comment 22

16 years ago
Created attachment 81095 [details] [diff] [review]
revised

It looks like nsAWritableString has depreciated.
nhotta: can you review?
Attachment #80879 - Attachment is obsolete: true
(Assignee)

Comment 23

16 years ago
Comment on attachment 81095 [details] [diff] [review]
revised

Please change "interpret" to the constant for query charset.

This is not needed, the next line you assign with a length specified.
+	     (unicodeData)[outUnicodeLen] = '\0';    // null terminate.
Convert() doesn't do it for us

Change those then r=nhotta
Attachment #81095 - Flags: review+

Comment 24

16 years ago
Created attachment 81220 [details] [diff] [review]
Use of queryCharset in SEARCH section and remove the null terminate

This is to use the queryCharset defined in 
http://www.mozilla.org/projects/intl/internet_search_charset_spec.html
For example:
<SEARCH 
 name="Netscape <Japanese text in Shift-JIS>"
 description = "<Japanese text in Shift-JIS>"
 queryCharset="Shift_JIS"
 ............................
>
Attachment #81095 - Attachment is obsolete: true

Comment 25

16 years ago
Comment on attachment 81220 [details] [diff] [review]
Use of queryCharset in SEARCH section and remove the null terminate

Commet #23
r=nhotta
Attachment #81220 - Flags: review+

Comment 26

16 years ago
Why can't we just apply the converter at the time we read the file?

Comment 27

16 years ago
It's simply the performance issue.
Please see Comment #17

I could call rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)
right after reading the file and then search for "name" and "description" 
and decode  them ; 
but we decided to call the converter only when the name and description
attriubtes are required.

Comment 28

16 years ago
or do you mean to decode everything in the file after reading?

Comment 29

16 years ago
Comment on attachment 81220 [details] [diff] [review]
Use of queryCharset in SEARCH section and remove the null terminate


ugh, I wish the file was just stored in UTF8
and then we could just use the UTF8ConverterStream already built into XPCOM,
and the data 
would be decoded as it is read in.

That said, there's no need for 
char *value = ToNewCString(PromiseFlatString(aValue));
you should just say
char *value = ToNewCString(aValue)

(i.e. the PromiseFlatString is unnecessary and might make an extra copy of the
string)

However, even that is lame. You should really use 
NS_LossyConvertUCS2toASCII value(aValue), 
then you won't have to call strlen(value), nor free up the string when you're
done.

on the trunk, we eventually need to fix nsICharsetConverter2 to start taking 
nsAString&/nsACString& so we can avoid all this extra copying and allocating.

but in the mean time, looking at this you could clean this up to avoid the
extra copying
by changing DecodeData() to return a PRUnichar**, and return the buffer you
allocated. 
Then using Adopt() to make descValue take ownership of the newly allocated
string. 

it would look something like:
>+        nsAutoString charsetValue;
>+        if (NS_SUCCEEDED(rv = GetData(dataUni, "search", 0, "queryCharset", charsetValue)) && !charsetValue.IsEmpty())
>+        {
>+            PRUnichar *decodedValue;
>+            rv = DecodeData(charsetValue.get(), &decodedValue);
>+            if (NS_SUCCEEDED(rv))
>+                descValue.Adopt(decodedValue);
>+        }
Attachment #81220 - Flags: needs-work+

Comment 30

16 years ago
I think what waterson means is to decode the file into unicode as you read it
out of the file.. so you're dealing with a correctly-decoded unicode buffer, to
begin with. 

However, I'm not quite sure how we'd accomplish this since the charset name is
actually stored in the file itself. That's why I was hoping for UTF8. 

Roy: Are there already files like this that we need to support (i.e. that went
out in previous releases), or is there any chance we can just fix our own
versions of these files in CVS to be UTF8, and then use the UTF8ConverterStream,
which assumes the file is in UTF8?

Comment 31

16 years ago
Ya, my first patch (04/24/02 16:46)
http://bugzilla.mozilla.org/attachment.cgi?id=80713&action=view
was exactly that.  (assume the file is encoding in UTF-8)

>Roy: Are there already files like this that we need to support
I think tao or ying can answer your question; but ftang and I
looked at a sherlock file in Macintosh-Ja and it
was encoded in Shift-JIS. 



Comment 32

16 years ago
but it never worked before, right? I mean, if this bug has always existed, then
we won't be breaking anything that used to work. 

All of our other data files (.properties, .xml, .dtd, .rdf, etc) are all in
UTF8, it seems odd that we'd decide not to use UTF8 in this case.

Comment 33

16 years ago
Sherlock files use native encoding :-\. However, like xml parser, the browser
should use the embedded charset info to decode the data. 

Comment 34

16 years ago
Created attachment 81708 [details]
FreshEye.src ship with MacOS X

Notice it does not have a queryCharset but a queryEncoding="2561"
Please make sure this also work.
This file is ship with MacOS X. (in "Users:ftang:Library:Internet Search
Sites:Internet", replace 'ftang' with your login name of course)
notice the name is encoded in Shift_JIS instead of UTF-8

Comment 35

16 years ago
ah! frank that's the best reason yet to give up on the UTF8 idea :)
so this is completely outside of our control - 3rd parties are already
distributing non-UTF8 files, and these files are shared between apps. 

That said, we should definitely do what frank said, and support queryEncoding as
well.

Comment 36

16 years ago
nhotta,ftang and I did little digging how Apple handle the encoding of Sherlock.
It looks like they are using sourceTextEncoding field(not queryEncoding)
to identify the file encoding and if it's not defined, then it
falls back to "x-mac-roman"

sourceTextEncoding is using the Script Codes.  I am going to
add a mapScriptCode table or modify the table in ::MapEncoding()
to support this field.

Updated

16 years ago
Blocks: 141008

Comment 37

16 years ago
I need to ask yokoyama to focus on bug 141513 now. nhotta- can you take over
this bug?
Assignee: yokoyama → nhotta
Status: ASSIGNED → NEW
(Assignee)

Updated

16 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 38

16 years ago
Created attachment 83593 [details] [diff] [review]
Changed to use "sourceTextEncoding" which contains a Macintosh script code.
Attachment #81220 - Attachment is obsolete: true
Comment on attachment 83593 [details] [diff] [review]
Changed to use "sourceTextEncoding" which contains a Macintosh script code.

r=rjc  (Please be sure to follow tabbed indentation formatting used in file.)
Attachment #83593 - Flags: review+
(Assignee)

Comment 40

16 years ago
Created attachment 83975 [details] [diff] [review]
Changed formatting.
Attachment #83593 - Attachment is obsolete: true
(Assignee)

Comment 41

16 years ago
Alec, could you super review?

Comment 42

16 years ago
Comment on attachment 83975 [details] [diff] [review]
Changed formatting.

Why does MapScriptCodeToCharsetName need to allocate its result? It seems like
the whole function could just return a const char* - since the strings are just
static strings. Also, I think scriptList[] could be "static" as well.. 

I also wonder if it might be even better to make MapScriptCodeToCharsetName
take a const PRUnichar* as a parameter, then you could avoid the excess
copying/conversion... you could declare scriptList[] like

static struct scripts scriptList[] = 
{
  { { '0', '\0' }, "x-mac-roman" },
..etc

it will be annoying to manually expand all that unicode, but once you've done
it once, then we never ever have to do it at runtime! :)

and actually, since really what you want is a charset in PRUnichar, you should
make DecodeData take a PRUnichar* as the charset, and let the consumers of that
function do the conversion

finally, does this compiler? the call to MapScriptCodeToCharsetName  seems to
be lacking a 2nd parameter, the line ends with a ","?

overall this looks fine, I just think we can simplify the code a lot by doing
less conversions
Attachment #83975 - Flags: needs-work+
(Assignee)

Comment 43

16 years ago
Created attachment 84102 [details] [diff] [review]
Changed to reduce runtime allocation and copy for strings.
Attachment #83975 - Attachment is obsolete: true
(Assignee)

Comment 44

16 years ago
Alec, please review the new patch, thanks.

Comment 45

16 years ago
Comment on attachment 84102 [details] [diff] [review]
Changed to reduce runtime allocation and copy for strings.

oh, wow! thanks for going the extra mile and putting in the unicode names too. 
sr=alecf
Attachment #84102 - Flags: superreview+
(Assignee)

Comment 46

16 years ago
checked in to the trunk
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Updated

16 years ago
Keywords: adt1.0.0, approval
(Assignee)

Comment 47

16 years ago
Yuying, could you verify this on the trunk? You can use FreshEye data attached
to the bug.

Comment 48

16 years ago
Verified it works fine on 05-20 trunk build / WinME-JA by applied the
FreshEye.src in comment #34.
Status: RESOLVED → VERIFIED
(Reporter)

Comment 49

16 years ago
Just want to make sure, does it also work on the location bar? You know, if you 
set the search engine as default, when you type something on the URL location 
bar, there will be a dropdown string says "Search xxxx for xxxx".

Updated

16 years ago
Blocks: 143047
Whiteboard: [adt2] → [adt2 RTM] [ETA 05/31] [Needs a=]

Comment 50

16 years ago
adt1.0.0+ (on ADT's behalf) for approval to checkin to the 1.0 branch,pending
Driver's approval.  After, checking in, please add the fixed1.0 keyword.
Keywords: adt1.0.0 → adt1.0.0+

Comment 51

16 years ago
>Just want to make sure, does it also work on the location bar? You know, if you 
>set the search engine as default, when you type something on the URL location 
>bar, there will be a dropdown string says "Search xxxx for xxxx"

I am not sure if URL location bar will get fixed; but I know it will
get Preference/Internet Search fixed as well.
ylong?
(Assignee)

Comment 52

16 years ago
>>Just want to make sure, does it also work on the location bar?
Yes, fixed on the trunk.

Comment 53

16 years ago
Adding adt1.0.1+ for adt approval for branch checkin for the 1.0.1 milestone. 
Please get drivers approval before checking in.
Keywords: adt1.0.0+ → adt1.0.1+

Updated

16 years ago
Keywords: mozilla1.0.1

Updated

16 years ago
Attachment #84102 - Flags: approval+
Comment on attachment 84102 [details] [diff] [review]
Changed to reduce runtime allocation and copy for strings.

please check into the 1.0.1 branch ASAP. once landed remove the 
mozilla1.0.1+ keyword and add the fixed1.0.1 keyword

Updated

16 years ago
Keywords: approval, mozilla1.0.1 → mozilla1.0.1+
Target Milestone: mozilla1.0 → mozilla1.0.1

Comment 55

16 years ago
fix checked in to the 1.0 branch
Keywords: mozilla1.0.1+ → fixed1.0.1

Updated

16 years ago
Blocks: 146292
No longer blocks: 141008
(Reporter)

Comment 56

16 years ago
I'm making a new JA build now, and the Japanese search engine title is showing 
properly! Just want to thank you guys again to make it happens. Yeah!!!

Comment 57

15 years ago
VERIFIED Fixed with 20020719 branch builds
Keywords: fixed1.0.1 → verified1.0.1
Product: Core → SeaMonkey
You need to log in before you can comment on or make changes to this bug.