Japanese localized charactor is broken when ftp directory listing

VERIFIED INVALID

Status

()

defect
P3
normal
VERIFIED INVALID
18 years ago
17 years ago

People

(Reporter: mal, Assigned: tetsuroy)

Tracking

({intl, l12y})

Trunk
mozilla1.0
x86
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt2 rtm], Verified on trunk)

Attachments

(2 attachments, 8 obsolete attachments)

Reporter

Description

18 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20020104
BuildID:    (CVS build on linux)

Reproducible: Always
Steps to Reproduce:
1.change necko.properties to Japanese as below

Index: necko.properties<
===================================================================<
RCS file: /cvsroot/mozilla/netwerk/resources/locale/en-US/necko.properties,v<
retrieving revision 1.10<
diff -u -r1.10 necko.properties<
--- necko.properties>---2001/12/06 00:05:13>1.10<
+++ necko.properties>---2002/01/04 16:48:02<
@@ -50,5 +50,5 @@<
 DeniedPortAccess=Access to the port number given has been disabled for security
reasons.<
-<
 # Directory listing strings<
-DirTitle=Index of %1$S<
-DirGoUp=Up to higher level directory<
+DirTitle=%1$S \u306e\u4e00\u89a7<
+DirGoUp=\u4e0a\u4f4d\u306e\u30c7\u30a3\u30ec\u30af\u30c8\u30ea\u3078<

2.check [Preferances]-[Debug]-.[Networking]-[Enable html directry listing]
3.access ftp://ftp.mozilla.org/pub/mozilla/

Actual Results:  translated charactors were broken

Expected Results:  show right translated charactors:
ftp://ftp.mozilla.org/pub/mozilla/ ¤Î°ìÍ÷
---------------
¾å°Ì¤Î¥Ç¥£¥ì¥¯¥È¥ê¤Ø

view html source when ftp directry listing,

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

is show. It is not i18n to set charset 'ISO-8859-1'.
Please remove charset.

Updated

18 years ago
QA Contact: teruko → ruixu

Comment 2

18 years ago
It looks like the problem code is in 
source/netwerk/streamconv/converters/nsIndexedToHTML.cpp
 93 nsIndexedToHTML::OnStartRequest(nsIRequest* request, nsISupports *aContext) {

the charset value is generated as "encoding" from the parser. 
we should be careful when we fix this thing. The data from the ftp could be
different from the data from the property file. What we should do is convert
those non ASCII in from the property file into NCR form before we output the the
layout. This way we can switch the encoding from the encoding menu but still see
the text from the property correctly.

a possible work around is use NCR in the property file instead : Could you try
changing 

-DirTitle=Index of %1$S<
-DirGoUp=Up to higher level directory<
+DirTitle=%1$S \u306e\u4e00\u89a7<
+DirGoUp=\u4e0a\u4f4d\u306e\u30c7\u30a3\u30ec\u30af\u30c8\u30ea\u3078<


to
-DirTitle=Index of %1$S<
-DirGoUp=Up to higher level directory<
+DirTitle=%1$S &#x306e;&#x4e00&#xu89a7;<
+DirGoUp=&#x4e0a;&#x4f4d;&#x306e;&#x30c7;&#x30a3;&#x30ec;&#x30af;&#x30c8;&#x30ea;&#x3078;<

Comment 3

18 years ago
CONFIRMED.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter

Comment 4

18 years ago
When the items changed to NCR, Japanese has been displayed normally.

Comment 5

18 years ago
This bug is related to the bug fixes 82439, 84472, 103737 and 106114. cc the 
related persons.
Assignee: rchen → bbaetz

Updated

18 years ago
Keywords: l12y, nsbeta1
If you manually change the encoding via the encoding menu, does this fix 
itsself? If so, I know what the problem is, sort of.

Note that this problem is not fixable in all cases - the directory could be in 
one encoding, whilst the text is in another, incompatable, encoding. That is 
not the problem here, though.
Actually, ftp directory listings are of unknown charset. There is a spec to
extend this, but I haven't found a server which implements it - see bug 26767.

Maybe I should just remove the charset setting, and let autodetection deal with
it. Would that work?

Comment 8

18 years ago
bradley, can you follow Frank's suggestion - convert those non ASCII in from the 
property file into NCR form. 

If you need a Japanese machine to verify your fix, please create a patch and I 
will help you from there. 
So NCR form would then display independantly of the encoding specified, either
in the html file or in the charset menu? If so, then its probably better for
someone else to come up with the patch, since I don't speak japanese.

If not, then we may need another solution.

Comment 10

18 years ago
bradley, you don't need to know any Japanese to fix this. 

You already have the sample data. Read the strings from resource bundle 
(properties file). They are in Unicode already. If the characters are not ASCII, 
convert them to NCR by adding &#; as below. 
convet

DirTitle=%1$S \u306e\u4e00\u89a7
DirGoUp=\u4e0a\u4f4d\u306e\u30c7\u30a3\u30ec\u30af\u30c8\u30ea\u3078

to

DirTitle=%1$S &#x306e;&#x4e00;&#xu89a7;
DirGoUp=&#x4e0a;&#x4f4d;&#x306e;&#x30c7;&#x30a3;&#x30ec;&#x30af;&#x30c8;&#x30ea;
&#x3078;

I am not sure any API available. Naoki, do you know any API to convert the 
codings?
Yes, but the actual file to change would be the localisation file, right? I
don't have access to the jp dtd files.

I could test it by making those changes to the en-us file, but since I wouldn't
recognise a problem in the display, I'm not sure how useful that would be.

Or are you suggesting that I do this programatically? That would be hard,
because the paramater substitution happens internally, so I'd have to do so
manually in that case.

If you edit the html file to remove that encoding from the meta tag, does it
work? If so, then that is the correct fix, which I'm probably going to do as
partof other changes.

Comment 12

18 years ago
The actual file is necko.properties not dtd files.  And the data is provided 
DirTitle=%1$S \u306e\u4e00\u89a7
DirGoUp=\u4e0a\u4f4d\u306e\u30c7\u30a3\u30ec\u30af\u30c8\u30ea\u3078

If you can save the html file and attach it here, I can verify it for you.
OK, I'll get to this later today. I'm not sure why you can't make those changes,
though - it would be quicker than me trying several things, and going via you to
check if they are correct.

Also note that save page for a direcotry listing doesn't actually work at the
moment.

Updated

18 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
Now that save as works, I played with this a bit. Using your encoding only
displayes 'n' on the screen after the url.

Can you please try using the normal text for this, and then manually selecting a
japanese charset from view->character encoding? Let me know if that works.

->1.0
Target Milestone: mozilla0.9.9 → mozilla1.0

Comment 15

18 years ago
per adt, not critical for nsbeta1. hence minus.
Keywords: nsbeta1nsbeta1-
Since I really can't do anything about this, -> default owner.

I don't know if this doesn't work on linux, or what, but its probably better for
someone else to own this.
Assignee: bbaetz → rchen
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---

Comment 17

18 years ago
Posted patch Add NCR conversion (obsolete) — Splinter Review

Comment 18

18 years ago
-> nsbeta1+

bradley, can you review the patch? see how it goes. Basically I add NCR 
conversion for title and header. Since they are the same string, this patch 
works fine. Please review it to see if any side efffect or anything I might 
miss.
Status: NEW → ASSIGNED
Keywords: nsbeta1-nsbeta1+
Comment on attachment 75066 [details] [diff] [review]
Add NCR conversion

I still want to know why this place in the code needs to handle conversion, and
no other place that
prints localised text does.

>Index: nsIndexedToHTML.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/streamconv/converters/nsIndexedToHTML.cpp,v
>retrieving revision 1.22
>diff -u -r1.22 nsIndexedToHTML.cpp
>--- nsIndexedToHTML.cpp	12 Mar 2002 05:56:29 -0000	1.22
>+++ nsIndexedToHTML.cpp	19 Mar 2002 23:07:31 -0000

> 
>+static nsString ConvertoNCR(nsAString& oldstring) {

static nsString ConvertToNCR(const nsAString& oldstring) {

(Note that added T)

>@@ -189,18 +208,23 @@
>     buffer.Append(NS_LITERAL_STRING("<html>\n<head><title>"));
> 
>     nsXPIDLString title;
>-    nsAutoString uniSpec; uniSpec.AssignWithConversion(spec);
>-
>-    const PRUnichar* formatTitle[] = {
>-        uniSpec.get()
>+    char* escaped = nsEscapeHTML(spec);

This should already be excaped, I think - why do you need to double escape?
>+
>+    nsString NCRTitle;

You can do: const nsString& NCRtitle = ConvertToNCR(title); I think.

Again, why does this code have to deal with this type of encoding?
Attachment #75066 - Flags: needs-work+

Comment 20

18 years ago
I think there are some other problems. Need to double check.

Updated

18 years ago
Keywords: intl

Comment 21

18 years ago
bradley, there are some other problem with the OnStartRequest which can't 
display Japanese folder name or file name correctly. 

The reason why we use NCR here is that we don't know what encoding returned from 
ftp server. The same number means different characters for shift jis and big 5. 
Without this piece of information we can't convert the data into unicode. So the 
solution is convert the strings from .properties into NCR which will work in any 
cases.

Bradley, do you know which file implemented HTTP? Maybe we can take a look at, 
too?
http listings are generated by the server; you cna't do anything to those.

You can put unicode data into .properties files, though. I still dno't see why
this doesn't affect other places in the code - how do the menus know what to
display, for example/

Comment 23

18 years ago
I guess "Index of " in HTTP is from server. I searched in LXR and couldn't find 
it. I also compared HTTP page with IE and saw they are the same but not ftp 
page. 

Usually we know the encoding from differnt ways - 1) look at the mega tag in 
html 2) from users preference 3) from OS locale 4) autodetect.  In terms of ftp, 
does the server provide the info? If not, it could be any encoding. We can't 
just convert it to unicode. We can however convert unicode to the encoding if we 
know. For example, for Japanese release (we know they are Japanese), the ui data  
(Ja text)is in unicode in .properties files. They are converted to Shift_JIS for 
Windows platform and EUC-JP for Linux platform.

I guess in this case, we need perserve the list data in single bye and up to the 
autodetector or user preference to decide the encoding. But the strings from 
properties need to be in NCR to be correctly displayed.
For http, the "index of" is server generated. For ftp, we generate it.

I'm talking about stuff like dialogs, mainly. Why can't you write the
.properties file in unicode, using the \u notation ? Would that work.

If not, or it would make it too complicated, then I'm fine with the patch if you
fix those minor problesm I had with it, and add a comment to the en-us
properties file documenting this.

Comment 25

18 years ago
Yes. Properties file are all in unicode. They are in \u notation. For ASCII 
character, they remains the same. You can't see \u.  For example, 
\u306e\u4e00\u89a7 are the unicode for Japanese text. 

Right now, the problem is not the patch but OnStartRequest. It can't display the 
non-ASCii folder/files name. This is more serious than the two strings from 
properties. 

But the way, do you know "File" stuff? It uses something else. Does "File" go 
into OnStartRequest?  
That isn't a bug - the ftp standard is for all file/folder names to be
transmitted in ASCII. Some servers will send will send using the server's
encoding (ISO-8859-x), but we have no way of knowing when they do that, or what
encoding they are using. There is an rfc for allowing utf8 to be sent, but I
haven't found a server which actually implements it. Obviously, just guessing
gives the wrong result. See bug 26767.

File doens't currently use this same code, but it soon will, once I get my patch
working on windows, and tested on a jp system - bug 102812.

I'm still confused. If you provide unicode, isn't there only one way for the
data to be displayed? Isn't that the point of unicode?

Comment 27

18 years ago
It is a bug. IE can display it and we can't. Can you say it is not a bug? 

As you said, the data is transmitted in ASCII. We should use nsCString instead 
of nsString to store the data. nsString is for unicode.
Well: a) Its not this particular bug, and b) noone answered my questions in that
bug.

I'd love to use nsCString, but the interface takes nsString - its a generic
iface. And using nsCString definately wouldn't help you here, wince then you
would never get non ascii.

Again, I'm more than happy to r= this patch if you can tell me why this is the
only part of the browser which needs this code. Is it just because its being
displayed in the content area?

Comment 29

18 years ago
adt3 - impact localization
Priority: -- → P3
Whiteboard: adt3
Target Milestone: --- → mozilla1.0

Comment 30

18 years ago
ftang, this bug is more than l10n. 

The folder/file name come back from ftp can't be displayed properly if they are 
non-ascii. I don't have time to fix it now. You may want to take over it or mark 
it adt1.

Comment 31

18 years ago
rchen is too busy now. give it back to ftang

Impact Summery
Impact Platform: ALL
Impact language users: ALL non English users 339M 59.8%
Probability of hitting the problem: MID- Mid workflow
Severity if hit the problem in the worst case: Users see garbage text
Way of recover after hit the problem: Change the localization string to encoded
in NCR
Risk of the fix: Low
Potential benefit of fix this problem: Probably none
ADT3
Assignee: rchen → ftang
Status: ASSIGNED → NEW

Comment 32

18 years ago
assigned
Status: NEW → ASSIGNED

Comment 33

18 years ago
give mid-low priority bug to yokoyama
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Assignee

Comment 34

18 years ago
>The folder/file name come back from ftp can't be displayed properly if they are 
>non-ascii. 

nsDirIndexParser.mEncoding is initialized as "ISO-8859-1" and nobody is calling
nsDirIndexParser::SetEncoding() to set a proper encoding for the Parser.

Thus it will convert the stream data to a corrupt filename
UnEscapeAndConvert("wrong encoding", filename, ......)
http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#279

Status: NEW → ASSIGNED
Yes, but the problem is that we have no way of working out what the encoding
sent from teh server actually is - there is no standard.

This bug has changed to become a dupe of bug 26767, I think. See comments there.
Assignee

Comment 36

18 years ago
Correct, there is no FTP standard for that.
I think we have two things to do:
1) Initialization nsDirIndexParser.mEncoding should come from nsPref, not
hard-coded "ISO-8859-1"
   I quickly looked at this and netwerk module is already depend on nsPref. :)
2) We need to propagate the encoding menu change by the user.  
   
darin: do we have any way of getting the user forced encoding at
nsIndexToHTML::OnStartRequest()
       I was hoping the nsStreamIOChannel.mContentCharset would help; but it's
empty. :(
But the encoding for a remote ftp site has nothing to do with the user's local
encoding.

ftp doesn't provide an encoding, which is why you can't get it.

If ftp did, then this bug would be easy to fix, since there is already a way for
the parser to get teh encoding from teh channle.

Again, see the comments in the other bug.

Maybe the real fix is to just avoid this text-based intermediate format
entirely. valeski didn't want this when I tried to do this about 9 months ago,
but its causing enough probles that I think we can justify it. That won't happen
for 1.0, though.
Assignee

Comment 38

18 years ago
>ftp doesn't provide an encoding, which is why you can't get it.
Yes, I understand that.  What I'm attempt to do is to 
respect the encoding setting that user set in the encoding menu.

>Maybe the real fix is to just avoid this text-based intermediate format
>entirely
Ok, I am open to suggestions.  What do you have in mind?
What do we replace it with?
We pass arround a set of nsIDirectoryIndex objects directly. Curently we go from
network fomat -> text -> binary, and lose stuff in the text translation

Heres a thought. What if you disable using nsITextToSubURI in that code, and
just use the fallback code (just comment it out)?

I'm trying this for another bug ATM, but I think that this would work, because
then the character encoding menu will affect layout - this code is reached
before then, so subsequent changes wouldn't help.

Comment 40

17 years ago
>Correct, there is no FTP standard for that.
We should said there are no well-adopted FTP standard for that. 

because there are some not-well-adopted ftp propose standard for this in the
form of RFC, see http://www.ietf.org/rfc/rfc2640.txt
 
Internationalization of the File Transfer Protocol (RFC 2640) (57204 bytes)

But since ftp is widely implement without support RFC2640, we still need to take
care the case of without RFC2640.
I said I'd implement that rfc as soon as someone can point me to a publically
available server which implements that standard.

I looked, but couldn't find one.

Does making the change I suggest in comment #39 help?
Assignee

Comment 42

17 years ago
>What if you disable using nsITextToSubURI in that code, and
>just use the fallback code (just comment it out)?
Nice, but unfortunately it still does work.
The fallback code _assumes_ the value is UTF-8. Thus corrupts the string.


I think it will be easier if we can read the default encoding from
Pref.
Assignee

Comment 43

17 years ago
>this code is reached before then, so subsequent changes wouldn't help.
Exactly. We needed to pass the string properly converted to UCS2 before
layout gets it. 

If you're happy with making the default directory format encoding be a pref, I'm
happy with that, I guess.
Assignee

Comment 45

17 years ago
While I am making this happen, the subsequent call from

 nsresult rv = NS_NewStringInputStream(getter_AddRefs(inputData), pushBuffer);

ended up with calling 

 ConstStringImpl(const nsAString& inString)
  : ConstCharImpl(ToNewCString(inString),          <= HERE!!!
    inString.Length())

ToNewCString() is corrupting the Unicode buffer. :(
Boy, this has more work than I originally thought.....
Hmm, this could be my problem with html file:/// listings, too. Good catch!
Theres a comment which says:

class StringImpl
    : public ConstStringImpl
// This is wrong, since it really converts to 1-char strings.

Do we want to open a separate bug for this?
Assignee

Comment 48

17 years ago
>Do we want to open a separate bug for this?
Well, I am not sure if I can fix this bug without fixing ConstStringImpl.
We have two options:
1) Fix ConstStringImpl
2) Call NS_NewCStringInputStream() instead of NS_NewStringInputStream()

Either way, I don't think we can make it for 1.0.0
Suggestions?

Comment 49

17 years ago
seems like you should avoid NS_NewStringInputStream... perhaps format all of
your text using some known charset, and then just use NS_NewCStringInputStream??

Updated

17 years ago
Whiteboard: adt3 → [adt3]
Assignee

Comment 50

17 years ago
I am testing it with WinNT-Ja ftp server and it's sending it in Shift-JIS.
The patch seems works good.
There is one problem with this patch.
directory folder after "index of...." needs to be converted to
proper encoding.
Comment on attachment 79023 [details] [diff] [review]
Convert from default locale to UTF-8

>Index: streamconv/converters/nsDirIndexParser.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/streamconv/converters/nsDirIndexParser.cpp,v
>retrieving revision 1.3
>diff -u -r1.3 nsDirIndexParser.cpp
>--- streamconv/converters/nsDirIndexParser.cpp	14 Nov 2001 12:35:41 -0000	1.3
>+++ streamconv/converters/nsDirIndexParser.cpp	13 Apr 2002 02:10:56 -0000
>@@ -50,6 +50,7 @@
> #include "nsIInputStream.h"
> #include "nsIChannel.h"
> #include "nsIURI.h"
>+#include "nsiPref.h"

#include "nsIPref.h" - case is important on unix.

>+    // Bug 118179
>     nsCOMPtr<nsIInputStream> inputData;
>-    rv = NS_NewStringInputStream(getter_AddRefs(inputData), buffer);
>+    nsCString nsaBuffer;
>+    nsaBuffer.Assign(((char *)NS_ConvertUCS2toUTF8(buffer).get()));
>+    rv = NS_NewCStringInputStream(getter_AddRefs(inputData), nsaBuffer);

Why not:

nsaBuffer.Assign(NS_ConvertUCS2toUTF8(buffer)); so that we don't have to
recount the number of characters.

The parent link is reached from nsIndexedToHTML.cpp - we don't read it out of
this stream, for various
reasons documented in that file.
Attachment #79023 - Flags: needs-work+

Comment 52

17 years ago
roy- just want to make sure you understand there are three different problem
related to the ftp issue:
1. if all the file name and directory name is ASCII only, want want to make the
Japanese loclaized build show the generated  Japanese characters at the top show
correctly in Japanese build. That is a l12y issue and shoudl be addressed ASAP.
This was the origioanl scope of this l12y bug. I need that part for RTM at least. 

2. if the file name contain non ASCII, in the english build, we want to make
sure it is possible to show correctly, maybe with user's interaction with menu
change This is not the origional scope of this bug but an important issue. We
may choose to defer the fix for this if it is too risky to fix that for nsbeta1.

3. As issue 2, we want to make sure the generated japanese text at the top and
the file name/directory name could be show BOTH correctly, with user's interaction. 

Updated

17 years ago
Blocks: 139338
Assignee

Comment 53

17 years ago
Posted patch revised (obsolete) — Splinter Review
This patch fixes all the problems ftang indicated above.
We can use 
+DirTitle=%1$S \u306e\u4e00\u89a7
+DirGoUp=\u4e0a\u4f4d\u306e\u30c7\u30a3\u30ec\u30af\u30c8\u30ea\u3078 
in necko.properties

We rely on the "intl.charset.default" setting in Pref to decode 
and we encode the strings to UTF-8 before calling
NS_NewCStringInputStream.  We denote the page 
as encoding=\"UTF-8" as well.

Bradley: can you review?
Attachment #75066 - Attachment is obsolete: true
Attachment #79023 - Attachment is obsolete: true

Comment 54

17 years ago
Changing nsbeta1+ [adt3] bugs to nsbeta1- on behalf of the adt.  If you have any
questions about this, please email adt@netscape.com.  You can search for
"changing adt3 bugs" to quickly find and delete these bug mails.
Keywords: nsbeta1-

Comment 55

17 years ago
Changing nsbeta1+ [adt3] bugs to nsbeta1- on behalf of the adt.  If you have any
questions about this, please email adt@netscape.com.  You can search for
"changing adt3 bugs" to quickly find and delete these bug mails.
Keywords: nsbeta1+

Updated

17 years ago
Attachment #80510 - Flags: needs-work+

Comment 56

17 years ago
Comment on attachment 80510 [details] [diff] [review]
revised 

Index: streamconv/converters/nsDirIndexParser.cpp

>+#include "nsIPref.h"

nsIPref is deprecated... use nsIPrefService/nsIPrefBranch instead.


>+  nsCOMPtr<nsIPref> prefs(do_GetService(NS_PREF_CONTRACTID));
>+  if (prefs)  {
>+    nsXPIDLCString defCharset;
>+    rv = prefs->GetCharPref("intl.charset.default", getter_Copies(defCharset));
>+    if (NS_SUCCEEDED(rv) && !defCharset.IsEmpty())
>+      mEncoding.Assign(defCharset);
>+    else
>+      mEncoding.Assign("ISO-8859-1");
>+  }
>+  else
>+    mEncoding.Assign("ISO-8859-1");

might be slightly clearer to decl a named literal string, something like this:

    NS_NAMED_LITERAL_CSTRING(kFallbackEncoding, "ISO-8859-1");

and use it in place of "ISO-8859-1" above.


>Index: streamconv/converters/nsIndexedToHTML.cpp

>+    buffer.Append(NS_LITERAL_STRING("\" encoding=\"UTF-8"));
>     buffer.Append(NS_LITERAL_STRING("\"?>\n") +
>                   NS_LITERAL_STRING("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" ") +
>                   NS_LITERAL_STRING("\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"));

use operator+ instead of adding another Append call... it'll avoid an
extra call into the heap allocator.


>+    nsCOMPtr<nsITextToSubURI> textToSubURI;
>+    textToSubURI = do_GetService(NS_ITEXTTOSUBURI_CONTRACTID, &rv);

would it be useful to cache this service for performance reasons?


>+    uniSpec.Assign(unEscapeSpec);

you can call uniSpec.Adopt(unEscapeSpec) to avoid a call to strdup.
BTW: i don't see unEscapeSpec being free'd anywhere.  if you use
Adopt, then you need not free unEscapeSpec.


>+    nsCString nsaBuffer;
>+    nsaBuffer.Assign(NS_ConvertUCS2toUTF8(buffer));

NS_ConvertUCS2toUTF8 nsaBuffer(buffer);

avoids an extra string copy.


>+    rv = NS_NewCStringInputStream(getter_AddRefs(inputData), nsaBuffer);

this makes a copy of nsaBuffer.  you can avoid this copy like this:

   nsCOMPtr<nsIStringInputStream> strStream(
       do_CreateInstance(NS_STRINGINPUTSTREAM_CONTRACTID, &rv));
   if (NS_FAILED(rv)) return rv;

   rv = strStream->ShareData(nsaBuffer.get(), nsaBuffer.Length());
   if (NS_FAILED(rv)) return rv;

   nsCOMPtr<nsIInputStream> inputData(do_QueryInterface(strStream, &rv));
   if (NS_FAILED(rv)) return rv;


>     if (NS_FAILED(rv)) return rv;
> 
>     rv = mListener->OnDataAvailable(request, aContext,
>-                                    inputData, 0, buffer.Length());
>+                                    inputData, 0, nsaBuffer.Length());

it is valid for strStream to not own a copy of nsaBuffer because both will
be destroyed when this function returns.

>     return rv;
> }


>+    nsCString nsaBuffer;
>+    nsaBuffer.Assign(NS_ConvertUCS2toUTF8(buffer));
>+    rv = NS_NewCStringInputStream(getter_AddRefs(inputData), nsaBuffer);

same exact pattern here.

>     rv = mListener->OnDataAvailable(request, aContext,
>+                                    inputData, 0, nsaBuffer.Length());

maybe a helper function that constructs a string input stream and
calls OnDataAvailable would be good to add??


>+    pushBuffer.Append(tmp);
>     pushBuffer.Append(NS_LITERAL_STRING("</a></td>\n <td>"));

combine these into one call to Append using operator+().


>+    nsCString nsaBuffer;
>+    nsaBuffer.Assign(NS_ConvertUCS2toUTF8(pushBuffer));
>+    nsresult rv = NS_NewCStringInputStream(getter_AddRefs(inputData),
>+                                          nsaBuffer);

looks like the same pattern shows up here as well.  a helper function
definitely makes sense.  perhaps you could even store a instance of
nsIStringInputStream as a member variable and reuse it whenever you
call OnDataAvailable.

so, overall the patch looks fine... i just have a bunch of performance
nits that should be easy enough to fix up.
Assignee

Comment 57

17 years ago
Posted patch Taking Darin's advise (obsolete) — Splinter Review
Darin: I needed to call NS_NewCStringInputStream() 
       in nsIndexedToHTML::FormatInputStream() in order for it to work.
       Browser doesn't come back, just wait for something to 
       happen. Please help. (See _Need Help!_ in attached)
Attachment #80510 - Attachment is obsolete: true

Comment 58

17 years ago
if it's not working, then no worries... just stick to NS_NewCStringInputStream.
 i was just hopeful that we could eliminate a buffer copy.  something to think
about more down the road...

Comment 59

17 years ago
Comment on attachment 81247 [details] [diff] [review]
Taking Darin's advise

sr=darin
Attachment #81247 - Flags: superreview+
Comment on attachment 81247 [details] [diff] [review]
Taking Darin's advise

Patch complains that this patch file is invalid, so I can't test, but:

>@@ -370,11 +393,8 @@
> 
>     nsXPIDLString tmp;
>     aIndex->GetDescription(getter_Copies(tmp));
>-    PRUnichar* escaped = nsEscapeHTML2(tmp.get(), tmp.Length());
>-    pushBuffer.Append(escaped);
>-    nsMemory::Free(escaped);
>-
>-    pushBuffer.Append(NS_LITERAL_STRING("</a></td>\n <td>"));
>+    pushBuffer.Append(tmp + 
>+                      NS_LITERAL_STRING("</a></td>\n <td>"));
> 

If you remove the escaping, what happens when we get a file with '<' in the
name?
Attachment #81247 - Flags: needs-work+

Comment 61

17 years ago
bbaetz: good catch!
Assignee

Comment 62

17 years ago
Posted patch update (obsolete) — Splinter Review
Few changes
1) I needed to use 
   prefs->GetLocalizedUnicharPref(..)
   instead of 
   rv = prefs->GetCharPref(..);
   If I don't change it, then moz doesn't display the ftp diretroy
   with newly created Profile.
2) remove comment 
  *** Need Help! The following **......

>If you remove the escaping, what happens when we get a file with '<' 
>in the name?
 I've tried to create a filename with '<', '>', '*'; but MS Windows 
 won't let me.	Do you have a test ftp site for this?  
 If we escape the "description", then we really didn't fix anything.

Please let me know of better solution if you have.
Attachment #81247 - Attachment is obsolete: true

Comment 63

17 years ago
yokoyama. what is the impact of nsIFile to this ?
Assignee

Comment 64

17 years ago
>yokoyama. what is the impact of nsIFile to this ?
Darin/Bradley, correct me if i'm wrong; 
but I don't think this patch impacts nsIFile. It's rather reverse.

There are number of nsIFile patches to go into the repository
and those patches may break my patch.  I may need to double check.

Can someone review my patch?
You can create such a file on unix. Try a file with &amp; in the name - is that
valid on windows? Otherwise create a file on one of the unix machines running
ftp - darin can probably help you find a machine like that.
Assignee

Comment 66

17 years ago
>darin can probably help you find a machine like that.
darin: can you create me few files with those chars in the filename?

Comment 67

17 years ago
tever should be able to help setup a testcase...

Comment 68

17 years ago
Is this what you need?  

The file Packet&amp;NT.exe is stored on an internal ftp server at

ftp://backwash/pub/
Assignee

Updated

17 years ago
Blocks: 141008
Assignee

Comment 69

17 years ago
Calling nsEscapeHTML2() for description doesn't break the unicode.

bradley: can you review?
Attachment #81582 - Attachment is obsolete: true
Assignee

Comment 70

17 years ago
There are number of bugscape bugs marked as dup 
http://bugscape.netscape.com/show_bug.cgi?id=15647
http://bugscape.netscape.com/show_bug.cgi?id=15655
Please review the patch ASAP.
renominating for nsbeta1
Keywords: nsbeta1-nsbeta1
Comment on attachment 83429 [details] [diff] [review]
Adding support for '<', '>', '&', and '"'

> 
>+nsresult
>+nsIndexedToHTML::FormatInputStream(nsIRequest* aRequest, nsISupports *aContext, nsAString &aBuffer) 
>+{
>+    nsresult rv = NS_OK;
>+    NS_ConvertUCS2toUTF8 buffer(PromiseFlatString(aBuffer));

You don't need the PromiseFlatString - NS_ConvertUCS2toUTF8 has an explicit
construcctor taking a |const nsAString&|
(In fact, the aBuffer param should be const)

>+    nsCOMPtr<nsIInputStream> inputData;
>+    rv = NS_NewCStringInputStream(getter_AddRefs(inputData), buffer);
>+    if (NS_FAILED(rv)) return rv;
>+    rv = mListener->OnDataAvailable(aRequest, aContext,
>+                                    inputData, 0, buffer.Length());
>+    return (rv);
>+}
>+

It looks fine apart from that; r=bbaetz
Attachment #83429 - Flags: review+
Assignee

Comment 72

17 years ago
darin: can you /sr=?
Attachment #83429 - Attachment is obsolete: true
Assignee

Comment 73

17 years ago
Attachment #83819 - Attachment is obsolete: true
Assignee

Comment 74

17 years ago
Comment on attachment 83820 [details] [diff] [review]
Oops, I need to remove dummy properties change

moving r=bbaetz
darin: please sr=?
thanks
Attachment #83820 - Flags: review+

Comment 75

17 years ago
Comment on attachment 83820 [details] [diff] [review]
Oops, I need to remove dummy properties change

>Index: streamconv/converters/nsIndexedToHTML.cpp
>+    nsAutoString uniSpec; 

>+    uniSpec.Adopt(unEscapeSpec);

so, no need for uniSpec to be an "auto" string if you're just going to call
Adopt to set its data.

otherwise, looks good to me .. sr=darin
Attachment #83820 - Flags: superreview+
Assignee

Comment 76

17 years ago
Adding Darin's suggestion
Attachment #83820 - Attachment is obsolete: true
Assignee

Comment 77

17 years ago
Comment on attachment 83899 [details] [diff] [review]
nsAutoString uniSpec -> nsString uniSpec

moving status:
/r=bbaetz
/sr=darin
Attachment #83899 - Flags: superreview+
Attachment #83899 - Flags: review+
Assignee

Comment 78

17 years ago
checked into the trunk
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED

Comment 79

17 years ago
iqa, please verify it. thanks
Whiteboard: [adt3] → [adt2 rtm]

Updated

17 years ago
Keywords: nsbeta1adt1.0.0, nsbeta1+

Comment 80

17 years ago
Verified fixed on trunk.
Whiteboard: [adt2 rtm] → [adt2 rtm], Verified on trunk

Comment 81

17 years ago
makr verified since rui verified on the trunk
Status: RESOLVED → VERIFIED

Updated

17 years ago
Keywords: approval

Comment 82

17 years ago
adding adt1.0.0+ for checkin to the 1.0 branch.  
Keywords: adt1.0.0adt1.0.0+

Comment 83

17 years ago
Comment on attachment 83899 [details] [diff] [review]
nsAutoString uniSpec -> nsString uniSpec

a=chofmann,brendan,scc

please check in on mozilla1.0 branch by midnight
Attachment #83899 - Flags: approval+
Assignee

Comment 84

17 years ago
Checked into the 1.0 branch

Updated

17 years ago
Blocks: 146292
No longer blocks: 141008

Comment 85

17 years ago
changing to adt1.0.1+ for checkin to the 1.0 branch.  Please get drivers
approval before checking in.
Keywords: adt1.0.0+adt1.0.1+

Updated

17 years ago
Keywords: fixed1.0.0

Comment 86

17 years ago
Verified fixed on 2002060508 branch.
seems that this is fixed on the branch (see Comment #86 From Rui Xu  2002-06-07)
= fixed1.0.1.
Blocks: 143047

Updated

17 years ago
Status: VERIFIED → REOPENED
Resolution: FIXED → ---

Comment 88

17 years ago
tested on JA-PR1 on Win XP Pro. JA + SP1 Beta1
Reoccurd.

Comment 89

17 years ago
Sorry. I used wrong build.
 
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → INVALID

Comment 90

17 years ago
verified
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.