Closed Bug 110531 Opened 23 years ago Closed 23 years ago

move ConverterInputStream into uconv

Categories

(Core :: XPCOM, defect, P1)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.7

People

(Reporter: alecf, Assigned: alecf)

References

Details

Attachments

(6 files, 3 obsolete files)

NS_NewConverterStream currently takes a charset as an argument, which is only
used to go retrieve a unicode converter for that charset. 
There are actually only a few consumers, so this should be easy to fix:
/xpcom/tests/CvtURL.cpp, line 87 -- rv = NS_NewConverterStream(&uin, nsnull, in,
0, cset);
/content/html/style/src/nsCSSLoader.cpp, line 1234 -- result =
NS_NewConverterStream(&uin, nsnull, in);
/content/html/style/src/nsCSSLoader.cpp, line 1574 -- result =
NS_NewConverterStream(&uin, nsnull, in, 0, &mCharset);
/htmlparser/src/nsExpatTokenizer.cpp, line 884 -- nsresult res =
NS_NewConverterStream(&uniIn,
/layout/html/tests/TestCSSScanner.cpp, line 85 -- rv =
NS_NewConverterStream(&uin, nsnull, in);
/layout/html/tests/TestCSSParser.cpp, line 185 -- rv =
NS_NewConverterStream(&uin, nsnull, in);
Argh. I was going to just switch the API to take a converter, but I didn't
realize that nsIUnicodeDecoder was a part of uconv (as well it should be)

So now I'm going to move this whole class into uconv, and let consumers
do_CreateInstance to get to it.
As a part of this, I'm going to change the interface to nsIConverterInputStream
and derive it from the current nsIUnicharInputStream and add SetCharset().

the SetCharset() method will set the charset that we want to convert from.. it
will default to ISO-8859-1 just like the current one does.

Summary: NS_NewConverterStream should take a converter, not a charset → move ConverterInputStream into uconv
This is basically just a copy, cleaned up signifigantly, of
ConverterInputStream from xpcom/io/nsUnicharInputStream.

Ignore the removal of "chardet" from the REQUIRES lines, I won't check those
in.

Can I get an r=dougt here?
The one trick is going to be reading in nsPersistentProperties, where we're
decoding UTF8 on the fly. I'm going to try to use NS_ConvertUTF8toUCS2() to
handle this.
Blocks: 100676
Priority: -- → P1
Target Milestone: --- → mozilla0.9.7
overall looks good.  Couple of comments:

Shouldn't we just bite the bullet and define nsIConverterInputStream in IDL.

In nsConvertInputStream.cpp:
	could you make the default buffer size a #define.  
	In read, is it possible that (mUnicharData->GetBuffer() + mUnicharDataOffset) >
aCount?
I thought about making it in IDL, but then I discovered there's no legal way to
declare a copy-buffer - i.e. a fixed size buffer that you pass in, allocated by
the caller... so it might as well be in C++

on the other stuff, I don't know that well :) Most of this is just copied over
from the implementation in xpcom/io
Status: NEW → ASSIGNED
well the other interesting thing here is that 3 of the 4 non-test uses of
NS_NewConverterStream are actually just converting from UTF8.

Go figure.

In any case, one of the UTF8 uses is in xpcom, so I'm going to make a new
UTF8ConverterStream which does on-the-fly conversion from UTF8, both for speed
and because we need it in xpcom. 
I'm going to try to leverage NS_ConvertUTF8toUCS2() for that. Wish me luck.

just changed the 8192 to be a #define...

in ::Read(), we make sure that if aCount > mUnicharDataLen -
mUnicharDataOffset, then we cap it at aCount (i.e. at "rv") - again, this is
just code that came from the old xpcom code - I've now gone over it (Because I
wrote a custom UTF8 converter based on it) and understand how/why it works :)

Can I get an sr=/r=, and then I'll go update the caller?
Attachment #58177 - Attachment is obsolete: true
Ignore the NS_NewUTF8ConverterStream junk - I won't be checking that part in
until later.
Attachment #59294 - Flags: review+
Comment on attachment 59295 [details] [diff] [review]
convert the CSS loader (ignore the UTF8 change)

although I don't like the nested if-statements and non sequiturs... maybe you
want to clean up this code?  Either way r=dougt.
Attachment #59295 - Flags: review+
I need UTF8Traits to be public so that I can access it from my new UTF8
converter stream.
Comment on attachment 59294 [details] [diff] [review]
address dougt's concerns

sr=darin
Attachment #59294 - Flags: superreview+
This patch changes the unicode converter in XPCOM over to a UTF8-only converter
- this should result in a faster conversion (since were're no longer calling a
bunch of virtual calls every 8k) and drops our dependency on uconv.

It also updates all consumers.

htmlparser/src/nsExpatTokenizer.cpp overlaps with the patch in attachment 59295 [details] [diff] [review]


there is also a bit of bug 100676 that I certainly wouldn't object to someone
reviewing.
cc jag for possible reviews - I still need an r= on attachment 59924 [details] [diff] [review]
Comment on attachment 59924 [details] [diff] [review]
switch to utf8converterstream

>Index: xpcom/io/nsUnicharInputStream.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/xpcom/io/nsUnicharInputStream.cpp,v
>retrieving revision 3.26
>diff -u -r3.26 nsUnicharInputStream.cpp
>--- nsUnicharInputStream.cpp	2001/09/28 20:12:42	3.26
>+++ nsUnicharInputStream.cpp	2001/11/30 22:35:39

> }
> 
>-nsresult ConverterInputStream::Read(PRUnichar* aBuf,
>+nsresult UTF8InputStream::Read(PRUnichar* aBuf,
>                                     PRUint32 aOffset,
>                                     PRUint32 aCount,
>                                     PRUint32 *aReadCount)

Nit: indentation.

r=jag
Attachment #59924 - Flags: review+
Comment on attachment 59295 [details] [diff] [review]
convert the CSS loader (ignore the UTF8 change)

sr=darin
Attachment #59295 - Flags: superreview+
Comment on attachment 59924 [details] [diff] [review]
switch to utf8converterstream

sr=darin
Attachment #59924 - Flags: superreview+
yay! fix is in
I think this means that nsFileSpec/nsLocalFile are now the only consumers of uconv
oops, actually mark that fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
hey alec, could this change to the css loader have caused the tinderbox machines
to all go orange? looks like they are hanging when trying to load some web pages
right after you checked this stuff in. 
The orange was the result of an infinite loop (with NS_WARNING every cycle) in
nsUnicharInputStream.  CountValidUTF8Bytes was given a buffer that was not null
terminated, and the character after the end was not a valid UTF8 character, so
it would go into an infinite loop.  I checked in the following fix, which I
believe is correct but I'd like alecf and others to review it:

Index: nsUnicharInputStream.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsUnicharInputStream.cpp,v
retrieving revision 3.27
retrieving revision 3.28
diff -u -d -r3.27 -r3.28
--- nsUnicharInputStream.cpp	4 Dec 2001 01:10:35 -0000	3.27
+++ nsUnicharInputStream.cpp	4 Dec 2001 04:22:44 -0000	3.28
@@ -276,7 +276,7 @@
   const char *lastchar = aBuffer;
   
   PRInt32 bytes = 0;
-  while (*c && bytes <= aMaxBytes) {
+  while (*c && bytes < aMaxBytes) {
     lastchar = c;
     if (UTF8traits::isASCII(*c)) {
       c++;


after checking that in, I noticed an additional bit of code that I think isn't
necessary, so I think perhaps we should check the following in as well.  Thoughts?

Index: nsUnicharInputStream.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsUnicharInputStream.cpp,v
retrieving revision 3.28
diff -u -d -r3.28 nsUnicharInputStream.cpp
--- nsUnicharInputStream.cpp	4 Dec 2001 04:22:44 -0000	3.28
+++ nsUnicharInputStream.cpp	4 Dec 2001 04:40:44 -0000
@@ -273,11 +273,9 @@
 UTF8InputStream::CountValidUTF8Bytes(const char* aBuffer, PRInt32 aMaxBytes)
 {
   const char *c = aBuffer;
-  const char *lastchar = aBuffer;
   
   PRInt32 bytes = 0;
   while (*c && bytes < aMaxBytes) {
-    lastchar = c;
     if (UTF8traits::isASCII(*c)) {
       c++;
       bytes++;
@@ -311,12 +309,6 @@
       NS_WARNING("Unrecognized UTF8 string in
UTF8InputStream::CountValidUTF8Bytes()");
   }
 
-  // if we skipped pas the end of the buffer, back up to the last character
-  if (bytes > aMaxBytes) {
-    c = lastchar;
-    bytes = (c-aBuffer);
-  }
-  
   return bytes;
 }
Comment on attachment 60293 [details] [diff] [review]
also fix massive numbers of threadsafety assertions

sr=waterson, check it in.
Attachment #60293 - Flags: superreview+
Comment on attachment 60293 [details] [diff] [review]
also fix massive numbers of threadsafety assertions

I checked the NS_INIT_ISUPPORTS() to fix the threadsafety assertions in so that
debug builds are usable again.
Comment on attachment 60291 [details] [diff] [review]
ditto, but a little more cleanup

nice. thanks.
Attachment #60291 - Flags: superreview+
You'll want to check |c < end| before you |*c|. Also, compare to the logic I
used in CalculateUTF8Length::write, where the return value is what you're
looking for.
Comment on attachment 60408 [details] [diff] [review]
attachment 60291 [details] [diff] [review], but also fix the UMR jag pointed out by switching order around &&

r=jag
Attachment #60408 - Flags: review+
Comment on attachment 60408 [details] [diff] [review]
attachment 60291 [details] [diff] [review], but also fix the UMR jag pointed out by switching order around &&

sr=alecf
Attachment #60408 - Flags: superreview+
Comment on attachment 60408 [details] [diff] [review]
attachment 60291 [details] [diff] [review], but also fix the UMR jag pointed out by switching order around &&

Checked this in 2001-12-04 19:49 PDT.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: