Closed Bug 340714 Opened 18 years ago Closed 18 years ago

nsIConverterInputStream duplicates the last character in the first buffer


(Core :: Internationalization, defect)

Windows XP
Not set





(Reporter: jgrills, Assigned: smontagu)


(Keywords: verified1.8.1.22)


(2 files, 1 obsolete file)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)
Build Identifier: Celtx

When reading a UTF-16 file using the nsIConverterInputStream API with nsIUnicharLineInputStream and calling the readLine() function on a UTF-16 file encoded in LE format with a starting BOM, the last character in the first buffer (where the buffer size was passed to the init call) will be duplicated and end up in the string twice. 

Reproducible: Always

Steps to Reproduce:
Here's the code that I'm using:

        var captionLocalFile = Components.classes[";1"].createInstance(Components.interfaces.nsILocalFile);
        var captionFileInputStream = Components.classes[";1"].createInstance(Components.interfaces.nsIFileInputStream);
        captionFileInputStream.init(captionLocalFile, 0x01, 0444, 0);
        var captionFile = Components.classes[";1"].createInstance(Components.interfaces.nsIConverterInputStream);
        captionFile.init(captionFileInputStream, "UTF-16",1024, Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

        if (!(captionFile instanceof Components.interfaces.nsIUnicharLineInputStream))
          throw "not line input stream";

        var more = true;
        while (more)
          // read the line and check for eof
          var line = {};
          more = captionFile.readLine(line);
Assignee: smontagu → nobody
Component: Internationalization → Networking
QA Contact: amyy → networking
Assignee: nobody → smontagu
Component: Networking → Internationalization
QA Contact: networking → amyy
does this happen for all UTF-16 files, or just a specific one? (could you attach one for easier testing?)
Ever confirmed: true
Attached patch patch (obsolete) — Splinter Review
This happens because when the various implementations of Convert() for UTF16  detect a BOM they pass the buffer without the BOM to UTF16ConvertToUnicode(), after modifying the aSrcLength passed to them, and then pass the changed value back to the caller.
Attachment #224853 - Flags: review?(jshin1987)
ok, so... why was this never an issue for anything else?
Probably because most callers of nsIUnicodeDecoder::Convert() don't do anything with aSrcLength after a successful call.
Comment on attachment 224853 [details] [diff] [review]

sorry for the delay. I thought I had reviewed this patch (I guess I did, but forgot to log-in after pressing submit button. At home, sometimes bugzilla keeps asking me to log-in for every transaction
Attachment #224853 - Flags: review?(jshin1987) → review+
Attachment #224853 - Flags: superreview?(jag)
I tested this with all combinations of UTF16-BE/LE with and without BOM, and using odd and even buffer lengths (the previous version failed horribly with an odd buffer length as jag pointed out it would).
Attachment #224853 - Attachment is obsolete: true
Attachment #229822 - Flags: superreview?(jag)
Attachment #224853 - Flags: superreview?(jag)
Comment on attachment 229822 [details] [diff] [review]
Responding to jag's sr comments on IRC

Looks good. Is there an easy way to add your tests to CVS so the next person to touch this code can run them?

BTW, mFoundBOM doesn't need to be a member variable, it can just be a local. Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?
Attachment #229822 - Flags: superreview?(jag) → superreview+
(In reply to comment #7)
> (From update of attachment 229822 [details] [diff] [review] [edit])
> Looks good. Is there an easy way to add your tests to CVS so the next person to
> touch this code can run them?

Good idea. I should be able to work out a one-click version of the tests that I can check in.

> BTW, mFoundBOM doesn't need to be a member variable, it can just be a local.

I think it does have to be a member variable. We want to preserve its state across successive buffers for the same input, no?

> Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?

Not right now: see bug 235090 for why it got added.
Checked in.
Closed: 18 years ago
Resolution: --- → FIXED
I checked in a testcase at intl/uconv/tests/test_nsIConverterInputStream.html
The previous testcase has now been replaced by an xpcshell-based test at intl/uconv/tests/test_bug340714.js
Flags: in-testsuite+
intl/uconv/tests/unit/test_bug340714.js, rather.
Checked in to 1.8 branch (with bug 317216)
Keywords: fixed1.8.1.22
A stand-alone version of the unit test for 1.8.x testing
Verified fixed for with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20090604 SeaMonkey/1.1.17pre. Stand-alone testcase passes.

Passes in Trunk as well: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090604 Minefield/3.6a1pre
You need to log in before you can comment on or make changes to this bug.