nsIConverterInputStream duplicates the last character in the first buffer




13 years ago
10 years ago


(Reporter: jgrills, Assigned: smontagu)



Windows XP
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)



(2 attachments, 1 obsolete attachment)



13 years ago
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)
Build Identifier: Celtx

When reading a UTF-16 file using the nsIConverterInputStream API with nsIUnicharLineInputStream and calling the readLine() function on a UTF-16 file encoded in LE format with a starting BOM, the last character in the first buffer (where the buffer size was passed to the init call) will be duplicated and end up in the string twice. 

Reproducible: Always

Steps to Reproduce:
Here's the code that I'm using:

        var captionLocalFile = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);
        var captionFileInputStream = Components.classes["@mozilla.org/network/file-input-stream;1"].createInstance(Components.interfaces.nsIFileInputStream);
        captionFileInputStream.init(captionLocalFile, 0x01, 0444, 0);
        var captionFile = Components.classes["@mozilla.org/intl/converter-input-stream;1"].createInstance(Components.interfaces.nsIConverterInputStream);
        captionFile.init(captionFileInputStream, "UTF-16",1024, Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

        if (!(captionFile instanceof Components.interfaces.nsIUnicharLineInputStream))
          throw "not line input stream";

        var more = true;
        while (more)
          // read the line and check for eof
          var line = {};
          more = captionFile.readLine(line);
Assignee: smontagu → nobody
Component: Internationalization → Networking
QA Contact: amyy → networking
Assignee: nobody → smontagu
Component: Networking → Internationalization
QA Contact: networking → amyy
does this happen for all UTF-16 files, or just a specific one? (could you attach one for easier testing?)


13 years ago
Ever confirmed: true

Comment 2

13 years ago
Created attachment 224853 [details] [diff] [review]

This happens because when the various implementations of Convert() for UTF16  detect a BOM they pass the buffer without the BOM to UTF16ConvertToUnicode(), after modifying the aSrcLength passed to them, and then pass the changed value back to the caller.
Attachment #224853 - Flags: review?(jshin1987)
ok, so... why was this never an issue for anything else?

Comment 4

13 years ago
Probably because most callers of nsIUnicodeDecoder::Convert() don't do anything with aSrcLength after a successful call.

Comment 5

13 years ago
Comment on attachment 224853 [details] [diff] [review]

sorry for the delay. I thought I had reviewed this patch (I guess I did, but forgot to log-in after pressing submit button. At home, sometimes bugzilla keeps asking me to log-in for every transaction
Attachment #224853 - Flags: review?(jshin1987) → review+


13 years ago
Attachment #224853 - Flags: superreview?(jag)

Comment 6

13 years ago
Created attachment 229822 [details] [diff] [review]
Responding to jag's sr comments on IRC

I tested this with all combinations of UTF16-BE/LE with and without BOM, and using odd and even buffer lengths (the previous version failed horribly with an odd buffer length as jag pointed out it would).
Attachment #224853 - Attachment is obsolete: true
Attachment #229822 - Flags: superreview?(jag)
Attachment #224853 - Flags: superreview?(jag)

Comment 7

13 years ago
Comment on attachment 229822 [details] [diff] [review]
Responding to jag's sr comments on IRC

Looks good. Is there an easy way to add your tests to CVS so the next person to touch this code can run them?

BTW, mFoundBOM doesn't need to be a member variable, it can just be a local. Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?
Attachment #229822 - Flags: superreview?(jag) → superreview+

Comment 8

13 years ago
(In reply to comment #7)
> (From update of attachment 229822 [details] [diff] [review] [edit])
> Looks good. Is there an easy way to add your tests to CVS so the next person to
> touch this code can run them?

Good idea. I should be able to work out a one-click version of the tests that I can check in.

> BTW, mFoundBOM doesn't need to be a member variable, it can just be a local.

I think it does have to be a member variable. We want to preserve its state across successive buffers for the same input, no?

> Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?

Not right now: see bug 235090 for why it got added.

Comment 9

13 years ago
Checked in.
Last Resolved: 13 years ago
Resolution: --- → FIXED

Comment 10

13 years ago
I checked in a testcase at intl/uconv/tests/test_nsIConverterInputStream.html

Comment 11

12 years ago
The previous testcase has now been replaced by an xpcshell-based test at intl/uconv/tests/test_bug340714.js
Flags: in-testsuite+

Comment 12

12 years ago
intl/uconv/tests/unit/test_bug340714.js, rather.

Comment 13

10 years ago
Checked in to 1.8 branch (with bug 317216)
Keywords: fixed1.8.1.22

Comment 14

10 years ago
Created attachment 371403 [details]
Test case (requests enhanced privileges)

A stand-alone version of the unit test for 1.8.x testing
Verified fixed for with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20090604 SeaMonkey/1.1.17pre. Stand-alone testcase passes.

Passes in Trunk as well: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090604 Minefield/3.6a1pre
Keywords: fixed1.8.1.22 → verified1.8.1.22
You need to log in before you can comment on or make changes to this bug.