Last Comment Bug 340714 - nsIConverterInputStream duplicates the last character in the first buffer
: nsIConverterInputStream duplicates the last character in the first buffer
Status: VERIFIED FIXED
: verified1.8.1.22
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: x86 Windows XP
: -- normal (vote)
: ---
Assigned To: Simon Montagu :smontagu
: Yuying Long
: Makoto Kato [:m_kato]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-06-07 12:44 PDT by Jeff Grills
Modified: 2009-06-04 14:20 PDT (History)
3 users (show)
smontagu: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (6.02 KB, patch)
2006-06-08 07:07 PDT, Simon Montagu :smontagu
jshin1987: review+
Details | Diff | Splinter Review
Responding to jag's sr comments on IRC (8.09 KB, patch)
2006-07-19 08:01 PDT, Simon Montagu :smontagu
jag-mozilla: superreview+
Details | Diff | Splinter Review
Test case (requests enhanced privileges) (5.32 KB, text/html)
2009-04-07 00:50 PDT, Simon Montagu :smontagu
no flags Details

Description Jeff Grills 2006-06-07 12:44:32 PDT
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)
Build Identifier: Celtx 0.9.7.1RC

When reading a UTF-16 file using the nsIConverterInputStream API with nsIUnicharLineInputStream and calling the readLine() function on a UTF-16 file encoded in LE format with a starting BOM, the last character in the first buffer (where the buffer size was passed to the init call) will be duplicated and end up in the string twice. 

Reproducible: Always

Steps to Reproduce:
Here's the code that I'm using:

        var captionLocalFile = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);
        captionLocalFile.initWithPath(captionFileName);
        var captionFileInputStream = Components.classes["@mozilla.org/network/file-input-stream;1"].createInstance(Components.interfaces.nsIFileInputStream);
        captionFileInputStream.init(captionLocalFile, 0x01, 0444, 0);
        var captionFile = Components.classes["@mozilla.org/intl/converter-input-stream;1"].createInstance(Components.interfaces.nsIConverterInputStream);
        captionFile.init(captionFileInputStream, "UTF-16",1024, Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

        if (!(captionFile instanceof Components.interfaces.nsIUnicharLineInputStream))
          throw "not line input stream";

        var more = true;
        while (more)
        {
          // read the line and check for eof
          var line = {};
          more = captionFile.readLine(line);
        }
Comment 1 Christian :Biesinger (don't email me, ping me on IRC) 2006-06-08 00:31:01 PDT
does this happen for all UTF-16 files, or just a specific one? (could you attach one for easier testing?)
Comment 2 Simon Montagu :smontagu 2006-06-08 07:07:20 PDT
Created attachment 224853 [details] [diff] [review]
patch

This happens because when the various implementations of Convert() for UTF16  detect a BOM they pass the buffer without the BOM to UTF16ConvertToUnicode(), after modifying the aSrcLength passed to them, and then pass the changed value back to the caller.
Comment 3 Christian :Biesinger (don't email me, ping me on IRC) 2006-06-08 09:34:24 PDT
ok, so... why was this never an issue for anything else?
Comment 4 Simon Montagu :smontagu 2006-06-08 14:04:31 PDT
Probably because most callers of nsIUnicodeDecoder::Convert() don't do anything with aSrcLength after a successful call.
Comment 5 Jungshik Shin 2006-07-18 19:08:39 PDT
Comment on attachment 224853 [details] [diff] [review]
patch

r=jshin
sorry for the delay. I thought I had reviewed this patch (I guess I did, but forgot to log-in after pressing submit button. At home, sometimes bugzilla keeps asking me to log-in for every transaction
Comment 6 Simon Montagu :smontagu 2006-07-19 08:01:26 PDT
Created attachment 229822 [details] [diff] [review]
Responding to jag's sr comments on IRC

I tested this with all combinations of UTF16-BE/LE with and without BOM, and using odd and even buffer lengths (the previous version failed horribly with an odd buffer length as jag pointed out it would).
Comment 7 jag (Peter Annema) 2006-07-19 14:31:32 PDT
Comment on attachment 229822 [details] [diff] [review]
Responding to jag's sr comments on IRC

Looks good. Is there an easy way to add your tests to CVS so the next person to touch this code can run them?

BTW, mFoundBOM doesn't need to be a member variable, it can just be a local. Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?
Comment 8 Simon Montagu :smontagu 2006-07-19 23:26:30 PDT
(In reply to comment #7)
> (From update of attachment 229822 [details] [diff] [review] [edit])
> Looks good. Is there an easy way to add your tests to CVS so the next person to
> touch this code can run them?
> 

Good idea. I should be able to work out a one-click version of the tests that I can check in.

> BTW, mFoundBOM doesn't need to be a member variable, it can just be a local.

I think it does have to be a member variable. We want to preserve its state across successive buffers for the same input, no?

> Does anyone actually test for NS_OK_UDEC_NOBOMFOUND though?

Not right now: see bug 235090 for why it got added.
Comment 9 Simon Montagu :smontagu 2006-07-20 00:50:20 PDT
Checked in.
Comment 10 Simon Montagu :smontagu 2006-07-23 10:55:11 PDT
I checked in a testcase at intl/uconv/tests/test_nsIConverterInputStream.html
Comment 11 Simon Montagu :smontagu 2007-03-01 03:08:44 PST
The previous testcase has now been replaced by an xpcshell-based test at intl/uconv/tests/test_bug340714.js
Comment 12 Simon Montagu :smontagu 2007-03-01 03:09:51 PST
intl/uconv/tests/unit/test_bug340714.js, rather.
Comment 13 Simon Montagu :smontagu 2009-04-07 00:47:20 PDT
Checked in to 1.8 branch (with bug 317216)
Comment 14 Simon Montagu :smontagu 2009-04-07 00:50:15 PDT
Created attachment 371403 [details]
Test case (requests enhanced privileges)

A stand-alone version of the unit test for 1.8.x testing
Comment 15 Al Billings [:abillings] 2009-06-04 14:20:45 PDT
Verified fixed for 1.8.1.22 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.22pre) Gecko/20090604 SeaMonkey/1.1.17pre. Stand-alone testcase passes.

Passes in Trunk as well: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090604 Minefield/3.6a1pre

Note You need to log in before you can comment on or make changes to this bug.