nsIConverterInputStream.readString() no longer reliably returns data

RESOLVED FIXED in mozilla26

Status

()

Core
XPCOM
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Chris Tomlinson, Assigned: mats)

Tracking

({regression})

Trunk
mozilla26
regression
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

5 years ago
I've found a problem with the 2013-08-21 nightly build which might be related to bug #486080. I'm not sure if it's a real bug or more of a problem with the way my addon (https://addons.mozilla.org/en-US/firefox/addon/keefox/) handles network connections but something has definitely changed unexpectedly.

The add-on reads UTF8 data from a network stream using JavaScript code similar to this:

  // this.transport is an SSL connection created by 
  // Components.interfaces.nsISocketTransportService
  this.raw_istream = this.transport.openInputStream(0, 512, 0);
  this.istream = Cc["@mozilla.org/intl/converter-input-stream;1"]
               .createInstance(Ci.nsIConverterInputStream);
  this.istream.init(this.raw_istream, "UTF-8", 0,
    Ci..nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

  //... Set up an instance of Components.interfaces.nsIInputStreamPump 
  // to call the following function when the network buffers have new data
  //...

  this.onDataAvailable = function(request, ctx, inputStream, offset, count)
  {
    var fullString = "";
    var str = {};
    while (this.istream.readString(4096, str) != 0)
      fullString += str.value;
  }

The 2nd call to readString returns 0 in Nightly 2013-08-21 and later.

There is obviously a lot of other add-on code around the edges so I can't say whether it is a problem with the 2nd readString call, 2nd and subsequent calls or just something about the data that gets sent during that 2nd call. The latter is unlikely since it's always just a small UTF8 encoded string (very similar to the data received in the first onDataAvailable call).

onDataAvailable is being called with the same offset and count parameters as in the working Nightly 2013-08-20 but I don't actually use those parameters because the istream offers no way to do so.

As requested in #486080, I first used the 2013-08-20 Nightly debug build and verified that it worked correctly. Then using the 2013-08-30 Nightly build I repeated the test and found some debug errors.

1) Loaded Firefox to the point of full UI responsiveness and the stream of console debug warnings had slowed enough to suggest all startup tasks were complete
2) Opened KeePass (this opens a listening TCP port)
3) Within a few seconds the KeeFox add-on attempted its regular connection to the KeePass TCP port
4) This triggered a debug assertion on the console followed by a break in VS2010 (I ran Firefox via VS2010 as per my usual add-on debug procedure)

These two lines were output to the console (not sure if the first is relevant but it is repeatable, bar the address):

pldhash: for the table at address 0A77764C, the given entrySize of 80 definitely
 favors chaining over double hashing.
[Parent 9688] ###!!! ASSERTION: OnDataAvailable implementation consumed no data:
 'Error', file e:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/netwerk
/base/src/nsInputStreamPump.cpp, line 550

(Repeated tests were similar but without the "[Parent 9688] " output).

The VS2010 break ended up on this line:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\free.c:50: retval = HeapFree(_crtheap, 0, pBlock);

I don't have any symbols loaded so can't really give any more information about the circumstances around the call to HeapFree.

I guess it makes sense that the "OnDataAvailable implementation consumed no data" because the underlying call to readString failed to consume any data so I'm not sure this helps identify the cause of the change in readString behaviour or not.
(Assignee)

Comment 1

5 years ago
Thanks for the detailed report!

I'm not very familiar with the details xpcom streams, but perhaps Benjamin
can help us.

Can you try running the failing debug build with XPCOM_DEBUG_BREAK=warn
set in the environment:
https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_on_Windows_FAQ#Disabling_ASSERTIONS
then you shouldn't get a trap to the debugger, just a warning on the console
when the assertion occurs.

Also, perhaps it might be worth trying a dump(str.value) in the loop
(Assignee)

Comment 2

5 years ago
oops, hit Submit to soon...  continuing:

Also, perhaps it might be worth trying a dump(str.value) in the loop
in both builds to see if there's any difference in what your read?

Fwiw, the documentation for OnDataAvailable seems to suggest you should
read 'count' bytes:
https://developer.mozilla.org/en-US/docs/XPCOM_Stream_Guide#Stream_Listeners
https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIStreamListener

Still, I don't see why bug 486080 would cause any change in behavior.
We should try to make your code snippet above into a regression test
if possible.  Perhaps a xpcshell test?
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Automated_testing

Also, we have debug symbols if you need them:
https://developer.mozilla.org/en/docs/Using_the_Mozilla_symbol_server

Comment 3

5 years ago
This is reasonably likely to be a regression from bug 486080, because even though that bug is labeled as removing unused interfaces, they were in fact used internally and the patch in that bug did change the converter streams internally.

Michael, are you able to look at this report and help Chris figure out what's going on?
Blocks: 486080
Flags: needinfo?(mjh563)
Keywords: regression

Comment 4

5 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Michael, are you able to look at this report and help Chris figure out
> what's going on?

Yes, I'll take a look.
Flags: needinfo?(mjh563)
(Reporter)

Comment 5

5 years ago
I just put a few dump()s around that code and can confirm that the counts returned from the readString call match the string output length.

1) onDataAvailable called with aCount = 46
2) readString(4096,str) returns 46; str.value contains a 46 character string
3) readString(4096,str) returns 0; str.value contains an empty string; loop exits and onDataAvailable ends
4) onDataAvailable called with aCount = 61
5) readString(4096,str) returns 0; str.value contains an empty string; loop exits and onDataAvailable ends

In the working build str.value contains the expected 61 character string in step 5 but is otherwise identical.

Disabling the assertion break worked as expected - it's still logged to console but there are no more VS2010 breaks. Thanks for the tip.

I had always assumed I was supposed to be doing something with the count passed to onDataAvailable but couldn't see how to because readString deals with (potentially multi-byte) characters and onDataAvailable deals in bytes.

As it happens, in the example outlined above the UTF-8 strings contain only single-byte ASCII characters but once an initial handshake has finished, the add-on protocol goes on to transfer data that may contain multi-byte characters.

I've managed to get Firefox to build on my Win7 machine and have discovered the "mach xpcshell-test" command which I'm currently running. I'll have a little bit of time to spend on this tomorrow but perhaps not enough to learn how to write a suitable xpcshell test.
(Assignee)

Comment 6

5 years ago
Hmm, I wonder if the error is caused by SetLength(0) in NS_FillArray.

It leads to RemoveElementsAt(0, Length()):
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray.h#1261
and then ShiftData calls ShrinkCapacity:
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#236
  if (mHdr->mLength == 0) {
    ShrinkCapacity(elemSize, elemAlign);
which resets the header to EmptyHdr():
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#208
which has zero capacity.  That's not what we want...
(Assignee)

Comment 7

5 years ago
Created attachment 798074 [details] [diff] [review]
debug patch

This confirms my hunch, I see a dozen or so of:
oops, SetLength(0) changed our capacity 8192 => 0
just when starting up...
(Assignee)

Comment 8

5 years ago
Created attachment 798076 [details] [diff] [review]
hack

This (UGLY) hack should fix it.
We should probably add a blessed SetLengthButNotCapacity method on nsTArray.

Comment 9

5 years ago
(In reply to Mats Palmgren (:mats) from comment #6)
> Hmm, I wonder if the error is caused by SetLength(0) in NS_FillArray.
> 
> It leads to RemoveElementsAt(0, Length()):
> http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray.h#1261
> and then ShiftData calls ShrinkCapacity:
> http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#236
>   if (mHdr->mLength == 0) {
>     ShrinkCapacity(elemSize, elemAlign);
> which resets the header to EmptyHdr():
> http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#208
> which has zero capacity.  That's not what we want...

I actually spotted the problem with SetLength(0) and commented on it when submitting my original patch (see bug 486080 comment 18 point 2). But I thought it would be OK, because it would only happen when there was no more data to be read.

But clearly that reasoning was incorrect, if a read of 0 bytes isn't the end of the stream. :(
(Reporter)

Comment 11

5 years ago
Yep that fixes it. Thanks.
(Assignee)

Comment 12

5 years ago
Created attachment 798197 [details] [diff] [review]
fix+test

Green on Try, fwiw.
Assignee: nobody → matspal
Attachment #798074 - Attachment is obsolete: true
Attachment #798076 - Attachment is obsolete: true
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #798197 - Flags: review?(benjamin)

Updated

5 years ago
Attachment #798197 - Flags: review?(benjamin) → review+
(Assignee)

Comment 13

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/6f837a184a90
Flags: in-testsuite+
OS: Windows 7 → All
Hardware: x86_64 → All
https://hg.mozilla.org/mozilla-central/rev/6f837a184a90
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
(Reporter)

Comment 15

5 years ago
I can confirm that the problem is resolved in Nightly 2013-09-09. Thanks!
You need to log in before you can comment on or make changes to this bug.