Closed Bug 1263088 Opened 8 years ago Closed 8 years ago

Fix nsMsgI18NConvertFromUnicode to handle NS_OK_UENC_MOREOUTPUT or use GetMaxLength()

Categories

(MailNews Core :: Composition, defect)

defect
Not set
normal

Tracking

(thunderbird45 affected, thunderbird46 affected, thunderbird47 affected, thunderbird48 affected)

RESOLVED INVALID
Tracking Status
thunderbird45 --- affected
thunderbird46 --- affected
thunderbird47 --- affected
thunderbird48 --- affected

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

Details

See bug 1251120 comment #37 or bug 1255863 comment #21 (secure bug, quoting relevant comment here):

[Use GetMaxLength()] or handle NS_OK_UENC_MOREOUTPUT correctly. The current nsMsgI18N code does not differentiate it with NS_OK, so it does not look correct for me.

Code is here:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgI18N.cpp#91

P.S.: MailNews::Composition is not the correct component, but I didn't see a better one.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Masatoshi-san, I've been looking it the MailNews code and added a few debug prints, but I can't see anything wrong with it:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgI18N.cpp#91

This is the code a little reduced and commented:

  int32_t originalUnicharLength = inString.Length();
  int32_t srcLength;
  int32_t dstLength;
  char localbuf[512];
  int32_t consumedLen = 0;

  outString.Truncate();
  // convert
  while (consumedLen < originalUnicharLength) {
    srcLength = originalUnicharLength - consumedLen;  
    dstLength = 512;
// Here we pass in the full source length, so generally much more than
// can be returned in 512 bytes. Therefore the function mostly returns
// NS_OK_UENC_MOREOUTPUT. But it also populates srcLength with the number of
// characters that got processed.
    rv = encoder->Convert(currentSrcPtr, &srcLength, localbuf, &dstLength);

// NS_OK_UENC_MOREOUTPUT doesn't trigger NS_FAILED, so we continue.
    if (NS_FAILED(rv) || dstLength == 0)
      break;
// we append the returned result.
    outString.Append(localbuf, dstLength);

// we move on in the input, but only by the number of characters that got consumed.
    currentSrcPtr += srcLength;
    consumedLen = currentSrcPtr - originalSrcPtr; // src length used so far
  }

The debug I see is this, the first number is srcLength and the second dstLength:

===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 509, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 509, 512
===== NS_OK_UENC_MOREOUTPUT returned, 509, 512
===== NS_OK_UENC_MOREOUTPUT returned, 503, 512
===== NS_OK_UENC_MOREOUTPUT returned, 506, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 506, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK_UENC_MOREOUTPUT returned, 512, 512
===== NS_OK returned, 54, 54

So, starting with a very long string of many KB, the encoder works it's way thought the input, encoding chunks of about 512 characters every time, sometimes less. Apart from the last time, it *always* returns NS_OK_UENC_MOREOUTPUT.

I don't see any problem, do you?
Flags: needinfo?(VYV03354)
Just out of interest: Is it expected that always 512 bytes are returned? Couldn't there be cases where the output buffer is not quite full, but encoding one more input character would overflow it. In this case I would expect an "underful" buffer to be returned.
Ah, I missed the while loop and currentSrcPtr updates based on the srcLength output. This code looks fine. Sorry for the false alarm.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(VYV03354)
Resolution: --- → INVALID
Can you please still answer my question from comment #2. How come that the dstLength is always returned as the full buffer length passed in? I've tried passing in other length, like 511, 499, 77, even down to 7 and I tried with other data, but I've never seen a case were less would be returned.

(repeating): Couldn't there be cases where the output buffer is not quite full, but encoding one more input character would overflow it. In this case I would expect an "underful" buffer to be returned.
Flags: needinfo?(VYV03354)
You should not depend on the implementation detail of the encoder unless it is guaranteed in the nsIUnicodeEncoder contract, period.
Flags: needinfo?(VYV03354)
Thanks for the reply. As you can see in comment #1, we're not relying on the implementation detail as we continue with the next "non-consumed" chunk from the input and copy as many bytes of output as we're told via the output length parameter.

However, you haven't answered my question, so let me repeat:

===
How come that the dstLength is always returned as the full buffer length passed in? I've tried passing in other lengths, like 511, 499, 77, even down to 7 and I tried with other data, but I've never seen a case were less would be returned.

Couldn't there be cases where the output buffer is not quite full, but encoding one more input character would overflow it. In this case I would expect an "underful" buffer to be returned, so dstLength being returned less than was passed it.
===

So please explain a little of the implementation detail just for my personal interest if that's not too much trouble.
Flags: needinfo?(VYV03354)
NS_OK_UENC_MOREOUTPUT means that caller did not supply enough buffers to store the output. So it is natural that the encoder implementation tries to fill the buffer as much as possible. But it is up to implementation's decision. Again, you can't assume that the encoder will always fill the buffer.
Flags: needinfo?(VYV03354)
(In reply to Masatoshi Kimura [:emk] from comment #7)
> Again, you can't assume that the encoder will always fill the buffer.
Again, we don't rely on it.

OK, let me ask more directly. I'm just a layman in the field. The input is char16_t and the output is simple char. Let's assume that you have two input characters, each stored in two char16_t. Let's further assume that both encode to three bytes.

So in bytes:
Input:  i1b1 i1b2 i2b1 i2b2
Output: o1b1 o1b2 o1b3 o2b1 o2b2 o2b3.

Let's assume we provide a 4-byte output buffer.

So I'd assume that srcLength would be returned as 1 and dstLength would be returned as 3.

It would be more complicated to return srcLength as 2, dstLength as 4 and buffer the bytes that you still need to return next time, that's o2b2 o2b3.

But maybe that's what you're doing.
You need to log in before you can comment on or make changes to this bug.