Closed Bug 11981 Opened 25 years ago Closed 25 years ago

nsStr/nsString off-by-N bugs

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: brendan, Assigned: brendan)

Details

(Whiteboard: [09271999]waiting to hear from reporter)

Attachments

(3 files)

While implementing javascript: URLs, I found a couple of bugs in xpcom/ds/nsStr*
and made the fixes yesterday, in the heat of stomach flu fever and in order to
get javascript: working for M9 (see bug 1646).  Here's my analysis and summary
of changes.  Attached unified diffs (ignoring whitespace changes) of the files
themselves.  Rick, could you bless these or let me know if I've missed something
subtle?  With your and jevering's approval I'd like to get these in before the
M9 branch is created.  Thanks,

/be
-----
Bugfixes:

- Size classification and log2 problems:

  nsPoolingMemoryAgent::Alloc computed the ceiling-log2 of the string length,
  then multiplied by 1 or 2 depending on mCharSize, then subtracted 4 (the
  log2 of the smallest pool's string length).  But nsPoolingMemoryAgent::Free
  did not scale by mCharSize before subtracting 4.  What's more, once you're
  in log2-space, law of exponents says to add 0 or 1, not multiply by 1 or 2,
  to scale for ASCII or UCS2.

  So an allocation for a 33 character Unicode string would pull from the pool
  at index 6*2-4 or 8, while a Free on that string would push its storage in
  the pool at index 6*1-4 or 2.

  Fixed to compute ceiling-log2 correctly, subtract 4 from the resulting log2,
  and add 0 or 1 (add mCharSize, which is encoded as 0 for ASCII, 1 for UCS2;
  neat!) to find the right byte-size-classified pool.  For the 33-character
  Unicode string example, ceiling-log2 of 33 is 6, +1 for mCharSize, -4 for
  the log2 of the minimum pooled length, gives index 3 as the pool to use for
  Alloc and Free.

- Net (string length excluding NUL) vs. gross (buffer size) capacity:

  nsString.cpp's nsCAutoString::nsCAutoString was calling nsStr::Initialize
  with aBuffer.mCapacity, but nsStr mCapacity is net, not gross.  Fixed to
  pass aBuffer.mCapacity-1.

  The same problem existed in nsAutoString::nsAutoString.

  It's not clear that the same problem doesn't arise in both the PRUnichar*
  and char* flavors of nsSubsumeCStr::nsSubsumeCStr -- they compute strlen
  if given aLength==-1, otherwise they set mLength and mCapacity to aLength-1.
  That says that any non-negative value for aLength should be taken as a gross
  rather than net capacity, as with nsString::ToCString's aBufLength argument.

  I can't find any callers who pass the third aLength parameter, so it always
  defaults to -1 in currently exercised code.  This net vs. gross issue could
  be resolved by using "aBufLength" or "aBufCapacity" or some such to suggest
  that the argument is gross, not net.  CBufDescriptor::mCapacity might also
  be renamed to better spell out what's going on.

Cleanups:

- Add to comment in nsStr.h about mCapacity for an nsStr being in character
  count units, and not counting the allocated space for the final NUL char;
  whereas mCapacity for a CBufDescriptor is a gross capacity, not net of NUL.

- Define kNumPools and use it consistently to declare mPools and bound loops
  over, and indexes into, that array.  Previous revision used 16, 10, and 12
  respectively.

- Rename eDefaultSize eMinLength (it's a net string length or character count,
  not a "sizeof" byte size or buffer capacity), derived from eMinLengthLog2.
  Use eMinLengthLog2 consistently instead of magic 4.

- Use PR_CeilingLog2 instead of ad-hoc exp/log2-computing loops.  I changed
  nsMemoryAgent::Alloc to use this NSPR function too.

- Make comments about "given amount in charunits... have to scale by charsize"
  apply to the scaling code, and to the particular amount variable.

- Defer theNewCapacity++ in nsPoolingMemoryAgent::Alloc until we enter the
  losing case for sure, where we need to call 'new char[theSize]' -- and then
  avoid gratuitous ++ by adding 1 when computing theSize.

- Expand tabs to 2-space-equivalent (actually, Emacs does this based on the
  magic comment at the top of the file -- it says indent-tabs-mode: nil).
Status: NEW → ASSIGNED
Sorry, I'm just flat wrong about the second point (net vs. gross capacity):
CBufDescriptor.mCapacity is net excluding the terminating null character, just
as nsStr.mCapacity is.  I don't know what made me think otherwise in the heat of
debugging yesterday.

I noticed that nsString2.cpp's nsString::ToCString is doing two copies of the
string, one 2to1 and the next 1to1 (char-size-wise, that is).  The first is from
a copy-constructor for nsCString called to convert *this from nsString type to
nsCString type, which is the only matching type here for the thrice-overloaded
nsCString::Assign's first argument.  This is all hidden within the line:

    temp.Assign(*this);

around line 604 of nsString2.cpp.

Was "2to1" Assign removed from the nsString package?  It seems just as bad to
have a "2to1" copy-constructor, in terms of risk of Unicode data loss.  In any
case, this code needs a nsCString::Assign(const nsString& aString...) in order
to avoid a gratuitous copy.

/be
So that assertion (
+    NS_ASSERTION(temp.mStr==aBuf, "lost buffer");
) that I added to nsString::ToCString is botching here:

nsDebug::Assertion(const char * 0x100740d8, const char * 0x100740c8, const char
* 0x100740a0, int 0x0000025d) line 176 + 13 bytes
nsString::ToCString(char * 0x006af328, unsigned int 0x00000100, unsigned int
0x00000000) line 605 + 33 bytes
CToken::DebugDumpSource(ostream & {...}) line 142 + 29 bytes
WriteTokenToLog(CToken * 0x01ffbe50) line 1050
CNavDTD::HandleStartToken(CToken * 0x01ffbe50) line 1250 + 9 bytes
NavDispatchTokenHandler(CToken * 0x01ffbe50, nsIDTD * 0x02a09d20) line 241 + 12
bytes
CTokenHandler::operator()(CToken * 0x01ffbe50, nsIDTD * 0x02a09d20) line 80 + 14
bytes
CNavDTD::HandleToken(CNavDTD * const 0x02a09d20, CToken * 0x01ffbe50, nsIParser
* 0x02b427e0) line 731 + 18 bytes
CNavDTD::BuildModel(CNavDTD * const 0x02a09d20, nsIParser * 0x02b427e0,
nsITokenizer * 0x02a0bb30, nsITokenObserver * 0x00000000, nsIContentSink *
0x02b425a0) line 551 + 20 bytes
nsParser::BuildModel() line 941 + 34 bytes
nsParser::ResumeParse(nsIDTD * 0x00000000, int 0x00000000) line 886 + 11 bytes
nsParser::OnDataAvailable(nsParser * const 0x02b427e4, nsIChannel * 0x02b37310,
nsISupports * 0x00000000, nsIInputStream * 0x02b40300, unsigned int 0x00000000,
unsigned int 0x000005c8) line 1168 + 19 bytes
nsDocumentBindInfo::OnDataAvailable(nsDocumentBindInfo * const 0x02b323f0,
nsIChannel * 0x02b37310, nsISupports * 0x00000000, nsIInputStream * 0x02b40300,
unsigned int 0x00000000, unsigned int 0x000005c8) line 2093 + 32 bytes
nsHTTPResponseListener::OnDataAvailable(nsHTTPResponseListener * const
0x02b40170, nsIChannel * 0x024bf040, nsISupports * 0x02b37310, nsIInputStream *
0x02b40300, unsigned int 0x00001158, unsigned int 0x000005c8) line 163 + 47
bytes
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x02b43f70)
line 350
nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x02b43f74) line 149 + 12 bytes
PL_HandleEvent(PLEvent * 0x02b43f74) line 509 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00abd640) line 470 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00000c40, unsigned int 0x0000d32a, unsigned int
0x00000000, long 0x00abd640) line 932 + 9 bytes
KERNEL32! bff7363b()

It looks like ToCString ignores its char* buffer argument and converts into an
auto-grown malloc buffer, which it then leaks.  Also, the anOffset parameter
seems to be ignored, although I didn't remember seeing anyone use it in the
calls I surveyed.

Attached is my attempt at a fix, which also fixes the double-copy hiding in the
unintended copy-constructor.

/be
I'll review the patch, but first a few comments. I did in fact remove the 2to1
assignment operator in an attempt to make things safer, but have changed my
mind. Look for it to reappear real soon. I'm not clear on the issue you're
raising with toCString (I'll read your comments more closely), but let's make
sure of one thing: if you give me a buffer, I assume I have to reduce it to
account for a null terminator. It's possible therefore for me to auto-alloc a
heap buffer if you give me a length of N, rather than N+1.
ToCString's interface does not provide for auto-alloc, because it returns a
char* to the same char* buffer passed in.  If it sometimes were to return the
auto-grown buffer, the caller would have to test (retptr != buffer) and if true
call free or delete.  But today, per the assertion and the backtrace involving
CToken::DebugDumpSource, the pointer to the original buffer is returned and the
autogrown buffer is leaked.

It seems better to me to truncate the string to fit the user's C buffer than not
to use the buffer at all.

Adding back 2to1 will still result in a copy-construct and extra copy -- is my
patch to use nsStr::Assign ok?  Feel free to take this bug away from me!

/be
Assignee: brendan → rickg
Status: ASSIGNED → NEW
Assignee: rickg → brendan
I've fixed the major problems cited here: ToCString() logic error, the extra
copy construction, and I've improved the docs. Feel free to address the
remaining issues (which I hope are largely cosmetic) yourself. Thanks for the
bug report.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Thanks for fixing this stuff.

/be
Whiteboard: [09271999]waiting to hear from reporter
Status: RESOLVED → VERIFIED
marking verified
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: