Closed Bug 328755 Opened 14 years ago Closed 6 years ago

assigning a zero-length string should not alloc a buffer

Categories

(Core :: String, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: bryner, Assigned: jkitch)

References

Details

(Keywords: dev-doc-needed, helpwanted, Whiteboard: [mentor=benjamin@smedbergs.us][lang=c++])

Attachments

(2 files, 8 obsolete files)

Currently, calling str.Assign(EmptyString()) or str.Assign(blah, 0) allocates a buffer for the terminating null.  Instead, it should just call Truncate() and use the shared empty buffer.
Attached patch patch (obsolete) — Splinter Review
Attachment #213345 - Flags: review?(darin)
Attachment #213345 - Flags: superreview+
Attachment #213345 - Flags: review?(darin)
Attachment #213345 - Flags: review+
Comment on attachment 213345 [details] [diff] [review]
patch

We'll need this on the branch for my mork import changes to perform decently.
Attachment #213345 - Flags: approval-branch-1.8.1?(darin)
Attachment #213345 - Flags: approval-branch-1.8.1?(darin) → approval-branch-1.8.1+
checked in, trunk and branch.
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Might this have caused bug 328842 on the 1.8 branch only?
I backed this out due to crashes -- some callers such as this one expect to be able to write to an empty string buffer, and I don't have time to track them all down.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: bryner → nobody
Status: REOPENED → NEW
Keywords: fixed1.8.1
Keywords: helpwanted
Priority: -- → P3
Whiteboard: [mentor=bsmedberg][lang=c++]
Hi

I'd like to work on this.

I haven't fixed a bug before, so I'd appreciate it if I could get a little help getting started.
Assignee: nobody → jkitch.bug
I gather the purpose of the bug is to track down places where the patch causes bad behaviour and to fix them.  The question is how should I go about finding them?

The try run is green (https://tbpl.mozilla.org/?tree=Try&rev=2585762e5697), and the crash reported in bug 328842 isn't reproducible.
Flags: needinfo?(benjamin)
Perhaps obviously, nobody should be writing to an empty buffer. The only case I can think of where code would be *trying* to do that would be this:

str.SetLength(0);
// Make sure it's null-terminated
*str.BeginWriting() = '\0';

But SetLength should ensure null-termination already, and that code is bad and should be fixed. If a try run doesn't show anything, then you could probably just reland it.

However, I am looking through the code, and I'm afraid that the current null-string buffer isn't `const`. This means that you could modify it and nobody would know. See

http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/xpcom/string/src/nsSubstring.cpp#l30
http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/xpcom/string/public/nsCharTraits.h#l103

I think that as part of this patch, you should probably also make those "const". That might help catch at runtime any violations where code was in fact trying to write to the empty buffer.
Flags: needinfo?(benjamin)
I'm not quite sure how to accomplish this.

> static char_type * const sEmptyBuffer;
prevents sEmptyBuffer from being reassigned but does nothing to prevent the contents from being overwritten

> static char_type const * sEmptyBuffer;
Will prevent the contents from being overwritten, except that it cannot be assigned to mData without a const_cast.  My understanding is that this will then cause undefined behaviour when overwritten, rather than a runtime error.

I suppose a flag could be added to indicate if mData should be treated as const, which defers allocating a new buffer until functions with the potential of changing the buffer are called but I'm not sure if this is what you had in mind.
Attached patch string.diff (obsolete) — Splinter Review
This rebases the previous patch and changes sEmptyBuffer to be "static char_type * const".  

https://tbpl.mozilla.org/?tree=Try&rev=0a5cd188f23b
I've sent it through an almost exhaustive try run and found no problems.

My concerns from comment 9 still apply.  Also, do you have any suggestions regarding run-time checks or assertions that can be performed?
Attachment #213345 - Attachment is obsolete: true
Attachment #822895 - Flags: review?(benjamin)
Comment on attachment 822895 [details] [diff] [review]
string.diff

I think we should also change gNullChar to be

static const PRUnichar gNullChar; You'll need to add const_cast when you assign its address to the sEmptyBuffer, but it will prevent people (by crashing) from accidentally overwriting it.

r=me with that change
Attachment #822895 - Flags: review?(benjamin) → review+
Attached patch string.diff (obsolete) — Splinter Review
The suggested change has been applied, and it caught two situations where gNullChar was overwritten with the same value. As the problem statements were useful in properly terminating other strings, I have changed it to only write the null terminator when the destination isn't already '\0'.  Flagging for review to verify this expedient is acceptable.

There may be other places where this problem occurs, but if so they are not covered by the unit tests.
Attachment #822895 - Attachment is obsolete: true
Attachment #825848 - Flags: review?(benjamin)
Attached patch string.diff (obsolete) — Splinter Review
superficial whitespace fix.
Attachment #825848 - Attachment is obsolete: true
Attachment #825848 - Flags: review?(benjamin)
Attachment #825860 - Flags: review?(benjamin)
Comment on attachment 825860 [details] [diff] [review]
string.diff

+  char destEW = *aDest.EndWriting();

This needs to be a reference or a pointer, not a value.

Another way to structure these tests, though, is to do

if (str.Length())
  *str.EndWriting() = '\0';
Attachment #825860 - Flags: review?(benjamin) → review-
Attached patch string.diff (obsolete) — Splinter Review
Fixed
Attachment #825860 - Attachment is obsolete: true
Attachment #826276 - Flags: review?(benjamin)
Attachment #826276 - Flags: review?(benjamin) → review+
Greenish try run:
https://tbpl.mozilla.org/?tree=Try&rev=e977e708b357

Android 4.0 Opt rc2 is intermittently orange, but I believe it to be manifestation of bug 915348 or bug 907830.


Note:  Crashes caused by writing to a dereferences character pointer are probably caused by this patch and expose pre-existing bugs.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2e6063aa9b77
Status: NEW → RESOLVED
Closed: 14 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Comment on attachment 826276 [details] [diff] [review]
string.diff

> bool
> nsTSubstring_CharT::Assign( const char_type* data, size_type length, const fallible_t& )
>   {
>-    if (!data)
>+    if (!data || length == 0)
This change may have changed the return value of BeginWriting on an empty string. By my reading of the code EnsureMutable tries to assign the string to itself to promote to a string buffer, which no longer happens in this case, thus causing EnsureMutable to believe that it ran out of memory.

What I don't know is whether this is a breaking change.
Depends on: 935280
Blocks: 935376
Backed out to try to fix bug 935376:
http://hg.mozilla.org/mozilla-central/rev/9ba3faa35c96
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla28 → ---
So the fix for bug 935280 is simple.

https://hg.mozilla.org/mozilla-central/annotate/770de5942471/xpcom/ds/nsWindowsRegKey.cpp#l285
> - if (size == 0) {
> + if (size == 0 || size == 2) { // size <= 2 if negative sizes are not a concern

The testcase necessary to ensure it doesn't reoccur will take more effort.

The issue is how to proceed from here.  There are attempts to write to zero length strings that aren't covered by the unit tests.  Is there a way to discover them without landing and waiting for crash reports to highlight the problem areas?
Flags: needinfo?(benjamin)
Blocks: 935257
Blocks: 935430
(In reply to James Kitchener (:jkitch) from comment #21)
> > + if (size == 0 || size == 2) { // size <= 2 if negative sizes are not a concern
DWORD is unsigned so that's fine.

But is it correct to truncate in the "2" case? What if the value is a single character, not null terminated? MSDN claims it can happen:

> If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, this size includes any terminating null character or characters unless the data was stored without them. For more information, see Remarks.

I wonder if there is a bug in this assumption:
>   // |size| includes room for the terminating null character
(There may be external invariants that ensure that the weird case doesn't happen; I don't know the code well enough to say)
(In reply to neil@parkwaycc.co.uk from comment #19)
> Comment on attachment 826276 [details] [diff] [review]
> string.diff
> 
> > bool
> > nsTSubstring_CharT::Assign( const char_type* data, size_type length, const fallible_t& )
> >   {
> >-    if (!data)
> >+    if (!data || length == 0)
> This change may have changed the return value of BeginWriting on an empty
> string. By my reading of the code EnsureMutable tries to assign the string
> to itself to promote to a string buffer, which no longer happens in this
> case, thus causing EnsureMutable to believe that it ran out of memory.
>
I can't see that.  There is just a reinterpret_cast within nsStringBuffer::FromData(), which I don't think overwrites anything, and a check against mRefCount which doesn't seem to be incremented whenever sEmptyBuffer is assigned to mData.

If it was the case, shouldn't it crash during the attempt to write to itself?  I haven't found crash reports blaming that function.
Depends on: 936886
I don't think there is a way to discover these issues without landing and getting crash reports. I do think we should fix the issues that were uncovered the first time and then just try again.
Flags: needinfo?(benjamin)
Is there a way to write tests to verify low level string functionality?  I haven't been able to get compiled code tests to work due to nsString's #ifdef MOZILLA_INTERNAL_API constraint.

Not all of the crash reports include steps to reproduce and I would prefer not to take it on faith that I have identified and resolved the problem.
Flags: needinfo?(benjamin)
Depends on: 883339
Attached patch string.diff (obsolete) — Splinter Review
This patch
1. Now uses char16_t
2. Includes some changes to nsUnescapeCount in the hope of avoiding this crash:

https://crash-stats.mozilla.com/report/list?signature=nsUnescapeCount&build_id=20131105030206&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&reason=EXCEPTION_ACCESS_VIOLATION_WRITE&hang_type=any&date=2013-11-30+07%3A00%3A00&range_value=4#reports

I'm not sure why it should be crashing for users as the length 0 src case should be triggered by the automated tests.  The code is rearranged slightly in case an over-zealous optimiser removed the null length protection.
Attachment #826276 - Attachment is obsolete: true
The remaining work for this patch is to analyse the following crash reports to identify crash bugs and fix them.

https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&range_value=4&range_unit=weeks&date=11%2F30%2F2013+07%3A00%3A00&query_search=signature&query_type=contains&query=&reason=EXCEPTION_ACCESS_VIOLATION_WRITE&release_channels=&build_id=20131105030206&process_type=any&hang_type=any

The single biggest issue (the one with nsWindowsRegKey::ReadStringValue() in the stack chain) should have been fixed by bug 936886, but there are several others still outstanding.

The process of solving the remaining issues I defer to someone with greater expertise in the code or in identifying the causes from the limited information provided by sorocco.

Unless there are significant memory efficiency or performance gains, I'm inclined to leave this bug unfixed.  As the automated tests aren't comprehensive enough to cover all of the resulting bugs, I expect it will be easy to introduce subtle crashes that only manifest themselves in specific situations.  This has the potential to make things very unpleasant for some users, perhaps resulting in repeated startup crashes and limited opportunities for them to provide steps to reproduce them.
Assignee: jkitch.bug → nobody
jkitch, your query is for every EXCEPTION_ACCESS_VIOLATION_WRITE crash, I doubt you imply that they are all caused by the same pattern.

Benjamin, can you take a look at the two patches here and follow up on the last comment from jkitch here?
Flags: needinfo?(benjamin)
Only those stack traces that touch strings are likely to be caused by this patch.

It is possible that all of the ntdll.dll crashes are solved by bug 936886, which would simplify things greatly, but for a given crash signature the stack traces can be very different. Compare 
https://crash-stats.mozilla.com/report/index/543f373c-ce8c-4f3c-8d87-5a69c2131119
and 
https://crash-stats.mozilla.com/report/index/9694c532-fa75-489f-9188-5a9d82131112

Can the former be treated as having the same cause as the latter by attributing it to stack corruption?
The example crash reports here aren't walking the stack very well because we're missing symbols for ntdll.dll and kernelbase, so we really don't know whether this change is related to it at all.

I don't think we should live in fear here. Let's fix the issues that we found last time this landed. Land it again and if we see odd things in nightly builds we can fix them or back it out again. There should be no shame in backouts for things that aren't caught by automated tests.
Flags: needinfo?(benjamin)
Whiteboard: [mentor=bsmedberg][lang=c++] → [mentor=benjamin@smedbergs.us][lang=c++]
I've analysed the top 20-30 crashes from that build, and the ones that I can attribute to this bug fall into three categories.

1. memmove/LocalBaseRegQueryValue/ntdll.dll.
The stack traces are very noisy, with many including only a single call to Assign().  If these are all put down to an inability to correctly generate the stack trace, then this crash should have been resolved by bug 936886.

2. nsUnescapeCount
I thought I had fixed this bug, but I suspect it is due to an over-eager compiler.  The null check in "if(!a) {a = 0}" may be viewed as unnecessary on the incorrect assumption that a is always mutable.  This patch moves the null check earlier in the function in the hope that the compiler is no longer confused.

3. ConvertAndWrite
Covered by a separate patch
Assignee: nobody → jkitch.bug
Attachment #8365802 - Attachment is obsolete: true
Attachment #8369825 - Flags: review?(benjamin)
Attached patch Part2: ConvertAndWrite crash fix (obsolete) — Splinter Review
The buffer of zero length strings are no longer mutable, so attempts to write to them (including setting the null-terminator) will crash.

This patch checks for this condition and returns early if GetMaxLength() returns 0.  This should only occur when the input aString has zero length (and certain encoders are used).  If there is nothing to convert, there is nothing to write and I see no harm in returning early.
Attachment #8365796 - Attachment is obsolete: true
Attachment #8369828 - Flags: review?(bzbarsky)
Comment on attachment 8369828 [details] [diff] [review]
Part2: ConvertAndWrite crash fix

r=me, but this seems like a somewhat worrying change to the API...
Attachment #8369828 - Flags: review?(bzbarsky) → review+
Er, except the indentation needs fixing, of course.
Attachment #8369828 - Attachment is obsolete: true
dev-doc:

When this lands the buffers of length 0 internal nsAStrings/nsACStrings are now immutable.  You can still can still change the length or call Assign as normal, but the buffer returned BeginWriting() cannot be written to.

In particular the following pattern will crash if str is a length 0 string.
nsString str;
str.Truncate();  // sets buffer size to 0.
char* c str.BeginWriting();
*c = '\0'; //crash.

Recommended practice is to ensure the buffer has length greater than 0 before writing.

I would avoid the following pattern for null terminating as I have suspicions that an over-eager compiler will optimise away the null check.
char* c str.BeginWriting();
if (*c) {
*c = '\0'
}

The motive is to stop poorly behaving code redefining the empty string, as the empty string's buffer is shared by all empty strings.
Keywords: dev-doc-needed
Attachment #8369825 - Flags: review?(benjamin) → review+
Lets try again.
Keywords: checkin-needed
Keywords: checkin-needed
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6f1e6f7d3041
https://hg.mozilla.org/mozilla-central/rev/b95d0fb14026
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Blocks: 1000030
Depends on: 1000079
You need to log in before you can comment on or make changes to this bug.