Closed Bug 139649 Opened 22 years ago Closed 22 years ago

Fix string code to use IsDependentOn instead of depending on nsAPromiseString type

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: jag+mozilla, Assigned: jag+mozilla)

References

Details

(Keywords: topembed+, Whiteboard: [driver:scc])

Attachments

(4 files, 4 obsolete files)

Consider the code:

  nsString A;
  //...
  A = Substring(A, 2, 3);

Normally when you assign something into A you'll clear A's buffer, then copy the
string you want to assign into it. That creates a problem for this code because
the (sub)string we want to copy from depends on the string we want to assign
into, so when we cleared A's buffer we thus lost the characters we wanted to copy.

The solution is to make a copy of the substring first, then clear A's buffer,
and then store the copy in A.

We're currently doing this by looking at the type of the string we want to copy
from, and if it's an nsAPromiseString, we know we need to check whether there is
a dependency on the string we're assigning into. The problem with this approach
is that we can't depend on the string's type, since someone might cast that type
information away, for example:

  void foopy(const nsAString& inStr, nsAString& outStr)
    {
      outStr = inStr; // very silly example
    }

  //...
  nsString A;
  foopy(Substring(A, 2, 3), A);

Here the type information is cast away, so we don't do the dependency check, and
A isn't what it's supposed to be.

The real solution is to always check for such dependencies when modifying a
string (Assign, Append, Insert, ...). This will allow us to get rid of (the now
defunct) |nsAPromiseString|. We might also be able to special case a few
assignments (e.g. from PRUnichar*) where we know there is no dependency.
Blocks: 125389
Status: NEW → ASSIGNED
Keywords: mozilla1.0, topembed
Priority: -- → P3
Target Milestone: --- → mozilla1.0
it just occured to me that there still isn't anything we can do about this
broken pattern:

{
  nsString foopy("loves bar");
  PRUnichar *p = foopy.get();
  foopy = p;
}

as well as all the variations on this that might accidentally show up in real code.

should we add an IsDependentOn that takes a raw pointer?  or is that just overkill?
A pointer to a buffer inside a string is only valid for as long as you know that
the string doesn't change. I think that clearly covers this case as "don't do that".
i guess, but conceptually it's not that much different then A = Substring(A,x,y);
especially given the example you mentioned.  i could just as easily replace
inStr with a |const PRUnichar *| value.  users might fall into the same trap, right?

at any rate, i don't feel that strongly about this issue... i just wanted to
point out that there is a bit of bias toward objects for no apparent reason :-/
Assignee: darin → jaggernaut
Status: ASSIGNED → NEW
-> jag
Attached patch Fix (obsolete) — Splinter Review
So darin measured the effect of this patch on Tp and Ts and it doesn't have a 
negative impact on either.
Ignore the assertions I added in nsDependentSubstring, they're something I was 
playing with.
Blocks: 138000
i ran the perf numbers on a 733Mhz redhat 7.2 box w/ 256 Mb RAM and an IDE drive.

compiled with these options:

ac_add_options --enable-crypto
ac_add_options --enable-xterm-updates
ac_add_options --enable-extensions
ac_add_options --enable-svg
mk_add_options MOZ_INTERNAL_LIBART_LGPL=1
MOZ_INTERNAL_LIBART_LGPL=1
ac_add_options --enable-optimize=-O2
ac_add_options --disable-debug
ac_add_options --enable-strip
No longer blocks: 138000
bug 128288 probably depends on this or is a dupe of this...
Comment on attachment 82053 [details] [diff] [review]
Fix

sr=darin (nice!)
Attachment #82053 - Flags: superreview+
Comment on attachment 82053 [details] [diff] [review]
Fix

hmmm, I don't think the |IsDependentOn| test in the |nsDependentSubstring|
classes is strong enough.  Shouldn't it be

  return mString->IsDependentOn(aString);

?  In any case where there is a more complex underlying string, or the
possibility of one characterized by a reference to an abstract type, then we
have to forward the |IsDependentOn| call into it.  Only in simple types can you
answer the question directly.  Is this not true?  Or am I confused?

All classes that directly own the data can answer the question with a simple
test.  All classes that only refer to the data owned by some other class must
forward the |IsDependentOn| call.

Hmmm, maybe I'm reading this wrong, but it looks like one could circumvent the
|IsDependentOn| test by just assigning from an element pointer and length that
happen to point into ones self (e.g., trying to trim both ends of a string in a
single operation, not using |SubString|).  Similarly with other element pointer
based operations.

Other than those points, I like this patch.  If I'm right about the points
above, however, those would be killers.  I'm marking this 'needs-work' so you
can determine if my observations are valid.  If they aren't, I'll update that
to an r/sr.
Attachment #82053 - Flags: needs-work+
(IsDependentOn in nsDependentSubstring should indeed forward the request)

scc and I just discussed adding support for detecting char iterator based
substring dependencies, which if you look at it slightly differently means
you'll just check for overlap of buffers in all cases, which in turn means
IsDependentOn can be (protected) a non-virtual function on nsAString.

I'll come up with a patch that does this in a m*n (where m is always 1 because
you can only write into single fragment strings, and n is most likely 1 or close
to it) fashion, scc is looking into an m+n algorithm.
Keywords: topembedtopembed+
noting the bugs for which I am the driver
Whiteboard: [driver:scc]
Attachment #82053 - Attachment is obsolete: true
Attachment #83396 - Attachment is obsolete: true
Attached file outline of a O((m+n)log(n)) algorithm (obsolete) —
This is an outline of the faster algorithm.  Like the brute force approach it
is O(m) when the number of source buffers is 1.  Using a algorithm designed to
handle two muli-fragment strings from the start would allow us to make
|IsDependentOn| a non-virtual function in |nsAC?String| --- even if we didn't
turn on the ability to do multi-fragment sources.
1) are we really concerned about performance w.r.t. the number of string
fragments? I mean how often are we talking about strings with more than say 4
string fragments? The only way I even know how to create a multi-fragment string
is via the concatentation operator.
2) do we really need the faster (and significantly more complex!) algorithm to
freeze strings?
i have to agree with alecf... the current patch seems like the right one for the
1.0 branch (it is much simpler).  can we defer the optimization work until after
mozilla 1.0?
I'm fine with that, since at the moment, we only do Mx1 comparisons.  My
algorithm _looks_ a lot more complicated; but most of that is comments.  The
straight-line flow for an Mx1 case is actually simpler than jag's algorithm
_except_ for my binary search, which looks bulky, but in the one pair case only
expends an add, a shift, and a couple of (register) assigments over jag's.  I'm
not disagreeing; mine does have bigger brain-print, and _currently_ jag's fills
the bill.

However, the test in jag's |IsDependentOn| is not strong enough.  It currently reads

  if ( f2.mStart >= f1.mStart && f2.mStart <  f1.mEnd )

it should read

    // if it _isn't_ the case that
    //  one fragment starts after the other ends,
    //  or ends before the other starts,
    // then, they conflict
 
  if ( !(f2.mStart>=f1.mEnd || f2.mEnd<=f1.mStart) )

simplifying we get

  if ( f2.mStart<f1.mEnd && f2.mEnd>f1.mStart )

Let's get jag's patch in as soon as possible (like tonight, according to
drivers).  We can upgrade to my algorithm post 1.0 as this doesn't change the API.
jag explained to me that his check is strong enough since |this| is always a
single fragment string (we should assert this) and any overlap would mean that
f2.mStart >= f1.mStart (given that we only call IsDependentOn before writing to
a string).

that said, there's no runtime difference between what scott suggests and what
jag has already implemented, so why not use what scott suggests?  it certainly
seems clearer and less fragile (just count the number of conditionals in my
first paragraph!).
The address of a string buffer is completely unrelated to the address of any
other independent buffer.  The buffers of a single string are not necessarily in
the same order in memory.  Nor is a buffer of one string guaranteed to start
later than a buffer of another string.  So nothing about strings themselves
forces |f2| to be after f1.  I don't see anything in the algorithm that
explictly assigns |f2| to be the later of the two fragments.  I must be missing
something, because I don't see how jag's test can catch

  f1     |----------|
  f2   |--------|

Perhaps by coincidence, because currently our writable strings all totally own
the characters they manage, and thus, own them from the beginning of the buffer.
 This would be invalidated, however, by several ideas that easily fit into the
scheme we have now, e.g., writable substrings and strings with smarter buffer
strategies (where the string start might not be the buffer start).  Besides, we
don't know what needs people have to determine string intersection in the
future.  It may be that future clients what to see if two strings interfere with
each other for reasons totally unrelated to assigning into one of them.  This
broken test compromises that notion, and makes the function itself, er, uh,
well, false.  The function claims to test if two strings of any number of
fragments intersect.  With this test, it doesn't really do that.

With the UNSTATED coincidence mentioned above, I concede his test will work for
a little while :-)  The fact that this assumption is not obvious to an ordinary
reader (it had to be explained to you, Darin), and is easily invalidated, makes
this test weaker IMHO that my test of equivalent effort.

Jag: you need better comments (er, _any_ comments in this function :-)  I'm not
trying to be hard on you Jag, don't take these comments personally or anything
like that; it's just a test with one too many assumptions.
scc: agreed.
drivers are drooling.  how much testing have seen with this?  Can we fix the
test and check it in tonight?  Drivers are saying "directly into the branch". 
Can I get some driver approval for that?  Jag, Darin, do you feel it's ready for
the branch.  If we don't get it in now, and I mean _right_now_ ... it might not
make 1.0.
i've tested the patch with a trunk build (under linux), and everything seems
fine.  i've run Ts and Tp performance tests (under linux), and there don't
appear to be any performance regressions.  in short, i'd say this patch is ready!
Attachment #83449 - Attachment is obsolete: true
adding adt1.0.0+ pending r=/sr= on the final patch.  Please get drivers approval
before checking into the 1.0 branch. 
Keywords: adt1.0.0+
Attachment #83613 - Flags: superreview+
Comment on attachment 83613 [details] [diff] [review]
Using a better check (thanks scc).

r/sr=darin
Comment on attachment 83613 [details] [diff] [review]
Using a better check (thanks scc).

r=scc
Attachment #83613 - Flags: review+
Checked in on trunk and branch.
Keywords: fixed1.0.0
Minor change.
Comment on attachment 83714 [details] [diff] [review]
I knew I had forgotten something.

sr=scc
Attachment #83714 - Flags: superreview+
Comment on attachment 83714 [details] [diff] [review]
I knew I had forgotten something.

r=alecf
Attachment #83714 - Flags: review+
Apparently Tstartup went up a little as a result of my first checkin, see bug
144863.
Depends on: 144863
Fix has been checked in. I'll deal with Tstart issues in bug 144863.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Use |GetReadableFragment| instead of iterators (I should have remembered that
would be faster.  Re-ordered some tests, where putting the more likely test
first can short-circuit the whole expression.  Fixed one test that needed to be
a little stronger.  Added some optional optimizations (intent indicated with
|#if|), that could be measured.

With respect to the start-up slow down reported (see bug 144863), I don't think
sticking this algorithm in will help --- except for one thing easily inserted
into jag's patch: the |Empty()| tests at the top.  It seems likely to me that
during startup, a lot of string assignment is into empty strings.  This will
short circuit that case.  It would be easy to intrument this in a debug build
by having |IsDependentOn| just count the total number of times it's called, and
the total number of times it's called on empty destination.  If the latter
dominates  other uses of |IsDependentOn|, add the |Empty| test.
Attachment #83551 - Attachment is obsolete: true
What about a string that's a dependent concatenation of multiple empty strings?
Here's working code demonstrating the heap sort and overlap resolution phase of
the algorithm above
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: