Closed Bug 139649 Opened 22 years ago Closed 22 years ago

Fix string code to use IsDependentOn instead of depending on nsAPromiseString type

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.0

People

(Reporter: jag+mozilla, Assigned: jag+mozilla)

References

Details

(Keywords: topembed+, Whiteboard: [driver:scc])

Attachments

(4 files, 4 obsolete files)

Fix 22 years ago jag (Peter Annema) 38.34 KB, patch	darin.moz : superreview+	Details \| Diff \| Splinter Review
determine dependencies by checking overlap of buffers 22 years ago jag (Peter Annema) 43.13 KB, patch		Details \| Diff \| Splinter Review
Cleaned up the previous patch and simplify check 22 years ago jag (Peter Annema) 41.19 KB, patch		Details \| Diff \| Splinter Review
outline of a O((m+n)log(n)) algorithm 22 years ago Scott Collins 7.62 KB, text/plain		Details
Using a better check (thanks scc). 22 years ago jag (Peter Annema) 43.60 KB, patch	scc : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
I knew I had forgotten something. 22 years ago jag (Peter Annema) 6.71 KB, patch	alecf : review+ scc : superreview+	Details \| Diff \| Splinter Review
outline of the advanced algorithm + using \|GetReadableFragment\| directly, smarter tests, and optional optimizations for comparison 22 years ago Scott Collins 9.91 KB, text/plain		Details
the heap sort and overlap resolution phase 22 years ago Scott Collins 7.64 KB, text/plain		Details

jag (Peter Annema)

Assignee

Description

•

22 years ago

Consider the code:

  nsString A;
  //...
  A = Substring(A, 2, 3);

Normally when you assign something into A you'll clear A's buffer, then copy the
string you want to assign into it. That creates a problem for this code because
the (sub)string we want to copy from depends on the string we want to assign
into, so when we cleared A's buffer we thus lost the characters we wanted to copy.

The solution is to make a copy of the substring first, then clear A's buffer,
and then store the copy in A.

We're currently doing this by looking at the type of the string we want to copy
from, and if it's an nsAPromiseString, we know we need to check whether there is
a dependency on the string we're assigning into. The problem with this approach
is that we can't depend on the string's type, since someone might cast that type
information away, for example:

  void foopy(const nsAString& inStr, nsAString& outStr)
    {
      outStr = inStr; // very silly example
    }

  //...
  nsString A;
  foopy(Substring(A, 2, 3), A);

Here the type information is cast away, so we don't do the dependency check, and
A isn't what it's supposed to be.

The real solution is to always check for such dependencies when modifying a
string (Assign, Append, Insert, ...). This will allow us to get rid of (the now
defunct) |nsAPromiseString|. We might also be able to special case a few
assignments (e.g. from PRUnichar*) where we know there is no dependency.

jag (Peter Annema)

Assignee

Updated

•

22 years ago

Blocks: 125389

Darin Fisher

Updated

•

22 years ago

Status: NEW → ASSIGNED

Keywords: mozilla1.0, topembed

Priority: -- → P3

Target Milestone: --- → mozilla1.0

Darin Fisher

Comment 1

•

22 years ago

it just occured to me that there still isn't anything we can do about this
broken pattern:

{
  nsString foopy("loves bar");
  PRUnichar *p = foopy.get();
  foopy = p;
}

as well as all the variations on this that might accidentally show up in real code.

should we add an IsDependentOn that takes a raw pointer?  or is that just overkill?

jag (Peter Annema)

Assignee

Comment 2

•

22 years ago

A pointer to a buffer inside a string is only valid for as long as you know that
the string doesn't change. I think that clearly covers this case as "don't do that".

Darin Fisher

Comment 3

•

22 years ago

i guess, but conceptually it's not that much different then A = Substring(A,x,y);
especially given the example you mentioned.  i could just as easily replace
inStr with a |const PRUnichar *| value.  users might fall into the same trap, right?

at any rate, i don't feel that strongly about this issue... i just wanted to
point out that there is a bit of bias toward objects for no apparent reason :-/

Darin Fisher

Updated

•

22 years ago

Assignee: darin → jaggernaut

Status: ASSIGNED → NEW

Darin Fisher

Comment 4

•

22 years ago

-> jag

jag (Peter Annema)

Assignee

Comment 5

•

22 years ago

Attached patch Fix (obsolete) — Details — Splinter Review

jag (Peter Annema)

Assignee

Comment 6

•

22 years ago

So darin measured the effect of this patch on Tp and Ts and it doesn't have a 
negative impact on either.

jag (Peter Annema)

Assignee

Comment 7

•

22 years ago

Ignore the assertions I added in nsDependentSubstring, they're something I was 
playing with.

jag (Peter Annema)

Assignee

Updated

•

22 years ago

Blocks: 138000

Darin Fisher

Comment 8

•

22 years ago

i ran the perf numbers on a 733Mhz redhat 7.2 box w/ 256 Mb RAM and an IDE drive.

compiled with these options:

ac_add_options --enable-crypto
ac_add_options --enable-xterm-updates
ac_add_options --enable-extensions
ac_add_options --enable-svg
mk_add_options MOZ_INTERNAL_LIBART_LGPL=1
MOZ_INTERNAL_LIBART_LGPL=1
ac_add_options --enable-optimize=-O2
ac_add_options --disable-debug
ac_add_options --enable-strip

No longer blocks: 138000

Darin Fisher

Comment 9

•

22 years ago

bug 128288 probably depends on this or is a dupe of this...

Darin Fisher

Comment 10

•

22 years ago

Comment on attachment 82053 [details] [diff] [review]
Fix

sr=darin (nice!)

Attachment #82053 - Flags: superreview+

Scott Collins

Comment 11

•

22 years ago

Comment on attachment 82053 [details] [diff] [review]
Fix

hmmm, I don't think the |IsDependentOn| test in the |nsDependentSubstring|
classes is strong enough.  Shouldn't it be

  return mString->IsDependentOn(aString);

?  In any case where there is a more complex underlying string, or the
possibility of one characterized by a reference to an abstract type, then we
have to forward the |IsDependentOn| call into it.  Only in simple types can you
answer the question directly.  Is this not true?  Or am I confused?

All classes that directly own the data can answer the question with a simple
test.  All classes that only refer to the data owned by some other class must
forward the |IsDependentOn| call.

Hmmm, maybe I'm reading this wrong, but it looks like one could circumvent the
|IsDependentOn| test by just assigning from an element pointer and length that
happen to point into ones self (e.g., trying to trim both ends of a string in a
single operation, not using |SubString|).  Similarly with other element pointer
based operations.

Other than those points, I like this patch.  If I'm right about the points
above, however, those would be killers.  I'm marking this 'needs-work' so you
can determine if my observations are valid.  If they aren't, I'll update that
to an r/sr.

Attachment #82053 - Flags: needs-work+

jag (Peter Annema)

Assignee

Comment 12

•

22 years ago

(IsDependentOn in nsDependentSubstring should indeed forward the request)

scc and I just discussed adding support for detecting char iterator based
substring dependencies, which if you look at it slightly differently means
you'll just check for overlap of buffers in all cases, which in turn means
IsDependentOn can be (protected) a non-virtual function on nsAString.

I'll come up with a patch that does this in a m*n (where m is always 1 because
you can only write into single fragment strings, and n is most likely 1 or close
to it) fashion, scc is looking into an m+n algorithm.

Judson Valeski

Updated

•

22 years ago

Keywords: topembed → topembed+

Scott Collins

Comment 13

•

22 years ago

noting the bugs for which I am the driver

Whiteboard: [driver:scc]

jag (Peter Annema)

Assignee

Comment 14

•

22 years ago

Attached patch determine dependencies by checking overlap of buffers (obsolete) — Details — Splinter Review

jag (Peter Annema)

Assignee

Updated

•

22 years ago

Attachment #82053 - Attachment is obsolete: true

jag (Peter Annema)

Assignee

Comment 15

•

22 years ago

Attached patch Cleaned up the previous patch and simplify check (obsolete) — Details — Splinter Review

Attachment #83396 - Attachment is obsolete: true

Scott Collins

Comment 16

•

22 years ago

Attached file outline of a O((m+n)log(n)) algorithm (obsolete) — Details

This is an outline of the faster algorithm.  Like the brute force approach it
is O(m) when the number of source buffers is 1.  Using a algorithm designed to
handle two muli-fragment strings from the start would allow us to make
|IsDependentOn| a non-virtual function in |nsAC?String| --- even if we didn't
turn on the ability to do multi-fragment sources.

Alec Flett

Comment 17

•

22 years ago

1) are we really concerned about performance w.r.t. the number of string
fragments? I mean how often are we talking about strings with more than say 4
string fragments? The only way I even know how to create a multi-fragment string
is via the concatentation operator.
2) do we really need the faster (and significantly more complex!) algorithm to
freeze strings?

Darin Fisher

Comment 18

•

22 years ago

i have to agree with alecf... the current patch seems like the right one for the
1.0 branch (it is much simpler).  can we defer the optimization work until after
mozilla 1.0?

Scott Collins

Comment 19

•

22 years ago

I'm fine with that, since at the moment, we only do Mx1 comparisons.  My
algorithm _looks_ a lot more complicated; but most of that is comments.  The
straight-line flow for an Mx1 case is actually simpler than jag's algorithm
_except_ for my binary search, which looks bulky, but in the one pair case only
expends an add, a shift, and a couple of (register) assigments over jag's.  I'm
not disagreeing; mine does have bigger brain-print, and _currently_ jag's fills
the bill.

However, the test in jag's |IsDependentOn| is not strong enough.  It currently reads

  if ( f2.mStart >= f1.mStart && f2.mStart <  f1.mEnd )

it should read

    // if it _isn't_ the case that
    //  one fragment starts after the other ends,
    //  or ends before the other starts,
    // then, they conflict
 
  if ( !(f2.mStart>=f1.mEnd || f2.mEnd<=f1.mStart) )

simplifying we get

  if ( f2.mStart<f1.mEnd && f2.mEnd>f1.mStart )

Let's get jag's patch in as soon as possible (like tonight, according to
drivers).  We can upgrade to my algorithm post 1.0 as this doesn't change the API.

Darin Fisher

Comment 20

•

22 years ago

jag explained to me that his check is strong enough since |this| is always a
single fragment string (we should assert this) and any overlap would mean that
f2.mStart >= f1.mStart (given that we only call IsDependentOn before writing to
a string).

that said, there's no runtime difference between what scott suggests and what
jag has already implemented, so why not use what scott suggests?  it certainly
seems clearer and less fragile (just count the number of conditionals in my
first paragraph!).

Scott Collins

Comment 21

•

22 years ago

The address of a string buffer is completely unrelated to the address of any
other independent buffer.  The buffers of a single string are not necessarily in
the same order in memory.  Nor is a buffer of one string guaranteed to start
later than a buffer of another string.  So nothing about strings themselves
forces |f2| to be after f1.  I don't see anything in the algorithm that
explictly assigns |f2| to be the later of the two fragments.  I must be missing
something, because I don't see how jag's test can catch

  f1     |----------|
  f2   |--------|

Perhaps by coincidence, because currently our writable strings all totally own
the characters they manage, and thus, own them from the beginning of the buffer.
 This would be invalidated, however, by several ideas that easily fit into the
scheme we have now, e.g., writable substrings and strings with smarter buffer
strategies (where the string start might not be the buffer start).  Besides, we
don't know what needs people have to determine string intersection in the
future.  It may be that future clients what to see if two strings interfere with
each other for reasons totally unrelated to assigning into one of them.  This
broken test compromises that notion, and makes the function itself, er, uh,
well, false.  The function claims to test if two strings of any number of
fragments intersect.  With this test, it doesn't really do that.

With the UNSTATED coincidence mentioned above, I concede his test will work for
a little while :-)  The fact that this assumption is not obvious to an ordinary
reader (it had to be explained to you, Darin), and is easily invalidated, makes
this test weaker IMHO that my test of equivalent effort.

Jag: you need better comments (er, _any_ comments in this function :-)  I'm not
trying to be hard on you Jag, don't take these comments personally or anything
like that; it's just a test with one too many assumptions.

Darin Fisher

Comment 22

•

22 years ago

scc: agreed.

Scott Collins

Comment 23

•

22 years ago

drivers are drooling.  how much testing have seen with this?  Can we fix the
test and check it in tonight?  Drivers are saying "directly into the branch". 
Can I get some driver approval for that?  Jag, Darin, do you feel it's ready for
the branch.  If we don't get it in now, and I mean _right_now_ ... it might not
make 1.0.

Darin Fisher

Comment 24

•

22 years ago

i've tested the patch with a trunk build (under linux), and everything seems
fine.  i've run Ts and Tp performance tests (under linux), and there don't
appear to be any performance regressions.  in short, i'd say this patch is ready!

jag (Peter Annema)

Assignee

Comment 25

•

22 years ago

Attached patch Using a better check (thanks scc). — Details — Splinter Review

Attachment #83449 - Attachment is obsolete: true

scottputterman

Comment 26

•

22 years ago

adding adt1.0.0+ pending r=/sr= on the final patch.  Please get drivers approval
before checking into the 1.0 branch.

Keywords: adt1.0.0+

Darin Fisher

Updated

•

22 years ago

Attachment #83613 - Flags: superreview+

Darin Fisher

Comment 27

•

22 years ago

Comment on attachment 83613 [details] [diff] [review]
Using a better check (thanks scc).

r/sr=darin

Scott Collins

Comment 28

•

22 years ago

Comment on attachment 83613 [details] [diff] [review]
Using a better check (thanks scc).

r=scc

Attachment #83613 - Flags: review+

jag (Peter Annema)

Assignee

Comment 29

•

22 years ago

Checked in on trunk and branch.

Keywords: fixed1.0.0

jag (Peter Annema)

Assignee

Comment 30

•

22 years ago

Attached patch I knew I had forgotten something. — Details — Splinter Review

Minor change.

Scott Collins

Comment 31

•

22 years ago

Comment on attachment 83714 [details] [diff] [review]
I knew I had forgotten something.

sr=scc

Attachment #83714 - Flags: superreview+

Alec Flett

Comment 32

•

22 years ago

Comment on attachment 83714 [details] [diff] [review]
I knew I had forgotten something.

r=alecf

Attachment #83714 - Flags: review+

jag (Peter Annema)

Assignee

Comment 33

•

22 years ago

Apparently Tstartup went up a little as a result of my first checkin, see bug
144863.

Depends on: 144863

jag (Peter Annema)

Assignee

Comment 34

•

22 years ago

Fix has been checked in. I'll deal with Tstart issues in bug 144863.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Scott Collins

Comment 35

•

22 years ago

Attached file outline of the advanced algorithm + using |GetReadableFragment| directly, smarter tests, and optional optimizations for comparison — Details

Use |GetReadableFragment| instead of iterators (I should have remembered that
would be faster.  Re-ordered some tests, where putting the more likely test
first can short-circuit the whole expression.  Fixed one test that needed to be
a little stronger.  Added some optional optimizations (intent indicated with
|#if|), that could be measured.

With respect to the start-up slow down reported (see bug 144863), I don't think
sticking this algorithm in will help --- except for one thing easily inserted
into jag's patch: the |Empty()| tests at the top.  It seems likely to me that
during startup, a lot of string assignment is into empty strings.  This will
short circuit that case.  It would be easy to intrument this in a debug build
by having |IsDependentOn| just count the total number of times it's called, and
the total number of times it's called on empty destination.  If the latter
dominates  other uses of |IsDependentOn|, add the |Empty| test.

Attachment #83551 - Attachment is obsolete: true

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 36

•

22 years ago

What about a string that's a dependent concatenation of multiple empty strings?

Scott Collins

Comment 37

•

22 years ago

Attached file the heap sort and overlap resolution phase — Details

Here's working code demonstrating the heap sort and overlap resolution phase of
the algorithm above

Nobody; OK to take it and work on it

Updated

•

3 years ago

Component: String → XPCOM

You need to log in before you can comment on or make changes to this bug.