Add JSAPI for accessing strings in a rope

RESOLVED INVALID

Status

()

Core
JavaScript Engine
RESOLVED INVALID
2 years ago
2 years ago

People

(Reporter: mwu, Assigned: mrrrgn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

2 years ago
Being able to access the branches in the rope lets us iterate over the contents of the string without making a copy to a linear string.
(Assignee)

Updated

2 years ago
Assignee: nobody → winter2718
A couple questions to resolve here.

Is it okay for the JSAPI access to be rooting-unsafe, or to require no heap accesses happen while the iteration is in progress?  That simplifies life somewhat, because it means the iteration doesn't have to take extra steps to ensure the string being iterated over (or its subcomponents) will disappear underneath iteration.

What's the strategy to track current iteration state?  Ropey stuff doesn't point to its parent (there might be multiple parents), so you can't just do it saving just the current node.  Gobbling up all the parts into a vector is easy but may involve allocation, which may well defeat the purpose of this JSAPI, depending.  If you can guarantee no heap accesses, you probably can mutate the pointer contents of the JSString structure in-place to track iteration state.  JSRope::flattenInternal does this right now.  But it's horrifically complicated and tricky (and much worse since moving GC/barriers came into play), and I wouldn't wish upon my worst enemies debugging mistakes made if someone did access the string in the heap.  :-)
Suppose we do this:

    namespace JS {
    JSString* NewRope(JSContext* cx, StringHandle left, StringHandle right);
    bool StringIsRope(JSContext* cx, StringHandle str);
    void GetRopeHalves(JSContext* cx, MutableStringHandle left, MutableStringHandle right);
    }

mwu: Is that good enough? Every string is either a rope or a JSLinearString.
Flags: needinfo?(mwu)
(Assignee)

Comment 3

2 years ago
Created attachment 8697037 [details] [diff] [review]
ropeapi.diff

Still double checking my usage of jsapi here since I'm a newb. Hopefully this is close.
Attachment #8697037 - Flags: feedback?(jorendorff)
(Assignee)

Comment 4

2 years ago
Created attachment 8697052 [details] [diff] [review]
ropeapi.diff
Attachment #8697052 - Flags: review?(jorendorff)
(Assignee)

Updated

2 years ago
Attachment #8697037 - Flags: feedback?(jorendorff)
(Reporter)

Comment 5

2 years ago
Weird, I remember commenting here. Where did the comment go...

I like the API, though JS::NewRope seems to be redundant with JS_ConcatStrings. Maybe JS::NewRope gives you a better guarantee that you'll get a rope back? (In which case, DOM can abuse js ropes as a binary tree which could be pretty fun.)
Flags: needinfo?(mwu)
(In reply to Michael Wu [:mwu] from comment #5)
> I like the API, though JS::NewRope seems to be redundant with
> JS_ConcatStrings. Maybe JS::NewRope gives you a better guarantee that you'll
> get a rope back? (In which case, DOM can abuse js ropes as a binary tree
> which could be pretty fun.)

(ConcatStrings also ensures the length does not exceed JSString::MAX_LENGTH, and optimizes things like x + "" to just return x.)
Comment on attachment 8697052 [details] [diff] [review]
ropeapi.diff

Review of attachment 8697052 [details] [diff] [review]:
-----------------------------------------------------------------

OK, r=me, but please don't land this yet! I want to hear from mwu first whether or not this is helpful.

::: js/src/jsapi.cpp
@@ +4772,5 @@
> +JS::GetRopeHalves(JSContext *cx, HandleString rope, MutableHandleString left, MutableHandleString right) {
> +    AssertHeapIsIdle(cx);
> +    CHECK_REQUEST(cx);
> +
> +    MOZ_ASSERT(rope->isRope());

Please delete this assertion. The house style is

*   don't assert things if they're about to be asserted anyway, and

*   all downcast-methods, like asRope(), do the assertion for you.
Attachment #8697052 - Flags: review?(jorendorff) → review+
(Assignee)

Comment 8

2 years ago
(In reply to Jan de Mooij [:jandem] from comment #6)
> (In reply to Michael Wu [:mwu] from comment #5)
> > I like the API, though JS::NewRope seems to be redundant with
> > JS_ConcatStrings. Maybe JS::NewRope gives you a better guarantee that you'll
> > get a rope back? (In which case, DOM can abuse js ropes as a binary tree
> > which could be pretty fun.)
> 
> (ConcatStrings also ensures the length does not exceed JSString::MAX_LENGTH,
> and optimizes things like x + "" to just return x.)

I'll add that check into JS::NewRope, I think it makes sense to have a function that _only_ returns ropes since concat can't guarantee that.
(Assignee)

Comment 9

2 years ago
Created attachment 8697372 [details] [diff] [review]
ropeapi.diff

Sorry for the extra review, I went ahead and added zone and max string length checks to JS::NewRope.
Attachment #8697037 - Attachment is obsolete: true
Attachment #8697052 - Attachment is obsolete: true
Attachment #8697372 - Flags: review?(jorendorff)
After talking with mwu about his use case, I think we probably shouldn't land this.


## Background

mwu's plan is for Servo to support:

1.  doing document.createTextNode(s) on the main thread, where s is a rope
2.  running layout on a background thread and laying out the text without copies

GC and JS would be disabled during (2). Ultimately, mwu wants to allow scripts to run on the main thread, racing freely with (2) as long as they don't touch the DOM or do GC; there would be some locking around the DOM to prevent bad races.


## Is this a good idea?

Runtimes are single-threaded. The JSAPI asserts like mad if you use it off-thread (mwu has disabled "one" assertion to get this to work, I don't know where).

It's generally not a super great idea to use a massive single-threaded C++ codebase in a multi-threaded way, reasoning that you understand well enough what the library does with memory[*] and layering your own locking on top. It's C++: there's mutation in unexpected places. We all agreed that accessing ropes while main-thread content can touch them is unsafe; there are less obvious uses of state in SM too.

And yet if we ever want to self-host the DOM in JS for real, we'll need to access data off-thread.


## Conclusion

For now, mwu can linearize the strings on the main thread.

If self-hosting the DOM wins out, we'll have to offer new API for off-thread access.



[*]: A locking scheme requires the cooperation of all code accessing the lock-protected memory. Making this explicit and enforcing it with the borrow checker is what makes Rust's Mutex safe.
(Assignee)

Updated

2 years ago
Attachment #8697372 - Flags: review?(jorendorff)
(Assignee)

Comment 11

2 years ago
Putting this one on the shelf. :)
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.