Closed Bug 1009466 Opened 11 years ago Closed 10 years ago

Add a wrapper class for JSString char access

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: jandem, Assigned: jandem)

References

Details

For bug 903519 (allocating strings in the nursery) we need to ensure there are no raw jschar* pointers on the stack (for chars owned by JSStrings at least). After discussing this with Terrence we decided to add some RAII class that holds a jschar*, and this pointer will be updated when a GC moves the string. Bug 998392 will be touching a lot of the same code and this new char wrapper could make it easier to support multiple encodings. I want to prototype something and see how it works out.
After looking at some code in jsstr.cpp, here are a few thoughts: (1) In many/most cases, no GC can happen while we are using the chars. (2) There are a few functions, like UnrolledMatch, that do pointer arithmetic etc and need high performance. To support these use cases, we could add "raw" types like this: class TwoByteCharPtr { jschar *chars_; } class Latin1CharPtr { jschar8 *chars_; } // Or UTF8 etc. class CharPtr { union of char* types + enum value jschar operator[](size_t i) const; } And have these implement operator+ etc for pointer arithmetic. Then the analysis could treat these as GC hazards if they are used across GC calls. Terrence is this possible? I think this would take care of most string->chars() calls, with pretty much no overhead. In debug builds these types would also give us bounds checks. If we *can* GC while holding the chars (or don't care about the overhead), we use another variant that does the GC magic: RootedTwoByteCharPtr ptr = string->twoByteChars(); RootedCharPtr ptr = string->chars(); TwoByteCharPtr can be assigned to RootedTwoByteCharPtr; RootedTwoByteCharPtr can be passed to helper functions that take TwoByteCharPtr etc. Thoughts?
Flags: needinfo?(terrence)
(In reply to Jan de Mooij [:jandem] from comment #1) > After looking at some code in jsstr.cpp, here are a few thoughts: > > (1) In many/most cases, no GC can happen while we are using the chars. > (2) There are a few functions, like UnrolledMatch, that do pointer > arithmetic etc and need high performance. > > To support these use cases, we could add "raw" types like this: > > class TwoByteCharPtr { jschar *chars_; } > class Latin1CharPtr { jschar8 *chars_; } // Or UTF8 etc. > class CharPtr { > union of char* types + enum value > jschar operator[](size_t i) const; > } The type=encoding classes already exist in js/public/CharacterEncoding.h. There are variants for Latin1, UTF8, and TwoByteChars in both null-terminated and pointer-range varieties. They are based on mozilla::Range/RangePtr, so we get efficient access and range assertions automatically. > And have these implement operator+ etc for pointer arithmetic. Then the > analysis could treat these as GC hazards if they are used across GC calls. > Terrence is this possible? Absolutely. That's an extremely elegant solution, too! With that, we would get protection against unwanted GC's and at the same time let C++ types ensure we have the right string encoding at all times. You just need to add the new names to |isRootedPointerTypeName| in js/src/devtools/rootAnalysis/annotations.js; should be trivial. > I think this would take care of most string->chars() calls, with pretty much > no overhead. In debug builds these types would also give us bounds checks. > > If we *can* GC while holding the chars (or don't care about the overhead), > we use another variant that does the GC magic: > > RootedTwoByteCharPtr ptr = string->twoByteChars(); > RootedCharPtr ptr = string->chars(); > > TwoByteCharPtr can be assigned to RootedTwoByteCharPtr; RootedTwoByteCharPtr > can be passed to helper functions that take TwoByteCharPtr etc. In theory, we should be able to implement GCMethods<TwoByteChars> and get these classes for free. At the moment, no extra marking would be needed for these, since the chars can't move. When we do move them, the owning string would simply traverse the Rooted<TwoByteChars> list and forward the chars manually. > Thoughts? As always, we're thinking in the same direction. Your idea of making the underlying chars types into GC pointers for the analysis is brilliant. It should make this conversion straightforward.
Flags: needinfo?(terrence)
(In reply to Terrence Cole [:terrence] from comment #2) > The type=encoding classes already exist in js/public/CharacterEncoding.h. > There are variants for Latin1, UTF8, and TwoByteChars in both > null-terminated and pointer-range varieties. They are based on > mozilla::Range/RangePtr, so we get efficient access and range assertions > automatically. That's true, but those are probably also used for chars that are not owned by strings and we don't want the static analysis to complain about that. Maybe I could implement the other classes in terms of these though... Thanks for the feedback!
Good point! The analysis should be seeing the declared names, so even a typedef would work there, although it would be nice to have the compiler catch accidental casts.
I'm now beginning to doubt we really need a "rooted char" class at all, if we have an "unrooted char" type that the static analysis can check. In many cases you could just store an index or something, for instance ReplaceData::dollar/dollarEnd. I'll see how far we get with that.
Depends on: 1011693
I'm guessing this really isn't necessary, given the state of the latin1 patches.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.