Closed Bug 654301 Opened 13 years ago Closed 13 years ago

TM: JSAPI's intern handling looks wrong

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: cdleary, Assigned: cdleary)

References

Details

(Whiteboard: [sg:high][fixed-in-tracemonkey])

Attachments

(1 file, 3 obsolete files)

The API contract for interned strings isn't very explicit. The constituent members are:

- JS_StringHasBeenInterned(str)
- JS_InternJSString(cx, str)
- JS_InternString(cx, cstr)
- JS_InternUCStringN(cx, jschars, len)
- JS_InternUCString(cx, jschars)
- JS_GetInternedStringChars(str)
- JS_GetInternStringCharsAndLength(str, plen)

JS_StringHasBeenInterned just tests whether a value is an atom.
JS_InternJSString calls js_AtomizeString with no flags.
JS_InternString call js_Atomize with ATOM_INTERNED.
JS_InternUCStringN calls js_AtomizeChars with ATOM_INTERNED.
JS_InternUCString calls JS_InternUCStringN.

jorendorff pointed out that JS_InternJSString probably has a bug in that it doesn't pass an ATOM_INTERNED flag to js_AtomizeString. However, even if it did, js_AtomizeString doesn't mutate the flags in the atom state if str->isAtom already. For users that rely on interning strings to effectively root string values, this causes GC hazards.

Some notes on the wonderful world of atoms:

- Static atoms actually live in static program memory as static const members of JSAtom (in jsstr.h) -- as a result, they're trivially pinned.
- The atom state is just a big hash set with two flag bits hacked into it: ATOM_PINNED and ATOM_INTERNED. Both of these flags indicate that the GC should not collect the flagged atoms (even if they are unmarked by the GC marking phase).
- ATOM_PINNED is used by the engine internally to prevent GC of commonly used atoms. For example, lazy resolution of standard classes in JS_ResolveStandardClass calls StdNameToAtom which creates pinned atom entries in the atom state table and memoizes them in atom state struct slots.
- Calls to Atomize that succeed in the lookup correctly OR in the flags that have been received from the atomize call.
- On the first js_NewContext InitCommonAtoms is called, performing a js_Atomize with the ATOM_PINNED flag for all the common atoms. InitCommonAtoms also NULLs out the lazy atom memoization slots in the atom state.
- On the last js_DestroyContext FinishCommonAtoms is called, which clears out the ATOM_PINNED bit on every atom in the atom state hash set.

Issues:

- The concept of "interning" a string via the JSAPI inconsistently acts as a rooting mechanism.
- The distinction between ATOM_PINNED and ATOM_INTERNED only seems necessary if the API user can keep atoms across the last js_DestroyContext. All of the ATOM_PINNED flags get cleared out at that point, and you need to distinguish the atoms pinned by the engine internals from those pinned via the API (during a potential GC that occurs during the first JSContext creation).
- If the user wants to be able to "temporarily" get O(1) comparison capability for a set of strings, there is no way of doing it without interning (and leaking the memory for, until JSRuntime destruction) all of that set of strings.

Good things:

- If the API tells you that a string is interned you are guaranteed O(1) comparison against other interned strings.

Proposed changes, assuming that we want atoms to be able to persist across the last context destruction:

- Make JS_StringHasBeenInterned check that the ATOM_INTERNED flag is set.
- Make JS_InternJSString pass ATOM_INTERNED to js_AtomizeString and make js_AtomizeString update the flag set for an existing atom entry if a non-zero flags argument is provided.

Marked as sensitive because presumed interned-string roots that get GC'd could lead to arbitrary memory reads.
Whiteboard: [sg:high]
Blocks: 653083
(In reply to comment #0)
> - Make JS_StringHasBeenInterned check that the ATOM_INTERNED flag is set.

That makes sense.

> - Make JS_InternJSString pass ATOM_INTERNED to js_AtomizeString and make
> js_AtomizeString update the flag set for an existing atom entry if a non-zero
> flags argument is provided.

Definitely -- this fixes what I find scary here.

I think I also favor ripping out JSATOM_PINNED and interning the common atoms.
- API fixes.
- Unify pinning categories as "intern" to match the API nomenclature.
- C++ify atom state entries.
- Get rid of atom flags. I always hated those trailing zero arguments.

Now have to build against the browser... this part is a little trickier because taking the atom compartments lock to do the intern lookup requires a cx request whereas before we pretended it was a property of the string itself.

The other uses are kind of troubling -- seem to just be using "interned" to mean "atomized": http://mxr.mozilla.org/mozilla-central/ident?i=INTERNED_STRING_TO_JSID -- phase 2 of the patch will address those.
Pushed to try.
Looked good on try. Sorry it's smushed together -- I accidentally the qfold.
Attachment #531200 - Attachment is obsolete: true
Attachment #531201 - Attachment is obsolete: true
Attachment #531472 - Flags: review?(jwalden+bmo)
Comment on attachment 531472 [details] [diff] [review]
Logically consistent string interning.

Review of attachment 531472 [details] [diff] [review]:
-----------------------------------------------------------------

I feel like I've not really, fully, completely reviewed this even given that it seems to be mostly mechanical changes to remove a trailing parameter, but I don't see anything actually *wrong* with it...so, um, yes.  r+

::: js/src/jsatom.cpp
@@ +446,5 @@
> +/*
> + * This call takes ownership of 'chars' if ownChars is set.
> + *
> + * FIXME: but if it turns out those chars are in static we drop them on the
> + * floor -- is this bad?

Please file a followup on this, we should go through the callers and just fix this, shouldn't be too difficult.

::: js/src/jsatom.h
@@ +634,5 @@
>   * Find or create the atom for a string. Return null on failure to allocate
>   * memory.
>   */
>  extern JSAtom *
> +js_AtomizeString(JSContext *cx, JSString *str, bool intern = false);

The bool parameter here should be a named enum like enum InternBehavior { Intern, DontIntern } for self-documenting code.

@@ +639,4 @@
>  
>  extern JSAtom *
> +js_Atomize(JSContext *cx, const char *bytes, size_t length,
> +           bool intern = false, bool useCESU8 = false);

Same InternBehavior thing here.  Also do the same for the useCESU8 parameter.  I'd even argue it'd be best for the useCESU8 behavior to be a differently-named method, personally.

@@ +643,3 @@
>  
>  extern JSAtom *
> +js_AtomizeChars(JSContext *cx, const jschar *chars, size_t length, bool intern = false);

InternBehavior again.
Attachment #531472 - Flags: review?(jwalden+bmo) → review+
http://hg.mozilla.org/tracemonkey/rev/7fe5fb2450f4

Now working on a quick followup bug to behaviorify the CESU8 thing: bug 657537.
Whiteboard: [sg:high] → [sg:high][fixed-in-tracemonkey]
Backed out temporarily.
Whiteboard: [sg:high][fixed-in-tracemonkey] → [sg:high]
http://hg.mozilla.org/tracemonkey/rev/28a1e03302e6
Whiteboard: [sg:high] → [sg:high][fixed-in-tracemonkey]
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 659902
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: