Open Bug 316338 Opened 18 years ago Updated 1 year ago
Possible to introduce bogus UTF-16 into Gecko
This shows the three problems; it crashes on shutdown as we try to remove some of the atoms involved from the atom hashtables.
(In reply to comment #0)
> There's also the following comment in our "UCS216ToUnicode" converter:
>
>   // XXX : illegal surrogate code points are just passed through !!
>
> and also in UTF32ToUnicode:
>
>   // XXX Do we have to convert surrogate code points to the replacement
>   // character (0xfffd)?
>
> but writing a testcase for that didn't seem to be worth the trouble and that
> might be fodder for a separate bug.

Yes, please :)
Filed bug 317216
roc said something to me that sounded like he disagreed about this being a bug.
To keep text nodes from containing invalid UTF-16, the following functions might need checks:
* document.createTextNode()
* element.textContent = ...
* textNode.data = ...
* textNode.splitText()
* textNode.insertData(), etc.
* The bug 194231 behavior
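The check all of these entry points would need is the same: scan the incoming char16_t data for lone surrogates. A minimal sketch follows; this is not Gecko's actual code, and the function name is made up for illustration:

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the buffer is well-formed UTF-16,
// i.e. every high surrogate (0xD800-0xDBFF) is immediately followed by a
// low surrogate (0xDC00-0xDFFF), and no low surrogate appears on its own.
bool IsValidUTF16(const char16_t* aBuf, size_t aLen) {
  for (size_t i = 0; i < aLen; ++i) {
    char16_t u = aBuf[i];
    if (u >= 0xD800 && u <= 0xDBFF) {          // high surrogate
      if (i + 1 == aLen) return false;         // string ends mid-pair
      char16_t next = aBuf[i + 1];
      if (next < 0xDC00 || next > 0xDFFF) return false;  // unpaired high half
      ++i;                                     // skip the low half
    } else if (u >= 0xDC00 && u <= 0xDFFF) {   // lone low surrogate
      return false;
    }
  }
  return true;
}
```

An API like createTextNode() could run this scan and throw INVALID_CHARACTER_ERR on failure, per comment #10.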
Or we could just throw on invalid UTF-16 in XPConnect... or in the JS engine. That would help a lot more than whack-a-moling all sorts of DOM methods. We'd still need to deal with splitText/insertData, but that's pretty easy -- we just have to check for a surrogate pair around the insertion/split point.
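The splitText/insertData check is just a look at the two code units around the offset. A sketch of that check, assuming a hypothetical helper name (not Gecko's actual code):

```cpp
#include <cstddef>

// Hypothetical helper: true if splitting or inserting at aOffset would
// separate a surrogate pair, i.e. the code unit before the offset is a
// high surrogate and the code unit at the offset is its low half.
bool SplitsSurrogatePair(const char16_t* aBuf, size_t aLen, size_t aOffset) {
  if (aOffset == 0 || aOffset >= aLen) {
    return false;  // offsets at either end can't land inside a pair
  }
  char16_t prev = aBuf[aOffset - 1];
  char16_t next = aBuf[aOffset];
  return prev >= 0xD800 && prev <= 0xDBFF &&   // high surrogate before...
         next >= 0xDC00 && next <= 0xDFFF;     // ...low surrogate after
}
```

splitText()/insertData() could refuse (or adjust) any offset for which this returns true, so a valid text node stays valid.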
Shouldn't the DOM spec say what to do with invalid UTF-16?
Yeah, INVALID_CHARACTER_ERR should be thrown. It's not explicitly mentioned in all cases where it's needed, but that's certainly the intent.
We're going to have to whack-a-mole these, so any help with coming up with more APIs that need fixing would be useful. Jesse and I came up with these as well:
* .innerHTML
* loading an XML file with illegal UTF-16
* .setAttribute along with CSS to get the attribute displayed
* :before/:after and the CSS content property

That said, do we know that this is actually a problem? I really don't like playing whack-a-mole, so it'd be great if we could fix the places in our code that can't deal with illegal UTF-16 rather than trying to prevent it.
Assignee: dougt → jonas
Isn't Johnny saying in comment #11 that we need to whack-a-mole in order to be standards compliant? We can certainly make text rendering handle invalid UTF-16 as a backstop. I actually don't know of any outstanding bugs in that area.
Once bug 377360 is fixed, I'll be able to test and find out whether text rendering has issues with lonely surrogates and such ;)
Pretty much any place that uses atoms can't deal with illegal UTF-16. And if we do the checking in either XPConnect or, better yet, in the JS engine (e.g. at initial String object creation time), it shouldn't be too whack-a-moly... I don't know whether JS can actually enforce this, of course, since as far as it's concerned there is nothing outside the BMP... :(

> * :before/:after and the CSS content property

That comes through the CSS parser, which doesn't allow through stuff that's not actually characters... unless the CSS starts out encoded in UTF-16 and containing bogus bytes to begin with, of course; see the part about the converter in comment 0.
You can't modify :before/:after through CSSOM? Jst says we can't block this at a JS level.
> You can't modify :before/:after through CSSOM?

Hmm... You can, yes. That basically comes back to the "UTF-16 data that is bogus to start with" case as far as the CSS parser is concerned...

Can we block this in XPConnect? E.g. have a flag on JSStrings indicating whether they're valid UTF-16, have the JS engine propagate it where desired (e.g. when concatenating two valid strings, the concatenation is valid), and have XPConnect check strings that don't have the flag set before passing them into Gecko (and flag them if they're OK)? Or would that cost too much?
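To make the flag-propagation idea concrete, here is a minimal sketch of it in plain C++ -- not real SpiderMonkey/XPConnect code, and all the names are hypothetical:

```cpp
#include <string>

// Hypothetical string wrapper carrying a "known valid UTF-16" flag.
struct FlaggedString {
  std::u16string data;
  bool knownValid = false;  // set once the string has been scanned
};

static bool IsHigh(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool IsLow(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// The O(n) scan XPConnect would run only for strings whose flag is unset.
bool ScanValid(const std::u16string& s) {
  for (size_t i = 0; i < s.size(); ++i) {
    if (IsHigh(s[i])) {
      if (i + 1 == s.size() || !IsLow(s[i + 1])) return false;
      ++i;  // skip the low half of a complete pair
    } else if (IsLow(s[i])) {
      return false;  // lone low surrogate
    }
  }
  return true;
}

FlaggedString MakeFlagged(std::u16string s) {
  FlaggedString f{std::move(s), false};
  f.knownValid = ScanValid(f.data);
  return f;
}

// Concatenation propagates the flag without rescanning: a valid string
// never ends on an unpaired high surrogate and never starts with an
// unpaired low one, so joining two valid strings cannot create a lone
// surrogate at the junction.
FlaggedString Concat(const FlaggedString& a, const FlaggedString& b) {
  return {a.data + b.data, a.knownValid && b.knownValid};
}
```

The cost is one extra bit per string plus a scan on first use; already-flagged strings cross the XPConnect boundary with no per-call scan.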
(In reply to comment #13)
> Isn't Johnny saying in comment #11 that we need to whack-a-mole in order to be
> standards compliant?

I'm fine with fixing the DOM APIs as well, I'm just not comfortable with relying on that for security.
Assignee: jonas → nobody