Created attachment 583162
Test case

From http://krijnhoetmer.nl/irc-logs/whatwg/20111220#l-590

<annevk> in shift_jis
<annevk> 84 3C 73 63 72 69 70 74 20 84 3E
<annevk> gives <script> in Gecko/Chrome, but "�script �" in Opera

This might confuse blacklist-based XSS filters (that are inherently unsafe, of course), so doing what Opera does would be on the safe side.
I'm not sure if this is a real issue. If you remove the 0x20 for the space, the script doesn't execute. The 0x84 before the > is being parsed as an attribute name. This would be similar to using <script a> to bypass a blacklist filter.
Thinking about it more, the issue may be that we are interpreting the 0x84 0x3C and 0x84 0x3E sequences as individual bytes rather than as one character.
Masatoshi-san, would this problem be fixed by implementing the Encoding Standard for Shift_JIS (bug 747762)?
new TextDecoder("shift_jis").decode(new Uint8Array([0x84,0x3C,0x73,0x63,0x72,0x69,0x70,0x74,0x20,0x84,0x3E]))
"�<script �>"

But I don't think this needs to be "fixed".
- The Encoding Standard requires this behavior.
- Virtually all browsers (including Blink-based Opera) are now "vulnerable" to this.
- No valid shift_jis sequence uses 0x3C/0x3E as a second byte. If an XSS filter misses this sequence, that should be considered a serious bug in the filter.
- Eating the second byte unconditionally would introduce a different vulnerability (consider <a href="<0x84>">, where the closing quote would be swallowed).

I suggest WONTFIX.
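The `<a href="<0x84>">` case can be checked the same way as the script case. This is a minimal sketch, assuming a runtime whose `TextDecoder` supports the `shift_jis` label and follows the Encoding Standard (browsers do; Node.js needs a full-ICU build). The names `hrefBytes` and `decodedHref` are only illustrative. The decoder should replace the lone lead byte with U+FFFD and leave the closing quote in place.

```javascript
// Bytes for: <a href="<0x84>">
// 0x84 is a Shift_JIS lead byte; 0x22 (the closing quote) is not a valid second byte.
const hrefBytes = new Uint8Array([
  0x3c, 0x61, 0x20, 0x68, 0x72, 0x65, 0x66, 0x3d, 0x22, // <a href="
  0x84,                                                 // lone lead byte
  0x22, 0x3e,                                           // ">
]);

// Non-fatal decoding: malformed input becomes U+FFFD instead of throwing.
const decodedHref = new TextDecoder("shift_jis").decode(hrefBytes);

// JSON.stringify makes the replacement character and the quotes easy to see.
console.log(JSON.stringify(decodedHref));
```

If the quote survives, the attribute value is still terminated and the markup after it is parsed as before.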
Thanks. I'm resolving this as INVALID: since the Encoding Standard requires this behavior, it's not a bug. Henri, please raise a spec issue if you think there's something wrong with the required behavior.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INVALID
Anne, is the state of the Encoding Standard on this topic intentional?
Partially. "Eating" the second byte if that is ASCII is itself a vulnerability as emk points out.
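To make the trade-off concrete, here is a rough model of the two policies. It is not the real Shift_JIS decoder: `modelDecode`, `asciiBytes`, and the `eatAsciiTrail` flag are hypothetical names, valid two-byte pairs are emitted as a placeholder instead of being looked up in the JIS X 0208 index, and single-byte half-width katakana are ignored. Only the handling of a lead byte followed by an invalid second byte is meant to be faithful.

```javascript
const REPLACEMENT = "\uFFFD";
const PLACEHOLDER = "\u25A1"; // stands in for a real table lookup (assumption)

// Lead and trail byte ranges used by this simplified model.
const isLead = (b) => (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
const isTrail = (b) => (b >= 0x40 && b <= 0x7e) || (b >= 0x80 && b <= 0xfc);

// eatAsciiTrail = false: put an ASCII second byte back (Encoding Standard policy).
// eatAsciiTrail = true:  swallow it together with the lead byte (old Opera policy).
function modelDecode(bytes, eatAsciiTrail) {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b < 0x80) { out += String.fromCharCode(b); continue; }
    if (!isLead(b)) { out += REPLACEMENT; continue; } // simplification
    const next = bytes[i + 1];
    if (next === undefined) { out += REPLACEMENT; continue; }
    if (isTrail(next)) { out += PLACEHOLDER; i++; continue; }
    // Invalid second byte: always emit one replacement character.
    out += REPLACEMENT;
    // A non-ASCII invalid byte is consumed; an ASCII one only if we "eat" it.
    if (next >= 0x80 || eatAsciiTrail) i++;
  }
  return out;
}

const asciiBytes = (s) => [...s].map((c) => c.charCodeAt(0));
const scriptCase = [0x84, ...asciiBytes("<script "), 0x84, ...asciiBytes(">")];
const hrefCase = [...asciiBytes('<a href="'), 0x84, ...asciiBytes('">')];

for (const eat of [false, true]) {
  console.log(eat, JSON.stringify(modelDecode(scriptCase, eat)),
                   JSON.stringify(modelDecode(hrefCase, eat)));
}
```

In this model, eating the ASCII byte neutralizes the `<script>` input but also removes the closing quote in the `href` input, so the attribute value runs on into whatever follows. Putting the byte back keeps the quote and lets the `<` through, which is the case a filter has to handle.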