Created attachment 460178 [details]
Notes from making the other patch

From bug 572487 comment 4:
> On the extended testcase attachment 341759 [details], though, there are a bunch of places
> where Firefox draws "missing-glyph boxes" (e.g., tries to render the glyph)
> where it seems like it would make sense to not decode at all (e.g., %00 to
> %08). Safari doesn't decode any of the characters/codepoints in that testcase,
> except, oddly, U+FFFC.
>
> Also, it seems like it would be a cleaner solution if we could create a list
> that excluded whole Unicode ranges (where applicable) and then add the specific
> other characters we want to exclude, rather than listing each
> character/codepoint individually. (We could pick up a very few using
> controlCharacterSet and whitespaceAndNewlineCharacterSet, but I'm very leery of
> using illegalCharacterSet, since it's everything that's illegal or was not
> defined in *Unicode 3.2* and we're on Unicode *5.2*, with lots of new
> characters defined.)

We can't use NSCharacterSets, because there's no way to get the contents of an NSCharacterSet into an NSString, and the NSString/CFStringRef functions don't accept ranges. In the meeting, Stuart suggested writing some code that loops through each range, and that's probably what we'll end up needing here.

I've attached some notes from when I was making attachment 453245 [details] [diff] [review] (the non-fix with NSCharacterSet). That patch has a better collection of ranges, but these notes were somewhat useful when I was creating them, and I don't want them to be lost.
I think now all we're missing after take 2 from bug 572487 is
* unassigned (u+fff0 to u+fff8) <-- FFEF - FFF8
* FDD0 to FDD7, FDDA to FDDF <-- FDC8-FDDF
* Noncharacters 1ffff-fffff, 10ffff
* U+E0000 to U+E007F
but when we go to finish this, we should double-check.
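For reference, the "loop through a range" idea Stuart suggested might look roughly like the sketch below. It is only an illustration in plain C, not code from either patch: `CodePointRange` and `expand_ranges` are hypothetical names, and the caller is assumed to supply a large enough buffer. It expands inclusive code point ranges into UTF-16 code units, using surrogate pairs above U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: an inclusive range of Unicode code points. */
typedef struct {
    uint32_t first;
    uint32_t last;
} CodePointRange;

/*
 * Writes the UTF-16 encoding of every code point in the given ranges
 * into out. Returns the number of 16-bit code units written, or
 * SIZE_MAX if the buffer is too small.
 */
size_t expand_ranges(const CodePointRange *ranges, size_t range_count,
                     uint16_t *out, size_t capacity)
{
    size_t n = 0;
    for (size_t i = 0; i < range_count; i++) {
        for (uint32_t cp = ranges[i].first; cp <= ranges[i].last; cp++) {
            if (cp > 0x10FFFF) {
                break;                  /* past the end of Unicode */
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                continue;               /* lone surrogates are not encodable */
            }
            if (cp < 0x10000) {
                if (n + 1 > capacity) {
                    return SIZE_MAX;
                }
                out[n++] = (uint16_t)cp;
            } else {
                /* Supplementary planes need a surrogate pair. */
                if (n + 2 > capacity) {
                    return SIZE_MAX;
                }
                uint32_t v = cp - 0x10000;
                out[n++] = (uint16_t)(0xD800 + (v >> 10));
                out[n++] = (uint16_t)(0xDC00 + (v & 0x3FF));
            }
        }
    }
    return n;
}
```

Called with ranges such as U+0000 to U+0008 and U+E0000 to U+E007F, this would fill a buffer that could presumably then be handed to something like CFStringCreateWithCharacters to get the string the CFStringRef functions want. The real list of ranges would still need the double-check mentioned above.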