Apparently, expat handles UTF-8 BOMs in DTDs fine, so let's add BOM handling to DTDParser.py only. Filing a bug because there is some naughtiness to document: the UTF-8 BOM '\xef\xbb\xbf' is decoded by the codecs.open line and comes out as u'\ufeff'. The endianness of the platform isn't an issue here; I verified this by testing on a Windows XP PC and a Mac PPC. That is, codecs.BOM differs between the two ('\xff\xfe' and '\xfe\xff', respectively), but the decoded unicode char is the same. I'll attach a patch for reference.
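For illustration, here is a minimal sketch of the behavior described above. It is not the attached patch: `strip_bom` and the sample entity are hypothetical names made up for this example, and it is written for Python 3, decoding bytes directly rather than going through codecs.open.

```python
import codecs


def strip_bom(text):
    """Drop a leading U+FEFF left over from decoding a UTF-8 BOM.

    Hypothetical helper for illustration; not the code in the patch.
    """
    if text.startswith("\ufeff"):
        return text[1:]
    return text


# A DTD fragment as raw bytes, prefixed with the UTF-8 BOM (b'\xef\xbb\xbf').
raw = codecs.BOM_UTF8 + b'<!ENTITY foo "bar">'

# Decoding as UTF-8 turns the three BOM bytes into the single char U+FEFF,
# regardless of the platform's byte order.
decoded = raw.decode("utf-8")

# codecs.BOM is the UTF-16 BOM in native byte order, so it is the value
# that differs between little-endian and big-endian machines.
native_bom_is_utf16 = codecs.BOM in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

cleaned = strip_bom(decoded)
```

Checking only for u'\ufeff' after decoding keeps the parser independent of byte order, which matches the observation that the decoded char is the same on both test machines.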
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED