The current DTD Checks don't know that < and it's XML-defined friends are totally fine. That's both stupid and easy to fix. Taking.
Created attachment 483198 [details] [diff] [review] exclude list of pre-defined xml entities, too. I'm adding the known xml entities amp, lt, gt, apos, and quot. They're used in two places, once, they're stripped from the reflist entities, as they don't need to be added to the list of known entities. XML parsers know those themselves. Secondly, I'm removing them from the l10n referenced entities, for the same reason. That's having two fall-outs: - We're not warning about quots etc anymore, if they're not in en-US. - We're not offering quots etc anymore as "known" entities, if there's a missing one in l10n and en-US would use one of the xml entities. Trying stas with a review request, feel free to punt.
please, also add : the French locale uses it a lot. Thanks
…and maybe bdquo, rdquo.
Comment on attachment 483198 [details] [diff] [review] exclude list of pre-defined xml entities, too. >+ xmllist = set(('amp', 'lt', 'gt', 'apos', 'quot')) Calling this a "list" is a bit misleading, since it's a set. I'd go for something like `xml_accepted`. > reflist = set(m.group(1).encode('utf-8') >- for m in self.eref.finditer(refValue)) >+ for m in self.eref.finditer(refValue)) - self.xmllist Entirely optional and subjective, but maybe split this into 2 statements for better readability? reflist -= self.xml_accepted or even reflist = reflist.difference(self.xml_accepted) Again, entirely your call. Also, do you want to add a test for this?
Re comment 2 and 3, All of those are HTML entities, and you'll die if you use them in XUL. I'm really warning on those on purpose, but you'll continue to see those warnings. You can game the system, if you're converting your references to numeric entity refs, aka Ӓ (use the right number) Regarding the review comment, yeah, the choice of ...list is awkward, I went for minimal impact. All the foolist are sets, so I made it xmllist, too. Otherwise I'd have to rename both reflist and l10nlist, too. I do prefer the command to be a single command instead of piecing it into two, because that's conceptually more like it, IMHO. I could do a \ and put the - self.xmllist in a new line, though. Tests, hrm, yeah, totally slipped my eye that I have unitChecks.py, let me dig into that.
(In reply to comment #5) > Regarding the review comment, yeah, the choice of ...list is awkward, I went > for minimal impact. All the foolist are sets, so I made it xmllist, too. > Otherwise I'd have to rename both reflist and l10nlist, too. OK, I vote for consistency here then. Leave 'xmllist'. > I do prefer the command to be a single command instead of piecing it into two, > because that's conceptually more like it, IMHO. I could do a \ and put the - > self.xmllist in a new line, though. A backslash could do it. I guess the reason why I'd prefer it is that I didn't notice at first that you were making a set out of that list comprehension, and I wondered how come the "-" operator worked on a list and a set :) Not a big deal, honestly, I'd leave it as it is right now if you prefer it that way.
Tests added, and moved -xmllist to a new line. http://hg.mozilla.org/build/compare-locales/rev/85f239b38def, marking FIXED. It's also deployed. Not yet in a public release, prolly do a 0.9.1 soon.