Last Comment Bug 604404 - [compare-locales] DTD Checks need to learn about XML pre-defined entities
: [compare-locales] DTD Checks need to learn about XML pre-defined entities
Status: RESOLVED FIXED
:
Product: Mozilla Localizations
Classification: Client Software
Component: Infrastructure (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: Axel Hecht [:Pike]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-10-14 09:38 PDT by Axel Hecht [:Pike]
Modified: 2010-10-14 13:31 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
exclude list of pre-defined xml entities, too. (1.21 KB, patch)
2010-10-14 10:25 PDT, Axel Hecht [:Pike]
stas: review+
Details | Diff | Splinter Review

Description Axel Hecht [:Pike] 2010-10-14 09:38:07 PDT
The current DTD Checks don't know that < and it's XML-defined friends are totally fine.

That's both stupid and easy to fix. Taking.
Comment 1 Axel Hecht [:Pike] 2010-10-14 10:25:00 PDT
Created attachment 483198 [details] [diff] [review]
exclude list of pre-defined xml entities, too.

I'm adding the known xml entities amp, lt, gt, apos, and quot.

They're used in two places, once, they're stripped from the reflist entities, as they don't need to be added to the list of known entities. XML parsers know those themselves.
Secondly, I'm removing them from the l10n referenced entities, for the same reason.

That's having two fall-outs:

- We're not warning about quots etc anymore, if they're not in en-US.
- We're not offering quots etc anymore as "known" entities, if there's a missing one in l10n and en-US would use one of the xml entities.

Trying stas with a review request, feel free to punt.
Comment 2 Cédric Corazza 2010-10-14 10:32:52 PDT
please, also add   : the French locale uses it a lot.
Thanks
Comment 3 Stefan Plewako [:stef] 2010-10-14 10:47:46 PDT
…and maybe bdquo, rdquo.
Comment 4 Staś Małolepszy :stas 2010-10-14 10:57:35 PDT
Comment on attachment 483198 [details] [diff] [review]
exclude list of pre-defined xml entities, too.

>+    xmllist = set(('amp', 'lt', 'gt', 'apos', 'quot'))

Calling this a "list" is a bit misleading, since it's a set. I'd go for something like `xml_accepted`.

>         reflist = set(m.group(1).encode('utf-8') 
>-                      for m in self.eref.finditer(refValue))
>+                      for m in self.eref.finditer(refValue)) - self.xmllist

Entirely optional and subjective, but maybe split this into 2 statements for better readability?

  reflist -= self.xml_accepted

or even

  reflist = reflist.difference(self.xml_accepted)

Again, entirely your call.

Also, do you want to add a test for this?
Comment 5 Axel Hecht [:Pike] 2010-10-14 11:02:36 PDT
Re comment 2 and 3,

All of those are HTML entities, and you'll die if you use them in XUL.

I'm really warning on those on purpose, but you'll continue to see those warnings.

You can game the system, if you're converting your references to numeric entity refs, aka Ӓ (use the right number)

Regarding the review comment, yeah, the choice of ...list is awkward, I went for minimal impact. All the foolist are sets, so I made it xmllist, too. Otherwise I'd have to rename both reflist and l10nlist, too.

I do prefer the command to be a single command instead of piecing it into two, because that's conceptually more like it, IMHO. I could do a \ and put the - self.xmllist in a new line, though.

Tests, hrm, yeah, totally slipped my eye that I have unitChecks.py, let me dig into that.
Comment 6 Staś Małolepszy :stas 2010-10-14 11:09:46 PDT
(In reply to comment #5)
> Regarding the review comment, yeah, the choice of ...list is awkward, I went
> for minimal impact. All the foolist are sets, so I made it xmllist, too.
> Otherwise I'd have to rename both reflist and l10nlist, too.

OK, I vote for consistency here then. Leave 'xmllist'.

> I do prefer the command to be a single command instead of piecing it into two,
> because that's conceptually more like it, IMHO. I could do a \ and put the -
> self.xmllist in a new line, though.

A backslash could do it. I guess the reason why I'd prefer it is that I didn't notice at first that you were making a set out of that list comprehension, and I wondered how come the "-" operator worked on a list and a set :) Not a big deal, honestly, I'd leave it as it is right now if you prefer it that way.
Comment 7 Axel Hecht [:Pike] 2010-10-14 13:31:54 PDT
Tests added, and moved -xmllist to a new line.

http://hg.mozilla.org/build/compare-locales/rev/85f239b38def, marking FIXED.

It's also deployed.

Not yet in a public release, prolly do a 0.9.1 soon.

Note You need to log in before you can comment on or make changes to this bug.