Closed Bug 726994 Opened 12 years ago Closed 12 years ago

A valid DOCTYPE in install.rdf should be allowed

Categories

(addons.mozilla.org Graveyard :: Add-on Validation, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jwkbugzilla, Assigned: basta)

References

()

Details

The next versions of my add-ons should have a DOCTYPE in their install.rdf file, the install.rdf template can be seen here: https://hg.adblockplus.org/buildtools/file/01447dbda7f2/install.rdf.tmpl. I noticed however that AMO rejects such add-ons. See https://addons.mozilla.org/en-US/developers/upload/c48ff0660fea42d1b6edcbe6819588a7 for example:

> DOCTYPEs are not permitted in install.rdf
> Error: The add-on's install.rdf file contains a DOCTYPE. It must be removed
> before your add-on can be validated.

No explanation as to why a DOCTYPE is problematic but the problem here is apparently bug 635092.

I can understand why a DOCTYPE referencing a chrome:// URL would be an issue - already because it will break for a disabled extension. However, I am using the DOCTYPE like a "macro", to avoid repeating the same lengthy list of contributors and translators for each of the 45 locales. This DOCTYPE should work fine both with Gecko as well as rdflib. The idea was to drop the custom About dialog currently used by Adblock Plus and Element Hiding Helper in favor of the built-in one.
A simple solution would be to try RDFParser(install_rdf_data) first and only look for a DOCTYPE if parsing failed (as well as produce a meaningful error message for other parsing errors).
Sounds good to me. Matt, is this something that can be implemented soon?
I'm not sure that this is a good idea. Allowing DOCTYPE use could result in fragmentation of install manifest code. This could lead to install.rdf code being hidden throughout other parts of an XPI/JAR, which could easily be missed by the editors.

From a technical standpoint, rdflib does not provide any mechanism for accessing chrome:// URIs (obviously), so packaged doctype references would need to be loaded and dealt with manually. The only type of DOCTYPE which even could work automatically is a doctype with an http(s):// URI. That's not acceptable, though, since it poses a security risk both externally (the developer could load arbitrary information after review) and internally (the validator would be making HTTP requests). This means that rdflib could not, in fact, handle doctypes on its own.

For what you're using it for, Wladimir, I'd be curious if there was a better means of achieving the same result. Are you simply adding the same set of translators/contributors to each <em:localized> element? Why can't you simply list them once in the body of the manifest as "standalone" <em:translator> and <em:contributor> elements? Unless the translator/contributor names are transliterated or changed in some way between locales, I'm not sure what the purpose of duplicating the information is.
(In reply to Matt Basta from comment #3)
> I'm not sure that this is a good idea. Allowing DOCTYPE use could result in
> fragmentation of install manifest code. This could lead to install.rdf code
> being hidden throughout other parts of an XPI/JAR, which could easily be
> missed by the editors.

Well, there isn't really much to hide in install.rdf, it's mostly metadata. I agree that it's necessary to forbid using remote or chrome URLs, which I understand is difficult to do.

(In reply to Matt Basta from comment #3)
> Unless the
> translator/contributor names are transliterated or changed in some way
> between locales, I'm not sure what the purpose of duplicating the
> information is.

If you add an <em:localized> block in install.rdf, there are certain values that *have* to be in it, otherwise they won't show up at all. The Add-ons Manager doesn't have the necessary logic to look for "common" elements and apply them to all locales.
> The Add-ons Manager doesn't have the necessary logic to look for "common" elements and apply them to all locales.

It would probably be a good idea to file a bug for Firefox/Thunderbird that says to recognize when there are <em:(translator|contributor)> elements in an install.rdf but not in the <em:localized> elements, to display the root translators and contributors in those locales. It doesn't make any sense that contributors to the default locale wouldn't also be contributors to other locales, no?

I'll have a look later at what it would take to monkey patch the doctype handler in rdflib, but from what I remember, it isn't going to be easy or pretty.
(In reply to Matt Basta from comment #5)
> It would probably be a good idea to file a bug for Firefox/Thunderbird that
> says to recognize when there are <em:(translator|contributor)> elements in
> an install.rdf but not in the <em:localized> elements, to display the root
> translators and contributors in those locales.

Mossop knows this stuff best. Is there a bug filed for this, or a reason the AOM works this way now?
(In reply to Matt Basta from comment #5)
> It would probably be a good idea to file a bug for Firefox/Thunderbird that
> says to recognize when there are <em:(translator|contributor)> elements in
> an install.rdf but not in the <em:localized> elements, to display the root
> translators and contributors in those locales.

Good idea - see bug 416350 ;)
I proposed a fix in bug 562290 but didn't have time to push it through.
(In reply to Jorge Villalobos [:jorgev] from comment #6)
> (In reply to Matt Basta from comment #5)
> > It would probably be a good idea to file a bug for Firefox/Thunderbird that
> > says to recognize when there are <em:(translator|contributor)> elements in
> > an install.rdf but not in the <em:localized> elements, to display the root
> > translators and contributors in those locales.
> 
> Mossop knows this stuff best. Is there a bug filed for this, or a reason the
> AOM works this way now?

Yeah this stuff is probably not the way I would have implemented it if I were to do so today. What I can say is that doctypes referencing chrome urls wouldn't work anyway because the chrome wouldn't be registered for the add-on at that point, and I'm not sure any other urls are possible for doctypes so I think this restricts it to the kind of inline replacement that Wladimir is doing here so I'm not sure there is any issue (beyond potentially making the rdf harder to read) with allowing this
Yes, DOCTYPEs in Gecko are restricted to chrome:// URLs (bug 22942). The extension isn't limited to its own chrome:// namespace of course so catching it is good - this is a potential source of strange errors. But rdflib will throw an exception in this case anyway (unknown url type: 'chrome'), it merely needs to be presented to the user properly.

The main (only?) issue seems to be that rdflib will actually download doctypes from HTTP - not something you would want to do on a server.
(In reply to Matt Basta from comment #5)
> I'll have a look later at what it would take to monkey patch the doctype
> handler in rdflib, but from what I remember, it isn't going to be easy or
> pretty.

This seems to do the job:

from rdflib.plugins.parsers import rdfxml

orig_create_parser = rdfxml.create_parser
def new_create_parser(target, store):
  parser = orig_create_parser(target, store)
  parser.setEntityResolver(None)
  return parser
rdfxml.create_parser = new_create_parser

The function rdfxml.create_parser() creates and sets up a SAX parser. Clearing the entity resolver on that parser makes sure that an exception is thrown for external entity references. Inline doctypes are still processed correctly.
We don't use rdfxml for the parsing, we use rdflib.Graph.

https://github.com/mattbasta/amo-validator/blob/master/validator/rdf.py
(In reply to Matt Basta from comment #11)
> We don't use rdfxml for the parsing, we use rdflib.Graph.

You do - indirectly. rdflib.graph.Graph.parse() calls rdflib.plugin.get("xml", Parser)(). And if you look at the calls at the bottom of rdflib.plugin, the registered parser for the XML format is rdflib.plugins.parsers.rdfxml.RDFXMLParser. Obviously, to change the parsing behavior you have to patch up (or replace) the parser, not the Graph class that is merely storing the data.
Just in case: comment 10 is a minimal code example. A better course of action would be registering an EntityResolver instance that will throw a custom exception if its resolveEntity method is invoked. This way you can recognize the exception and display an appropriate error message to the user ("DOCTYPE referencing an external entity isn't allowed in install.rdf") without having to use regular expressions.
Any progress on this? This issue is blocking the next Element Hiding Helper release, should I wait or look for workarounds?
Matt, is this something that can be fixed soon?
Assignee: nobody → mattbasta
I'll work on this as soon as I can get bug 729316 and bug 717536 knocked out.
So, to give Wladimir an estimate, this will take at least a week to be resolved, probably longer. We want this fixed, but if you need to push the update sooner I would recommend looking for workarounds.
Merged:

https://github.com/mozilla/amo-validator/pull/158
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 815269
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.