Closed Bug 522253 Opened 15 years ago Closed 15 years ago

Validate the exported XML against a DTD.

Categories

(Webtools :: ISPDB Server, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: bwinton, Assigned: LouieDinh)

References

Details

Attachments

(1 file, 7 obsolete files)

First we would need to create the dtd, based off of the example in https://wiki.mozilla.org/Thunderbird:Autoconfiguration:ConfigFileFormat
Then we would need to fix our export to match it.

(Take a look at the differences between http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk/mozilla.com%20disabled and http://ispdb.mozillamessaging.com/export_xml/17 for examples of what needs fixing.)
Flags: blocking-thunderbird3?
I'll give this one a shot.
Assignee: nobody → LouieDinh
ispdb as a whole isn't a release blocker for TB3 (we can hand-edit XML files)
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Blake, I'm a bit confused about your comment. Tell me if I'm understanding correctly.

Current output: 
http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com

Desired output:
http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk/mozilla.com%20disabled

The Thunderbird:Autoconfig link is rather confusing because there are so many comments + some fields seem optional.
Attached patch A WIP patch (obsolete) — Splinter Review
This is a WIP patch containing the following:
1. Relaxng schema for exported XML file
2. Proof of Concept validation of a working (mozilla_expected.xml) and current xml (mozilla_exported.xml). To run the POC just run a "python validate.py" in ./tests/
3. New unit test that validates the exported XML against the relaxng schema. 

Please comment on the following:
Directory layout - Is it ok to just have the relaxng.xml (temporary name) inside the tests directory? Will we want a more permanent spot?
Schema - Is the schema accurate? The information on the mozilla wiki is slightly confusing

Once I know the schema is accurate, I will start modifying the export_xml function to match. 

Note - I modified <username> to <usernameForm> in the same xml file (detailed at: http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk/mozilla.com%20disabled) to match the export because it makes more sense.
There were a couple issues with the models (i.e missing fields) that I noted in the code. I've hacked around them for now until we come up with a long term solution.
Attachment #407472 - Attachment is obsolete: true
Attachment #407952 - Flags: review?(bwinton)
Blocks: 526475
Please take a look at the problems mentioned in bug 526475
The XML was specced by me. Sorry when it was unclear. I'll fix the definition and add some description, when I have time. If there are things unclear, please just ask.
Modified previous patch a little because relaxng schema wasn't quite accurate.
Attachment #407952 - Attachment is obsolete: true
Attachment #410733 - Flags: review?(bwinton)
Attachment #407952 - Flags: review?(bwinton)
Comment on attachment 410733 [details] [diff] [review]
Fixed previous patch - Now prints out all domains

>Index: tests/relaxng_schema.xml

We should really get BenB to review this, too, since he wrote https://wiki.mozilla.org/Thunderbird:Autoconfiguration:ConfigFileFormat

>+   <element name="emailProvider">
>+        <attribute name="id">
>+            <text />
>+        </attribute>

I think there’s a way to specify that more explicitly.  Something like:
        <attribute name="id">
            <data type="xsd:ID" />
        </attribute>
which should cause an XML processor to make sure that there’s only element with that id in the file.

(I’m using http://books.xmlschemata.org/relaxng/page2.html to figure this stuff out.)

>+            <element name="port">
>+                <text />
>+            </element>

Similarly, how about:
           <element name="port">
               <data type="xsd:unsignedInt" />
           </element>

>+             <element name="socketType">
>+                <choice>
>+                    <value>None</value>
>+                    <value>SSL</value>
>+                    <value>TLS</value>
>+                    <value></value>
>+                </choice>

What would the blank choice mean?

>+             <element name="authentication">
>+                <choice>
>+                    <value></value>
>+                    <value>plain</value>
>+                </choice> 

I think you’re missing "<value>secure</value>".

>+            </element>
>+        </element>
>+
>+        <element name="outgoingServer">
[snip…]
>+            <element name="hostname">
>+                <text />
>+            </element>
>+            <element name="port">
>+                <text />
>+            </element>
>+             <element name="socketType">
>+                <choice>
>+                    <value></value>
>+                    <value>None</value>
>+                    <value>TLS</value>
>+                    <value>SSL</value>
>+                </choice>
>+            </element>
>+            <element name="usernameForm">
>+                <text />
>+            </element>

We should be able to share these with the incoming server’s definitions.
Check out http://books.xmlschemata.org/relaxng/relax-CHP-5.html

>+            <element name="authentication">
>+                <choice>
>+                    <value>plain</value>
>+                    <value></value>
>+                </choice>

This one is missing "secure" and "smtp-after-pop".

That’s probably enough to get us started.

A good test for how good your schema is would be to run it against the files in http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk/ and see if they all pass.  (If they don’t, it’s probably your schema that’s wrong, not the files.  ;)

>Index: tests/test_xml.py
> from django.core.urlresolvers import reverse
> from django.test.client import Client
> import nose.tools
>-import re
>+import re,os

One import per line, please.

>+from StringIO import StringIO

We should order these, as per http://code.basieproject.org/trunk/STYLE
(Specifically, "Import one module per line.  Group them by stdlib, django, 3rd party, local.".)

>+from lxml import etree

That doesn’t work for me:

(ispdb)Fin:ispdb bwinton$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named lxml
>>> from xml import etree

(Python 2.5 also has xml.etree.  I don’t know about Python 2.4, but I also don’t know if we support it.)

I suppose you could add "install lxml" to the list of instructions, but if Python already has an etree, we should probably just use it.

> class XMLTest(TestCase):
>-    def test_export_xml(self):
>-        reference_string = """<?xml version="1.0" encoding="UTF-8"?>
>-        <clientConfig>
>+    def test_validated_export_xml(self):
>         domain = models.Domain.objects.get(name="test.com")
>+        #nose.tools.set_trace()

You can just remove this line entirely.  ;)

>         config_xml = domain.config.as_xml()
>-        config_xml = re.sub(r'\s','',config_xml)
>-        assert config_xml == reference_string
>+        config_xml = StringIO(config_xml) 
>+        #skip over the first line because etree parser doesn't like utf encoding
>+        config_xml.readline() 

It looks like there’s a trailing space on that line of code.  Also, it would shock me if the etree XML parser didn’t handle an XML declaration as the first line.  Perhaps there’s a different error there that we can handle in a cleaner way?

>+        doc = etree.parse(config_xml)
>+        xml_schema = etree.RelaxNG(file=os.path.join(os.getcwd(), 
>+                                                     'tests/relaxng_schema.xml'))
>+        xml_schema.assertValid(doc)

So, we used to test that the returned xml had the same content, but now we just test that the returned xml has the right format.  It would be good to have a test that made sure that the right values appeared in the right places.

And while you’re doing that, please use etree to get the values.  ;)

>Index: config/models.py
>@@ -46,14 +46,27 @@
>     """
>     Return the configuration using the XML document that Thunderbird is expecting.
>     """
>+    desiredOutput = {'outgoing_use_global_preferred_server': 1,
>+    'incoming_socket_type': 1, 'outgoing_port': 1,
>+    'incoming_authentication': 1, 'display_name': 1, 
>+    'outgoing_add_this_server': 1, 'outgoing_socket_type': 1,
>+    'outgoing_authentication': 1, 'incoming_hostname': 1, 
>+    'outgoing_username_form': 1, 'display_short_name': 1,'incoming_port': 1,
>+    'incoming_username_form': 1, 'outgoing_hostname': 1,
>+    'incoming_type': 1}
>     indent = '  '

Would you mind using "s for those, instead of 's?

>     indentLevel = 1
>     lines = ['<?xml version="1.0" encoding="UTF-8"?>',
>              '<clientConfig>',

It has often been said that if you’re writing XML by hand, you’re doing it wrong.
Now that we’re importing etree, what do you think about using its api to generate the XML for you, so that you don’t mess up when someone enters an email_provider_id of “">delete your data”.  ;)

Later,
Blake.
Attachment #410733 - Flags: review?(bwinton) → review-
Blocks: 526466
Blocks: 526471
Blocks: 526559
What should I do with the field usernameForm?

In the xml spec it lists the field as <username> but the model field is called <usernameForm>. Should I just add a special case that replaces usernameform with username? How should I go about resolving this?
Attachment #410733 - Attachment is obsolete: true
Attachment #414458 - Flags: review?(bwinton)
Attachment #414458 - Attachment is patch: true
Attachment #414458 - Attachment mime type: application/octet-stream → text/plain
Well, I believe Thunderbird is using "username", so we'll probably have to go with that.

One way to do it would be to have a dictionary that maps keys to values, like this:

namesToTagnames = {
  "usernameForm" : "username",
}

def constructXMLTag(name):
  if '_' in name:
    firstword, others = name.split('_', 1)
    words = others.split('_')
    xmlfield = firstword + ''.join([word.capitalize() for word in words])
  else:
    xmlfield = name
  return namesToTagnames.get(xmlfield, xmlfield)

(The .get on a dictionary will get the value associated with the first argument, but if it's not in the dictionary, it will return the second argument.)
Attachment #414458 - Flags: review?(bwinton) → review-
Comment on attachment 414458 [details] [diff] [review]
Rewrote export_xml to use etree and fixed up according to comments

>Index: README
>+    lxml

Nice.  :)

>Index: tests/relaxng_schema.xml
>@@ -0,0 +1,104 @@
>+<?xml version="1.0" encoding="utf-8"?>
>+<grammar xmlns="http://relaxng.org/ns/structure/1.0" 

There's some whitespace at the ends of some lines in this file.

>+<define name="userform-Content">
>+    <element name="usernameForm">

As mentioned before, I think this needs to be "username".

>Index: tests/test_xml.py

There's some trailing whitespace in this file, too.

>@@ -1,57 +1,63 @@
>+    def test_validated_export_xml(self):
>+        domain = models.Domain.objects.get(name="test.com")
>+        config_xml = domain.config.as_xml()
>+        # decodes config xml as a normal string. 
>+        # etree parser will handle the encoding declaration later.
>+        config_xml = config_xml.encode('utf-8')
>+        config_xml = StringIO(config_xml) 
>+        doc = etree.parse(config_xml)

You could change those 5 lines to:
    doc = etree.XML(config_xml)

>+        #gets incoming/outgoing server element root
>+        root = doc.getiterator();
>+        incoming = doc.find("emailProvider/incomingServer")
>+        outgoing = doc.find("emailProvider/outgoingServer")
>+        for element in root:
>+            if element.tag in expected:
>+                if type(expected[element.tag]) == list:
>+                    assert element.text in expected[element.tag]
>+                else:
>+                    assert element.text == expected[element.tag]
>+        for element in incoming:
>+            assert element.text == expected_incoming[element.tag]
>+        for element in outgoing:
>+            assert element.text == expected_outgoing[element.tag]

If you change the "import nose.tools" to "from nose.tools import *", you could write this as:
---------
        #gets incoming/outgoing server element root
        root = doc.getiterator()
        for element in root:
            if element.tag in expected:
                if type(expected[element.tag]) == list:
                    assert element.text in expected[element.tag]
                else:
                    assert_equal(element.text, expected[element.tag])
        for element in doc.find("emailProvider/incomingServer"):
            assert_equal(element.text, expected_incoming[element.tag])
        for element in doc.find("emailProvider/outgoingServer"):
            assert_equal(element.text, expected_outgoing[element.tag])
---------
which I think is a little cleaner.

>Index: config/models.py
>@@ -10,9 +12,9 @@
>+    ("rejected", "domain can\"t be accepted"),

I think this should be:
    ("rejected", "domain can't be accepted"),
:)

>@@ -46,56 +56,41 @@
>     """
>     Return the configuration using the XML document that Thunderbird is expecting.
>     """
>-    indent = '  '
>-    indentLevel = 1
>-    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
>-             '<clientConfig>',
>-             indent * indentLevel + '<emailProvider id=\"' + self.email_provider_id +'\">']
>-    in_incoming = False
>-    in_outgoing = False
>+    desiredOutput = {'outgoing_use_global_preferred_server': 1,
>+    'incoming_socket_type': 1, 'outgoing_port': 1,
>+    'incoming_authentication': 1, 'display_name': 1, 
>+    'outgoing_add_this_server': 1, 'outgoing_socket_type': 1,
>+    'outgoing_authentication': 1, 'incoming_hostname': 1, 
>+    'outgoing_username_form': 1, 'display_short_name': 1,'incoming_port': 1,
>+    'incoming_username_form': 1, 'outgoing_hostname': 1 }

Hmm…  We already have this list of fields.  I wonder if we could have the values be the keys, so it would be more like:
    desiredOutput = {
      "display_name": "displayName",
      "display_short_name": "displayShortName",
      "incoming_hostname": "hostname",
      "incoming_port": "port",
      "incoming_socket_type": "socketType",
      "incoming_username_form": "username",
      "incoming_authentication": "authentication",
      "outgoing_hostname": "hostname",
      "outgoing_port": "port",
      "outgoing_socket_type": "socketType",
      "outgoing_username_form": "username",
      "outgoing_authentication": "authentication",
      "outgoing_add_this_server": "addThisServer",
      "outgoing_use_global_preferred_server": "useGlobalPreferredServer",
    }

Then, instead of having the constructXMLTag, you could say:
name = desiredOutput[name]

(Also, see how I ordered those in the same order as they are in the schema? :)

>+    config = ET.Element("clientConfig")
>+    emailProvider = ET.SubElement(config,"emailProvider")
>+    emailProvider.attrib["id"] = self.email_provider_id
>+    for domain in Domain.objects.filter(config=self):
>+      d = ET.SubElement(emailProvider,"domain")
>+      d.text  = domain.name

There's an extra space between the "d.text" and the "=", but that kind of leads me to ask why not write:
    for domain in Domain.objects.filter(config=self):
      ET.SubElement(emailProvider,"domain").text  = domain.name
instead?

>+    return ET.tostring(config)

I would prefer something more like:
----
from StringIO import StringIO
retval = StringIO("w")
xml = ET.ElementTree(config)
xml.write(x, "UTF-8")
return retval.getvalue()
----
so that it writes out the <xml> header for you.

I think that's probably going to be the last set of changes I ask for, so if you can get a new patch up with them, I'll give it a quick r+, and we can ask DavidA for the second review.

(As a side note, I've got a patch for bug 530972 which is waiting on these changes, so the sooner you can do them, the happier I'll be.  :)

Thanks,
Blake.
Blocks: 530972
Attached patch New Patch (obsolete) — Splinter Review
I ran my relaxng schema against the valid xml files in autoconfig and I ran into a few problems.
The elements:<pop3>,<enableURL> are not included in my schema right now. Should they be?

SocketType choices should be "plain" according to XML files but the choice is "None" in the ispdb model. Should this be filed as a bug?
Attachment #414458 - Attachment is obsolete: true
Attachment #414698 - Flags: review?(bwinton)
> The elements:<pop3>,<enableURL> are not included in my schema right now.
> Should they be?

Yes. Everything that is on the example XML or implied by it should be in the schema.
> SocketType choices should be "plain" according to XML files but the choice is
> "None" in the ispdb model. Should this be filed as a bug?

Yes. The example XML on the wiki [1] is the spec. If the current implementation differs, the implementation should be fixed in all cases, no need to ask.

I added new authentication method values to the spec [1]. Please use "password-clear" and "password-encrypted" where possible.

[1] https://wiki.mozilla.org/Thunderbird:Autoconfiguration:ConfigFileFormat#XML
(In reply to comment #13)
> > The elements:<pop3>,<enableURL> are not included in my schema right now.
> > Should they be?
> Yes. Everything that is on the example XML or implied by it should be in the
> schema.

I agree, although that wasn't quite the question Louie asked.

Louie, the files from autoconfig are what Thunderbird is currently using, so the files you generate need to look the same as those, and the schema should enforce that.  In fact, a great test of your code would be to import the autoconfig files and generate new files from that data.  Anywhere they're different is a probably bug in your code.  ;)

(In reply to comment #14)
> > SocketType choices should be "plain" according to XML files but the choice
> > is "None" in the ispdb model. Should this be filed as a bug?
> I added new authentication method values to the spec [1]. Please use
> "password-clear" and "password-encrypted" where possible.

Wait up a second there, Ben.  Thunderbird's implementation is what I'm _actually_ interested in this conforming to.  Does the autoconfig code in TB3 RC1 handle those new values?  Cause if not, I recommend we remove them from the spec, at least until 3.0 final is released.

(I'll dig into this a little more when I get to the office, and post my results.)

Thanks,
Blake.
Status: NEW → ASSIGNED
> Does the autoconfig code in TB3 RC1 handle those new values?
> Cause if not, I recommend we remove them from the
> spec, at least until 3.0 final is released.

No, TB 3.0 RC1 will indeed bail with these new values (my fault of not writing it future-proof).
Good point, the exporter should for now return "plain" and "secure" as authentication mechanism, not "password-clear" and "password-encrypted" as I said.

I filed bug 531267 on making 3.0 more liberal in what it accepts.

However, after bug 525238, these values will be accepted in (hopefully) 3.1.

I'll make it explicit in the spec what 3.0 accepts. Does that work for you?
Note: Before we change anything about the schema, we should come up with a versioning scheme, and implement in TB3.x.
(In reply to comment #16)
> I filed bug 531267 on making 3.0 more liberal in what it accepts.
> 
> However, after bug 525238, these values will be accepted in (hopefully) 3.1.
> 
> I'll make it explicit in the spec what 3.0 accepts. Does that work for you?

Yeah, that sounds like a great idea.

I don't know yet if that will let us change the values, or if we will need to add them in to a new element.  (If we change the values and someone gets a copy of TB 3.0 from somewhere, will it fail to create a new account first, or will it update itself to a more recent version first?)  Perhaps we could also change 3.1 to include a schema version somewhere in its request, so that we can tell what format of data to return.  (i.e. /gmail.com gets you the 3.0 version, but /gmail.com.3.1 gets you the 3.1 version…  Better than that might be for /3.1/gmail.com to give you the 3.1 version.)

Thanks,
Blake.
Attachment #414698 - Attachment is patch: true
Attachment #414698 - Attachment mime type: application/octet-stream → text/plain
Right, that's why somewhere I mentioned that we probably want to version the query itself.

autoconfig/gmail.com?version=1.1

or whatever.
Note that we fetch by HTTP and with that already send the user-agent.
If this ends up being common, of course a version of the format would be better, also to allow other clients.
However, my goal is to avoid versioning entirely and have the format and the client backwards- and forward-compatible.
Note that I have thoughts on that in bug 531267.
Since we're already not forwards-compatible, I think I'll do the versioning changes in bug 530972, since I have more time remaining and more time per day to work on this than Louie.  ;)
I've added a new boolean pop3 server setting - downloadOnBiff
Attached patch patch v2 (obsolete) — Splinter Review
relaxng schema now passes against all current autoconfig data.
Attachment #414698 - Attachment is obsolete: true
Attachment #415578 - Flags: review?(bwinton)
Attachment #414698 - Flags: review?(bwinton)
Attachment #415578 - Attachment is patch: true
Attachment #415578 - Attachment mime type: application/octet-stream → text/plain
Attachment #415578 - Flags: review?(bwinton) → review+
Comment on attachment 415578 [details] [diff] [review]
patch v2

You've got a mis-indented "</optional>", but other than that, I like it.

Who are you going to pick for your second review?

Later,
Blake.
Attached patch Patch v3 (obsolete) — Splinter Review
Fixed an indent
Attachment #415578 - Attachment is obsolete: true
Attachment #416045 - Flags: review?(david.ascher)
Comment on attachment 416045 [details] [diff] [review]
Patch v3

BTW, Louie, when you post a patch, you need to click the "patch" checkbox, or the reviewers don't get to see the diff in all its glory.
Attachment #416045 - Attachment is patch: true
Attachment #416045 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 416045 [details] [diff] [review]
Patch v3

>Index: tests/relaxng_schema.xml
>===================================================================
>--- tests/relaxng_schema.xml	(revision 0)
>+++ tests/relaxng_schema.xml	(revision 0)

this is going to be great.  I think we'll probably want to make this schema available somewhere more prominent.  I think it should become the authoritative spec for the format, rather than just a test support file.  But that can happen later.

(In particular, it would make sense to publish it in the app somewhere).




>Index: config/models.py

> #admin.site.register(Domain)

should that line be deleted?

>+        if not 'incoming' in locals():
>+            incoming = ET.SubElement(emailProvider,"incomingServer")
>+            incoming.attrib["type"] = self.incoming_type


rather than play (cute) tricks w/ locals(), I suggest setting incoming=None at the top, and then testing for "if incoming is not None:"



>+        if not 'outgoing' in locals():
>+            outgoing = ET.SubElement(emailProvider,"outgoingServer")
>+            outgoing.attrib["type"] = "smtp"

idem.


looks great, thanks.

r=davida
Attachment #416045 - Flags: review?(david.ascher) → review+
> I think we'll probably want to make this schema available somewhere
> more prominent. I think it should become the authoritative spec for the format

Yup, agreed.
I need to learn RelaxNG (although it seems easy and great), but I'll take a look at it when I get into the config fetch from ISP, and double-check it, and then put it on the wiki [1] as official spec, including the value documentation that's currently in the sample XML file there. I'll keep the sample file for easy read, but the RelaxNG should have the exhaustive spec.
Changed as per David's suggests. Ready for check in.
Attachment #416045 - Attachment is obsolete: true
Keywords: checkin-needed
Checked in as http://viewvc.svn.mozilla.org/vc?view=revision&revision=57387

Gozer: Note the new dependency in this one.

Louie: I'm eager to finally see your patch for bug 521173!  :)

Later,
Blake.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Component: ispdb → ISPDB Server
Product: Mozilla Messaging → Webtools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: