Closed Bug 557401 Opened 14 years ago Closed 13 years ago

Script to convert v1.1 config files to v1.0, and to copy secondary domains to primary ones.

Categories

(Webtools :: ISPDB Server, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: BenB, Assigned: misager)

References

Details

Attachments

(2 files, 7 obsolete files)

We currently have all files as copies for master domain and all listed secondary domains, so that each change needs to be copied to all secondary domains.
We also have different SVN dirs for v1.0 and v1.1 files, so that all changes to existing configs, and newly added configs, have to be done twice, manually. That's not a situation that can hold for long, it's already annoying me.

With a script, we could maintain only the master domain in SVN and have the script create the secondary domains (by just cp, like now). And we'd maintain only v1.1, v1.0 would be generated.

Sp, we need a script that
- reads v1.1 config files
- writes v1.0 config files
- copies v1.1 config files to the secondary domains,

Some ISPDB code may be of use, or maybe it's faster to start from scratch.

gozer, which languages can run on the server? Just Python? I guess running a Mozilla xpcshell is out of question?
Blocks: 544644
If you think it would be useful I could have a go at creating this script.
Mogens, yes, that'd be great! We are currently all overloaded with the upcoming 3.1 feature freeze.
What language were you thinking of doing it in, Mogens?
(I'm pretty excited to see this implemented as well!)
Ben suggested Python and unless I hear complaints I will go with that. It has some quite nice XML handling out of the box.

Do you have any preferences to how the "interface" of the script should be?

Btw, looking at bug 556729 I think it would make sense to do many of the updates using a similar script (at least points 1, 2, 3, and 5 - 4 would be more tricky).
The spec for the XML is at <https://wiki.mozilla.org/Thunderbird:Autoconfiguration:ConfigFileFormat#XML>.

Yes, bug 556729 is a good hint which - in initial description and the patch - shows what changed between v1.1 concretely.
Note that I have also done other cleanup in the same patch there. I'd have to untangle both again (and probably still have conflicts), if we want to do v1.1 -> v1.0 with a script.
I like Python.  Javascript would have been another reasonable choice.

For the script, I usually follow Guido's conventions (listed at <http://www.artima.com/weblogs/viewpost.jsp?thread=4829>).

And yeah, having some way to convert 1.0->1.1 configs seems like a good idea.  :)
Attached file First shot at config utility (obsolete) —
Here comes a first shot at the script. Let me know what you think.

Usage:

Convert a single file from v1.0 to v1.1:
python tb-conf-conv.py -v 1.1 config-v1.0.xml > config-v1.1.xml

Convert a bunch of files from v1.1 to v1.0 and use directory 'out' as output directory for all <domain>s:
python tb-conf-conv.py -v 1.0 -d out *.xml

Write domain files for a bunch of files to directory 'live':
python tb-conf-conv.py -d live *.xml
Some comments:

1) I think that if you don't specify a version, it should convert 1.1 to 1.0.  (Basically, our sysadmin is going to have to run this every now and then, and it would be nice for him to be able to put "./tb-conf-conv.py src-dir dest-dir" in a cron job somewhere. and have it all work.  ;)

2) There are some lines which have trailing spaces which we should delete.

3) Please, please, please use argparse instead of optparse.  It handles help for you, and is way easier for people like me to read.

4) I think I would separate out the reading and writing, so that I can convert from v1.1 to v1.1, or v1.0 to v1.0.  (Yeah, that seems dumb now, but pretend we're five years in the future, and we want to convert between all of v1.0, v1.1, v1.2, v2.0, and v2.1.  If you write readers and writers for each, that's 10 functions.  If you try to write a set of 1.0to1.1, 1.1to1.0, 1.0to1.2, etc… functions, you'll have 20 of them.  And it just gets worse from there.  ;)

4a) We also shouldn't need to specify the input format, cause that'll be specified in the XML itself.

Thanks,
Blake.
(In reply to comment #8)
> 1) I think that if you don't specify a version, it should convert 1.1 to 1.0. 
> (Basically, our sysadmin is going to have to run this every now and then, and
> it would be nice for him to be able to put "./tb-conf-conv.py src-dir dest-dir"
> in a cron job somewhere. and have it all work.  ;)

This was how I did it initially but then what about the use case where you want to write all domain files from a "master" config file?

The sysadmin command you envision would be "./tb-conf-conv.py -v 1.0 -d dest-dir src-dir/*" with the current code.

> 2) There are some lines which have trailing spaces which we should delete.

Fixed

> 3) Please, please, please use argparse instead of optparse.  It handles help
> for you, and is way easier for people like me to read.

Done, I did not know about argparse. It sure is easier to use (and read) but it does add a dependency since it is not part of the Python standard library.

> 4) I think I would separate out the reading and writing, so that I can convert
> from v1.1 to v1.1, or v1.0 to v1.0.  (Yeah, that seems dumb now, but pretend
> we're five years in the future, and we want to convert between all of v1.0,
> v1.1, v1.2, v2.0, and v2.1.  If you write readers and writers for each, that's
> 10 functions.  If you try to write a set of 1.0to1.1, 1.1to1.0, 1.0to1.2, etc…
> functions, you'll have 20 of them.  And it just gets worse from there.  ;)

Hm, since there really isn't an internal common format that could be read into and written from I do not see how this could be feasible. The conversion from one version to another will always be a function of the version you read in.

> 4a) We also shouldn't need to specify the input format, cause that'll be
> specified in the XML itself.

No, definite not
Attachment #437629 - Attachment mime type: text/x-python → text/plain
> since there really isn't an internal common format

You can assume that the newest format is the best and common format.

I agree with bwinton that the internal representation and then read and write functions is the best way to go.
However, in the current case where we only need to modify a few small things, I think it's OK to have special functions. Your code shows that it's very simple and short, and full read/write functions would be *much* more code, so I suggest to leave bwinton's suggestion for a future version. (And maybe ISPDB will be ready by that time and we'll just use that.)
Attached file Conversion utility based on argparse (obsolete) —
Argument parsing is now much cleaner (IMHO). I have tested the script with all the config files in the svn trunk directory.

I have added a new option -a that specifies to write out all domains.

Some usage scenarios:

Convert all files in direcory 'src' from 1.0 to 1.1 and place output files in 'dst':
./tb-conf-conv.py -v 1.1 -d dst src/*

Write domain files for master config files in directory 'svn' to 'live' (no conversion):
./tb-conf-conv.py -d live -a svn/*

Same as before but convert to 1.0 as well:
./tb-conf-conv.py -v 1.0 -d live-10 -a svn/*
Attachment #437629 - Attachment is obsolete: true
Comment on attachment 438748 [details]
Conversion utility based on argparse

Some more comments.

>    # 4. Turn ISP help page URLs in XML comment into <instructionsURL>
>    # This will be tricky

We should probably fill out this step a little more.  ;)

>def convert_11_to_10(doc):
>    # select one of possibly two incomingServer's (IMAP over POP3)

So, I don't think that's right.  While _I_ prefer IMAP over POP3, the ISP has expressed their own preference by listing the incomingServers in a certain order, and we should respect that by selecting the first incomingServer as the default.  (And putting the other one in a comment. ;)

>        if   args.v == "1.1":
>            convert_10_to_11(doc)
>        elif args.v == "1.0":
>            convert_11_to_10(doc)

I still think it would be better if you did that automatically, by seeing what the input format it, and converting it to the other format.

(Also, I'm not a big fan of the if/elif/etc style.  I much prefer using a dictionary of functions.  Like:
conversions = {
  "1.0": convert_11_to_10,
  "1.1": convert_10_to_11,
}

And please don't line up the conditions, it looks really messy.  ;)

>        if args.a:
>            if args.d:

Shouldn't this be "args.dir"?  And I think we should use "args.all", or "args.write_all".  Brevity isn't really a virtue here.

>                write_domains(doc, args.d)
>            else:
>                print 'When you want to write domain files you should also specify an output directory using -d dir'
>                parser.print_usage()
>                exit(2)

Shouldn't we just write them to the current directory?  (I'm thinking of the case where we get a canonical set of configs in v1.1 format, and write out all the other domains (also in v1.1 format).  Yeah, we could force everyone to write "-d .", but why bother?)

>        elif args.d:
>            write_config(doc, args.d +'/'+ os.path.basename(f))
>        else:
>            print_config(doc)
>
>        doc.unlink()

Since we're throwing away the doc, I think we could skip the call to unlink.

Thanks,
Blake.
> ./tb-conf-conv.py -v 1.1 -d dst src/*

I wouldn't know for sure what happens there. How does it know the input version? from the file? implicitly assuming 1.0? no converion, i.e. assuming 1.1? Does it expand all domains or write just the other format?

Personally, I'd prefer explicit --from-version=1.0 --to-version=1.1 (or -f 1.1 -t 1.0). And in case of "no conversion", but write to live, use --expand or something.
(I am also fine with explicit "-d .".)
(In reply to comment #13)
> > ./tb-conf-conv.py -v 1.1 -d dst src/*
> 
> I wouldn't know for sure what happens there. How does it know the input
> version? from the file? implicitly assuming 1.0? no converion, i.e. assuming
> 1.1? Does it expand all domains or write just the other format?
> 
> Personally, I'd prefer explicit --from-version=1.0 --to-version=1.1 (or -f 1.1
> -t 1.0).

The idea is to read the version in the clientConfig tag (I think this was Blakes intention as well). I have not implemented this yet since all the files in svn are 1.0 (no version attribute, although I did notice a few with version="3.0").

> And in case of "no conversion", but write to live, use --expand or something.

I like --expand, I will use that in the next round.
(In reply to comment #12)

> >    # 4. Turn ISP help page URLs in XML comment into <instructionsURL>
> >    # This will be tricky
> We should probably fill out this step a little more.  ;)

Yeah, but how? I have some code that detects URLs inside xml comments. The problem is that in some cases there are more URLs and the accompanying description might be several lines.

> >    # select one of possibly two incomingServer's (IMAP over POP3)
> So, I don't think that's right.  While _I_ prefer IMAP over POP3, the ISP has
> expressed their own preference by listing the incomingServers in a certain
> order, and we should respect that by selecting the first incomingServer as the
> default.  (And putting the other one in a comment. ;)

Hm, I seem to recall reading somewhere that IMAP should be preferred, but I can't find it again. I have no problem with just selecting the first one but are we sure that the order is deliberate in all files?

> I still think it would be better if you did that automatically, by seeing what
> the input format it, and converting it to the other format.

I plan on implementing that at some point but for now it is not really needed.

> >        if args.a:
> Shouldn't this be "args.dir"?  And I think we should use "args.all", or
> "args.write_all".  Brevity isn't really a virtue here.

Right, I will add some longer option names as well.

> Shouldn't we just write them to the current directory?  (I'm thinking of the
> case where we get a canonical set of configs in v1.1 format, and write out all
> the other domains (also in v1.1 format).  Yeah, we could force everyone to
> write "-d .", but why bother?)

First of all I think it is nice to have a rather explicit way of triggering the 
writing of all those files. Secondly you mentioned yourself that the admin might want to put this in a cron job and in that case the current directory is most likely not what you want. 

I am thinking of a use model where you have your SVN managed files in one directory and from there generate the live version 1.0 and 1.1 files in the web server directories.

> >        doc.unlink()
> Since we're throwing away the doc, I think we could skip the call to unlink.

The minidom documentation mentions that is the good practice to unlink the DOM before dropping the reference.
> >    # 4. Turn ISP help page URLs in XML comment into <instructionsURL>
> >    # This will be tricky
> We should probably fill out this step a little more.  ;)

No need for that!
I have already converted that manually in bug 556729.

The v1.1 -> v1.0 conversion is what's important.

> just selecting the first one

Yes, that's the correct way. By definition, the first choice is the most preferred one.

> but are we sure that the order is deliberate in all files?

Yes, it is.

> I have not implemented this yet since all the files in svn are 1.0
> (no version attribute, although I did notice a few with version="3.0").

They will be in v1.1 very soon (bug 556729, gozer already reviewed, just waiting for bwinton's review). Once your script is ready, we can then remove the v1.0 files from SVN. We'll then generate the v1.0 and v1.1 files for deployment from the v1.1 files in SVN.
Bug 546278 reminded me that we also need to expand variables in hostnames when converting to v1.0, because Thunderbird 3.0 won't.

(Also, I've taken the liberty of assigning this bug to you, Mogens, because you seem to be doing all of the work on it.  ;)

Thanks,
Blake.
Assignee: nobody → misager
Status: NEW → ASSIGNED
See Also: → 546278
(In reply to comment #17)
> Bug 546278 reminded me that we also need to expand variables in hostnames when
> converting to v1.0, because Thunderbird 3.0 won't.

Just to be clear does this means that %EMAILDOMAIN% should be replaced by the hostname in <domain>?
Yup, and we'll have to do it separately for each domain we write out.
A few comments from giving this script a go:
 - needs a --verbose option
 - needs to be smart enough to only regenerate what needs generating

I tried using it to convert all v1.1 configs we have to v1.0 and the process seemed stuck in some infinite loop.


$> svn co http://svn.mozilla.org/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk autoconfig
$> cd autoconfig
$> mkdir v1.0
$> python convert.py -a -v 1.0 -d v1.0 v1.1/*
> the process seemed stuck in some infinite loop.

I think it's not an infinite loop, just iterating over all the files including the alternative domains. E.g. biglobe.ne.jp has 1400 domains - they are listed in the config file, and they currently all exist as files in SVN as well. If you run the script over them, it will generate 1400*1400 files. Not infinite, but still crazy.

I sent you a list of main files. In your local checkout, delete all files but those. Then run the script. Then check that it generates all files that you had before the deletion. (If not, my main files list may not be complete.)
(BenB and Gozer are happy with the script, but thought that you might want to glance over it before I commit it.)

Thanks,
Blake.
Attachment #438748 - Attachment is obsolete: true
Attachment #443722 - Flags: feedback?(david.ascher)
Attachment #443722 - Attachment mime type: text/x-python → text/plain
Comment on attachment 443722 [details]
The previous script, with some additions requested by Gozer.


>    if a:
>        if a[0].childNodes[0].wholeText.find('true') != -1:
>            #a[0].parentNode.removeChild(a[0])

get rid of that?

>            a[0].parentNode.replaceChild(doc.createComment(a[0].toxml()), a[0])
>
>    a = doc.getElementsByTagName('useGlobalPreferredServer')
>    if a:
>        if a[0].childNodes[0].wholeText.find('false') != -1:
>            #a[0].parentNode.removeChild(a[0])

get rid of that?

>            a[0].parentNode.replaceChild(doc.createComment(a[0].toxml()), a[0])
>
>
>    # 4. Turn ISP help page URLs in XML comment into <instructionsURL>
>
>    # This will be tricky

why?


>
>
>def convert_11_to_10(doc):
>    if doc.getElementsByTagName('clientConfig')[0].getAttribute('version'):
>        doc.getElementsByTagName('clientConfig')[0].removeAttribute('version')
>
>    # select one of possibly two incomingServer's (IMAP over POP3)
>    hasImap = False
>    for a in doc.getElementsByTagName('incomingServer'):
>        if a.getAttribute('type').find('imap') != -1:
>            hasImap = True
>            break
>
>    if hasImap:
>        for a in doc.getElementsByTagName('incomingServer'):
>            if a.getAttribute('type').find('pop') != -1:
>                a.parentNode.replaceChild(doc.createComment(a.toxml()), a)
>                break

That's a weird one.  Why not put the second for loop inside the if block of the first loop, and end it with a return?

>
>        if   args.v == "1.1":
>            convert_10_to_11(doc)
>        elif args.v == "1.0":
>            convert_11_to_10(doc)

get rid of the extra space after if.  cf. guido indentation rules, PEP 8.
Attachment #443722 - Flags: feedback?(david.ascher) → feedback+
(In reply to comment #23)
> >    # 4. Turn ISP help page URLs in XML comment into <instructionsURL>
> >
> >    # This will be tricky

Just remove that entirely. I already converted them manually from v1.0 to v1.1.
So, the v1.0 -> v1.1 conversion is not really needed and there just for kicks or "just in case".

Similarly, the v1.1 -> v1.0 conversion is only for serving TB 3.0 clients.

> >def convert_11_to_10(doc):
> >    if doc.getElementsByTagName('clientConfig')[0].getAttribute('version'):
> >        doc.getElementsByTagName('clientConfig')[0].removeAttribute('version')
> >
> >    # select one of possibly two incomingServer's (IMAP over POP3)
> >    hasImap = False
> >    for a in doc.getElementsByTagName('incomingServer'):
> >        if a.getAttribute('type').find('imap') != -1:
> >            hasImap = True
> >            break
> >
> >    if hasImap:
> >        for a in doc.getElementsByTagName('incomingServer'):
> >            if a.getAttribute('type').find('pop') != -1:
> >                a.parentNode.replaceChild(doc.createComment(a.toxml()), a)
> >                break
> 
> That's a weird one.  Why not put the second for loop inside the if block of the
> first loop, and end it with a return?

You don't even need to write out the POP3 config as comment. Just remove it.

In fact, you should not care about IMAP or POP3. Just remove all but the *first* <incomingServer>.
Okay, I think this should be the final version, but we probably want BenB to double-check the results of the conversion before we start using it.

Ben, the converted files are in dropbox, as v1.0.bz2 and v1.1.bz2.

Thanks,
Blake.
bwinton, please attach the new version of your script, so that I can say how great it is. I just love it, the code is nice and very concise (< 100 lines) and clean. That's how code *should* look like. Python++, bwinton++
I switched to ElementTree, because it's pretty-printing was much more sane than minidom's.

And cleaned up the v1.1->v1.0 conversion to Ben's satisfaction, while I was there.

Gozer, if you like this version, feel free to check it in to wherever you feel it should live.

Thanks,
Blake.
Attachment #443722 - Attachment is obsolete: true
Attachment #443765 - Attachment is obsolete: true
*Sigh*.  Or the pep8-compliant version.

(What is it about the word "final" that causes there to be one more patch coming?)

Later,
Blake.
Attachment #444009 - Attachment is obsolete: true
(In reply to comment #27)
> Created an attachment (id=444009) [details]
> Dare I say, the final version of the code?
> 
> I switched to ElementTree, because it's pretty-printing was much more sane than
> minidom's.
> 
> And cleaned up the v1.1->v1.0 conversion to Ben's satisfaction, while I was
> there.

Excellent!

> Gozer, if you like this version, feel free to check it in to wherever you feel
> it should live.

One older version is already in SVN, so can you just update:

[svn.mozilla.org]/mozillamessaging.com/sites/ispdb.mozillamessaging.com/trunk/tools/convert.py
Attachment #444010 - Attachment description: No, this one is _really_ the final version. ;) → [Checked-in] No, this one is _really_ the final version. ;)
Thank you Blake for finishing this up. I am sorry I have been extremely busy at work and have not had time to look at this.

I really like the ElementTree stuff - I was missing somthing like the find() method in minidom.

From what I can wee the functionality mentioned in comment 17 has not yet been implemented. Is this still required?
My pleasure.

You're right, that hasn't been implemented yet.  It probably should, though.

Did you want to whip up a patch that did a string replace before writing out the file, and I'll review it?

Thanks,
Blake.
It is not the cleanest solution but it works.

I also fixed a few bugs:
- printing config when neither -a or -d is specified was broken after
  switch to ElementTree
- the call to write_config when -a was not specified should now have a
  string as the first argument instead for the doc
Attachment #444095 - Flags: review?(bwinton)
Comment on attachment 444095 [details] [diff] [review]
Implement replacement of %EMAILDOMAIN% on write of version 1.0 config

>+++ convert.py	(working copy)
>@@ -38,7 +38,8 @@
> def print_config(doc):
>-    print doc.toxml(encoding="UTF-8")
>+    outData = ET.tostring(doc.getroot(), encoding="UTF-8") + "\n"
>+    print outData

I think you can just make this:
    print ET.tostring(doc.getroot(), encoding="UTF-8")
(because we don't use the outData variable anywhere, and because print will automatically add a newline for us.)

>@@ -50,10 +51,13 @@
>-def write_domains(doc, time, output_dir="."):
>+def write_domains(doc, time, output_dir=".", version=None):

We could get the version from the doc, like this:
version = doc.getroot().attrib["version"]
so we don't need to pass it in.

>-        write_config(outData, time, output_dir + "/" + d.text)
>+        if version == "1.0":
>+            write_config(outData.replace("%EMAILDOMAIN%", d.text), time, output_dir + "/" + d.text)
>+        else:
>+            write_config(outData, time, output_dir + "/" + d.text)

You know, I think I might pass a dictionary in to the write_config instead:
        tokens = {}
        if version == "1.0":
            tokens["EMAILDOMAIN"] = d.text
        write_config(outData, tokens, time, output_dir + "/" + d.text)

then write_config can be:
def write_config(outData, tokens, time, filename=None):
    if os.path.exists(filename) and os.stat(filename).st_mtime >= time:
        return

    file = codecs.open(filename, "w")
    for token in tokens:
        outData.replace("%" + token + "%", tokens[token])
    file.write(outData)
    file.close()

And that would allow us to pass in all sorts of new tokens.

(And don't forget to fix any problems that pep8.py reports.
 You can get it by "easy_install pep8". ;)

The other fixes are very nice, though.

Thanks,
Blake.
Attachment #444095 - Flags: review?(bwinton) → review-
Updated with Blake's suggestions and against latest SVN. Cleaned for pep8 and trailing whitespace
Attachment #444095 - Attachment is obsolete: true
Attachment #444390 - Flags: review?(bwinton)
Comment on attachment 444390 [details] [diff] [review]
 Implement replacement of %EMAILDOMAIN% on write of version 1.0 config, v2 

>+++ convert.py	(working copy)
>@@ -33,27 +33,33 @@
>-def write_config(outData, time, filename=None):
>+def write_config(doc, tokens, time, filename=None):
>     if os.path.exists(filename) and os.stat(filename).st_mtime >= time:
>         return
>     print "Writing %s" % filename
>+    outData = ET.tostring(doc.getroot(), encoding="UTF-8") + "\n"

So, this is kind of expensive, and doesn't actually change for a document, no matter how many times we call it per domain.

To give specifics, just moving that one line from the write_config function back to the write_domains function makes the time to convert 179 sample configs go from 56.389s down to 9.160s.  That's a pretty big difference.

>+    for token in tokens:
>+        outData = outData.replace("%" + token + "%", tokens[token])
> 
> 
> def write_domains(doc, time, output_dir="."):
>-    outData = ET.tostring(doc.getroot(), encoding="UTF-8") + "\n"

Having said that, I like the rest of the patch.  So once you put that call back where it belongs, r=me.  ;)

Thanks,
Blake.
Attachment #444390 - Flags: review?(bwinton) → review+
Fair enough, I just thought it would be cleaner to pass the doc.
Attachment #444390 - Attachment is obsolete: true
Attachment #444622 - Flags: review?(bwinton)
Attachment #444622 - Flags: review?(bwinton) → review+
Cleaner, yeah, but _way_ slower.  I don't know if you've heard of the "Zen of Python", but try typing "import this" at the python prompt some time.  (Ah, easter eggs.)

And I mention it because one of the tenants of it is "Practicality beats purity", which seems to apply in this case.  :)

Later,
Blake.
Keywords: checkin-needed
Whiteboard: [c-n svn]
Wasn't that commited long ago, and is already used since a long time? FIXED?
Uh, yeah.  Fixed in http://viewvc.svn.mozilla.org/vc?view=revision&revision=66939
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Keywords: checkin-needed
Whiteboard: [c-n svn]
Component: ispdb → ISPDB Server
Product: Mozilla Messaging → Webtools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: