Closed Bug 126252 Opened 18 years ago Closed 16 years ago
For converting the gcc gnats database from gnats to bugzilla, we needed a lot more features than gnats2bz.pl offers, so I worked on it quite a bit, making it faster and adding a ton of features. It's basically a rewrite, in that I've added more code than existed before. I was asked to file an RFE with the script attached, so I'm doing so.

List of changes:

- 10x faster (it used to take 45 minutes to handle the 270 meg of gcc gnats reports, and now takes 4.5 minutes), even with all the rest of the changes. This is mainly due to changing the PR parser to be based on the one in gnatsweb, which is absurdly fast. If you don't need any of the items below, just replacing the parser will probably buy you a 20x speedup (I didn't do speed testing until I was finished, but it's a bit slower than it was when I first replaced the parser).
- Chunks the audit trail into separate comments, with the right From's, times, etc.
- Handles followup emails that are in the report, with the right From's, times, etc.
- Properly handles duplicates, adding the standard bugzilla duplicate message.
- Extracts and handles gnatsweb attachments, as well as uuencoded attachments appearing in followup emails, the how-to-repeat field, etc. Replaces them with a message to look at the attachments list, and adds the standard "Created an attachment" message that bugzilla uses. Handling them includes giving them the right name and mime-type. "attachments" means multiple uuencoded things/gnatsweb attachments are handled properly.
- Handles reopened bug reports.
- Builds the cc list from the people who have commented on the report, and the reporter.
- Other things I've forgotten, I'm sure (I fixed many bugs regarding parsing of email addresses, etc.)

It's hard, at first glance, to tell the result from a bugzilla database built from scratch. The patch is actually larger than gnats2bz.pl currently.

It has a *few* gcc-specific things that are easy to remove:

1. We don't use op_sys and rep_platform, since you can build on any host, for any build OS, for any target. Thus, we use build, host, and target fields with the triplet (e.g. i686-pc-linux-gnu) in them.
2. Because almost all of our attachments are text, and large text at that (preprocessed source files), all attachment data is compressed with Compress::Zlib. This is also because no one will ever want to full-text search it.

Removing these two is a matter of just changing a few lines. You can see the result (no changes, this is what running the script gives you) at www.dberlin.org/bugzilla-2.14
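
The attachment compression described in item 2 above might look roughly like this in Python, using the standard zlib module in place of Perl's Compress::Zlib. This is only a sketch: the function names are hypothetical and are not taken from the attached script.

```python
import zlib


def pack_attachment(data: bytes) -> bytes:
    """Compress raw attachment bytes before they are stored."""
    return zlib.compress(data, 9)


def unpack_attachment(blob: bytes) -> bytes:
    """Restore the original attachment bytes from the stored form."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    # Preprocessed source is highly repetitive text, so it compresses well.
    source = b"extern int printf(const char *, ...);\n" * 1000
    packed = pack_attachment(source)
    print(len(source), "->", len(packed), "bytes")
```

Anything that reads the attachment back has to decompress it first, which is why dropping this step only makes sense together with the matching change on the display side.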
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.18
This is a Python version of the rewrite, which is much cleaner, faster, and has even more features. The Perl version of the rewrite attached here is so old compared to my current Perl version that I'm not sure these are *all* the improvements since then (i.e. there may be more). In addition to the features of the first rewrite (which is out of date), it does bug_activity creation, keyword handling, and optional mime type guessing for attachments. It has no memory leaks (roughly 4-5 meg of memory is the continuous usage regardless of the size of the gnats database, and memory usage peaks at whatever the size of the largest PR is, for the time it takes to read in that PR). It also includes a large number of bug fixes and other minor additions I'm forgetting, and it's 40% of the size of the original rewrite.
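
One way to get the flat memory profile described above is to process one PR at a time and keep nothing afterwards. The sketch below assumes one PR per file under a directory tree; the names iter_prs and convert_all are hypothetical, and this is not necessarily how gnatsparse.py is structured.

```python
import os
import tempfile
from typing import Callable, Iterator


def iter_prs(root: str) -> Iterator[str]:
    """Yield the text of each PR file under root, one at a time."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="latin-1") as handle:
                yield handle.read()


def convert_all(root: str, convert: Callable[[str], None]) -> int:
    """Run convert() on every PR; only the current PR is held in memory."""
    count = 0
    for text in iter_prs(root):
        convert(text)  # e.g. write out SQL; nothing is retained afterwards
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "1"), "w") as handle:
            handle.write(">Number: 1\n")
        print(convert_all(root, lambda text: None), "PR(s) converted")
```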
Attachment #70116 - Attachment is obsolete: true
Needed because the standard python uu module's decoder doesn't give back the filename.
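
For illustration, a decoder that hands back the filename can be built on binascii.a2b_uu from the standard library. This is a minimal sketch, not the contents of the attached support file; the function name uudecode and its return shape are assumptions.

```python
import binascii
from typing import List, Tuple


def uudecode(lines: List[str]) -> Tuple[str, int, bytes]:
    """Decode one uuencoded block and return (filename, mode, data).

    `lines` must start at the "begin" line and include the "end" line.
    """
    header = lines[0].split(None, 2)
    if len(header) != 3 or header[0] != "begin":
        raise ValueError("missing uuencode 'begin' line")
    mode = int(header[1], 8)       # permissions are written in octal
    filename = header[2].strip()   # the part a plain decode-to-file drops

    chunks = []
    for line in lines[1:]:
        if line.strip() == "end":
            return filename, mode, b"".join(chunks)
        if line.strip():
            # Strip only the line ending; trailing spaces can be data.
            raw = line.rstrip("\r\n").encode("ascii")
            chunks.append(binascii.a2b_uu(raw))
    raise ValueError("missing uuencode 'end' line")
```

Given the lines of a block, `name, mode, data = uudecode(lines)` returns the name from the "begin" line, so the attachment can be stored under its original filename.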
Basically file(1) in pure Python (does magic file parsing and checking). Used to determine mime types of attachments in gnatsparse.py. Beware that most magic.mime files have an audio/mpeg magic bug that makes them detect most files as audio/mpeg. You can disable the use of this module by commenting out the lines referring to it in gnatsparse.py, and everything will work just fine (it will set the attachment mime-types to application/octet-stream).
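
The idea, including the application/octet-stream fallback, can be sketched as follows. This is a simplified illustration, not the attached module: the small hand-written table stands in for a parsed magic.mime file, and the name guess_mime_type and the text heuristic are assumptions.

```python
from typing import List, Tuple

# (offset, magic bytes, mime type): a tiny stand-in for a magic.mime file.
MAGIC_TABLE: List[Tuple[int, bytes, str]] = [
    (0, b"\x1f\x8b", "application/gzip"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (257, b"ustar", "application/x-tar"),
]

DEFAULT_TYPE = "application/octet-stream"


def guess_mime_type(data: bytes) -> str:
    """Return a mime type for data, or DEFAULT_TYPE if nothing matches."""
    for offset, magic, mime in MAGIC_TABLE:
        if data[offset:offset + len(magic)] == magic:
            return mime
    # Crude text check: non-empty, no NUL bytes, and valid UTF-8.
    if data and b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return DEFAULT_TYPE
```

With the guesser disabled, every attachment would simply get DEFAULT_TYPE.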
reassign to patch author
Assignee: zach → dan
I say we just check this in to contrib as-is. Dave? dberlin: do we just need to check the above three files into the same directory, or is there more to it than that? Are they self-documenting? Gerv
Turned a few site-specific things into variables for easy customization
Attachment #120712 - Attachment is obsolete: true
Nope, nothing more to it. Just need to check all three files into the same dir. It's self-documenting, but I can comment it all if you guys want. It requires Python 2.2+; it won't work with 1.5.2 (Linux distributions and whatnot all ship with 2.2+ these days, and I haven't seen a computer with Python but not 2.2+ in a long time). If you want it commented, it'll be a few days. I have finals starting Monday.
Given that we don't officially support stuff in /contrib, all the commenting we can get is good. We can easily wait; if we see someone needing it in the next week, we can just point them to this bug. Thanks for doing this :-) Gerv
It's my enhancement, might as well assign it to myself. :)
Commented as requested, should be ready to go into contrib now.
Attachment #121808 - Attachment is obsolete: true
Attachment #120713 - Attachment mime type: application/octet-stream → text/plain
Attachment #120714 - Attachment mime type: application/octet-stream → text/plain
dberlin: your gnatsparse.py file requires the two support files, right? Problem is, they aren't MPL-licensed.
- Is there a known download location for the first one that we can document instead?
- Are you able to tell us who the author of the second one was, so we can contact them?
- Are you happy for gnatsparse.py to be MPLed?
Gerv
The first one is a modified version of a file included with the standard Python distribution. I rewrote about 10% of the function in question, and removed the encode function, which we do not need. Ignoring the fact that someone put a copyright notice on top of it, it's actually not copyrightable in either the current or original form. I speak as a 2nd year law student who has taken and passed IP Law, Copyright Law, and Patent Law, though this is not legal advice, just an opinion. There's just nothing protectible in there. But, for the sake of simplicity, I'll just rewrite the 104 lines of code and put it in gnatsparse.py, rather than have a separate file.

The second support file is based on code taken from a public mailing list (the email was in Russian), and was specifically put into the public domain according to the message. I don't have a link handy. However, I've fixed quite a few bugs and made quite a few improvements to the code, so feel free to MPL my contributions if you like (you can't get a copyright in the public domain portions, only in my modifications of it).

Lastly, I'm happy to have gnatsparse.py MPL'd.
Daniel: if you can find a URL for that Russian message, it would make me feel a lot better :-) Feel free to make the other changes you suggest. Gerv
Daniel: are you able to make the final few changes here? I'm really snowed under, otherwise I'd do it. It would be a shame for this to languish... Gerv
clearing approval to get it out of my queue... re-request it when we have the licensing issue resolved.
Daniel: ping? :-) Gerv
I can't find the original URL for the Russian message, so if you are worried about licensing of the support files, it can't go in. Of course, you could just remove the mime type guesser and install it, if you wanted to.
Enhancements which don't currently have patches on them and which are targeted at 2.18 are being retargeted to 2.20 because we're about to freeze for 2.18. Consideration will be given to moving items back to 2.18 on a case-by-case basis (but this is unlikely for enhancements).
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Gerv thinks we're okay on licensing with this as is, so let's go ahead and put it in.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.18
Checking in README;
/cvsroot/mozilla/webtools/bugzilla/contrib/README,v  <--  README
new revision: 1.9; previous revision: 1.8
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/README,v
done
Checking in gnatsparse/README;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/README,v  <--  README
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/gnatsparse.py,v
done
Checking in gnatsparse/gnatsparse.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/gnatsparse.py,v  <--  gnatsparse.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/magic.py,v
done
Checking in gnatsparse/magic.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/magic.py,v  <--  magic.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/specialuu.py,v
done
Checking in gnatsparse/specialuu.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/specialuu.py,v  <--  specialuu.py
initial revision: 1.1
done

Thanks!
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED