Closed Bug 126252 Opened 18 years ago Closed 16 years ago

gnats2bz.pl rewrite

Categories

(Bugzilla :: Installation & Upgrading, enhancement, P2)

enhancement

Tracking


RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: dan, Assigned: dan)

Details

Attachments

(3 files, 3 obsolete files)

For converting the gcc gnats database from gnats to bugzilla, we needed a lot
more features than gnats2bz.pl offered.
So I worked on it quite a bit, making it faster and adding a ton of features.
It's basically a rewrite, in that I've added more code than existed before.

I was asked to file an RFE with the script attached, so I'm doing so.
List of changes:

10x faster (it used to take 45 minutes to handle the 270 MB of gcc gnats
reports; it now takes 4.5 minutes), even with all the other changes.  This is
mainly due to basing the PR parser on the one in gnatsweb, which is absurdly
fast.
If you don't need any of the features below, just replacing the parser will
probably buy you a 20x speedup (I didn't do speed testing until I was finished,
but it's a bit slower now than it was when I first replaced the parser).
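As a rough illustration of why a gnatsweb-style parser can be fast, a PR can be split into fields in one linear pass over its lines. This is a minimal sketch, not gnatsweb's or gnatsparse.py's actual parser. It assumes the standard GNATS layout where each field starts with a line like ">Field-Name: value", and it only recognises an assumed subset of standard field names, so that quoted lines such as ">foo: bar" inside a description are not mistaken for headers. parse_pr() and KNOWN_FIELDS are hypothetical names.

```python
import re

# Assumed subset of standard GNATS field names; extend for local fields.
KNOWN_FIELDS = {
    "Number", "Category", "Synopsis", "Confidential", "Severity",
    "Priority", "Responsible", "State", "Class", "Submitter-Id",
    "Arrival-Date", "Last-Modified", "Originator", "Organization",
    "Release", "Environment", "Description", "How-To-Repeat",
    "Fix", "Audit-Trail", "Unformatted",
}

# A field header looks like ">Name: value" at the start of a line.
FIELD_RE = re.compile(r"^>([A-Za-z][\w-]*):(.*)$")


def parse_pr(text):
    """Split a GNATS PR into {field name: value} in a single pass."""
    fields = {}
    current = None  # field whose value is currently being accumulated
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) in KNOWN_FIELDS:
            current = m.group(1)
            fields[current] = [m.group(2)]
        elif current is not None:
            # Continuation line of a multi-line field.  Lines before the
            # first field (the mail headers) are ignored.
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}
```

Each line is looked at once, and no backtracking over the whole PR is needed, which is what keeps this style of parser cheap on a large database.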

Chunks audit trail into separate comments, with the right From's, times, etc.

Handles followup emails that are in the report, with the right From's, times,
etc.
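One way to chunk an audit trail, sketched under assumptions: that state and responsible-person changes each begin with a "State-Changed-From-To:" or "Responsible-Changed-From-To:" line followed by the matching "-By:" and "-When:" lines, and that followup emails begin with a "From:" line and carry a "Date:" header. The Chunk class and chunk_audit_trail() are hypothetical names, not the script's.

```python
import re
from dataclasses import dataclass, field

# Assumed GNATS conventions: a change entry starts with "*-Changed-From-To:",
# a followup email starts with "From:".
START_RE = re.compile(r"^(?:(?:State|Responsible)-Changed-From-To|From):")
AUTHOR_RE = re.compile(r"^(?:(?:State|Responsible)-Changed-By|From):\s*(.*)$")
WHEN_RE = re.compile(r"^(?:(?:State|Responsible)-Changed-When|Date):\s*(.*)$")


@dataclass
class Chunk:
    author: str = ""
    when: str = ""
    lines: list = field(default_factory=list)


def chunk_audit_trail(text):
    """Split an Audit-Trail value into one Chunk per change or email."""
    chunks = []
    for line in text.splitlines():
        if START_RE.match(line):
            chunks.append(Chunk())
        if not chunks:
            continue  # stray text before the first entry is dropped
        chunk = chunks[-1]
        chunk.lines.append(line)
        # Keep only the first author/date header in each chunk, so a
        # "Date:" line later in an email body does not overwrite it.
        m = AUTHOR_RE.match(line)
        if m and not chunk.author:
            chunk.author = m.group(1).strip()
        m = WHEN_RE.match(line)
        if m and not chunk.when:
            chunk.when = m.group(1).strip()
    return chunks
```

Each Chunk then maps onto one Bugzilla comment with its own author and timestamp. A known limitation of this sketch: an unquoted "From:" at the start of a line inside an email body would start a new chunk.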

Properly handles duplicates, adding the standard bugzilla duplicate message.
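A minimal sketch of the duplicate handling. The DUP_RE pattern for spotting a duplicate reference in GNATS text is hypothetical (real PRs phrase it in many ways); the comment string follows the "*** This bug has been marked as a duplicate of N ***" form Bugzilla 2.x uses, and the (dupe_of, dupe) tuple is meant to mirror the columns of Bugzilla's duplicates table. Both helpers are illustrative, not taken from the script.

```python
import re

# Hypothetical pattern, e.g. "duplicate of PR 1234" or "Duplicate of c++/1234".
DUP_RE = re.compile(r"duplicate of\s+(?:PR\s*)?(?:[\w+-]+/)?(\d+)", re.IGNORECASE)

# The comment text Bugzilla adds when a bug is resolved as a duplicate.
DUP_MESSAGE = "*** This bug has been marked as a duplicate of {} ***"


def find_duplicate_of(text):
    """Return the PR number this text says it duplicates, or None."""
    m = DUP_RE.search(text)
    return int(m.group(1)) if m else None


def mark_duplicate(bug_id, dupe_of, comments, duplicates):
    """Append the standard comment and record a (dupe_of, dupe) row."""
    comments.append(DUP_MESSAGE.format(dupe_of))
    duplicates.append((dupe_of, bug_id))
```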

Extracts and handles gnatsweb attachments, as well as uuencoded attachments
appearing in followup emails, the How-To-Repeat field, etc.  Replaces them
with a message to look at the attachments list, and adds the standard
"Created an attachment" message that bugzilla uses.  Handling them includes
giving them the right name and mime-type.  The plural is deliberate: multiple
uuencoded files or gnatsweb attachments in a single report are handled properly.
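The replacement step can be sketched as a single pass that cuts each block, from a "begin MODE NAME" line through the matching "end" line, out of the comment text and leaves a note in its place. The note wording and the split_uu_blocks() name are hypothetical; decoding the collected lines and storing them as attachments is left to a separate step.

```python
import re

BEGIN_RE = re.compile(r"^begin [0-7]{3,4} (.+)$")

# Hypothetical placeholder wording; the script's actual message may differ.
NOTE = "[uuencoded attachment {name} moved to the attachments list]"


def split_uu_blocks(text):
    """Cut uuencoded blocks out of a comment body.

    Returns (new_text, blocks): blocks is a list of (filename, lines)
    pairs holding the raw encoded lines, including "begin" and "end".
    """
    kept, blocks = [], []
    current = None  # (filename, lines) while inside a block
    for line in text.split("\n"):
        if current is None:
            m = BEGIN_RE.match(line)
            if m:
                current = (m.group(1).strip(), [line])
            else:
                kept.append(line)
            continue
        current[1].append(line)
        if line.strip() == "end":
            blocks.append(current)
            kept.append(NOTE.format(name=current[0]))
            current = None
    if current is not None:
        kept.extend(current[1])  # unterminated block: keep it as plain text
    return "\n".join(kept), blocks
```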

Handles reopened bug reports.

Builds the cc list from the people who have commented on the report, and the
reporter.
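A sketch of the cc-list construction, assuming the reporter and commenters are available as raw From: header strings. It uses the standard library's email.utils.parseaddr() to pull the bare address out of forms like "Bob <bob@example.org>" or "alice@example.org (Alice)"; build_cc_list() is a hypothetical name.

```python
from email.utils import parseaddr


def build_cc_list(reporter, commenters):
    """Reporter first, then each commenter, deduplicated by address."""
    seen, cc = set(), []
    for raw in [reporter, *commenters]:
        _, addr = parseaddr(raw)
        # Lower-casing the whole address is a simplification; strictly,
        # only the domain part is case-insensitive.
        addr = addr.strip().lower()
        if "@" in addr and addr not in seen:
            seen.add(addr)
            cc.append(addr)
    return cc
```

Entries without an "@" (bare user names, garbage headers) are skipped rather than turned into bogus Bugzilla accounts.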

Other things I've forgotten, I'm sure (I fixed many bugs regarding parsing of
email addresses, etc.).

It's hard, at first glance, to tell the result from a bugzilla database built
from scratch.

The patch is actually larger than the current gnats2bz.pl.

It has a *few* gcc-specific things that are easy to remove:
1.  We don't use op_sys and rep_platform, since you can build on any host, for
any build OS, for any target.  Instead, we use build, host, and target fields
containing the triplet (e.g., i686-pc-linux-gnu).
2.  Because almost all of our attachments are text, and large text at that
(preprocessed source files), all attachment data is compressed with
Compress::Zlib.
This is also because no one will ever want to full-text search it.

Removing these two is a matter of just changing a few lines.
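For the compression change, the Python counterpart of Compress::Zlib is the standard zlib module. In this sketch COMPRESS_ATTACHMENTS is a hypothetical switch standing in for the few lines a non-gcc site would change; pack_attachment() and unpack_attachment() are illustrative names, and the script itself may structure this differently.

```python
import zlib

# Hypothetical switch: set to False to store attachment bytes as-is.
COMPRESS_ATTACHMENTS = True


def pack_attachment(data: bytes) -> bytes:
    """Prepare attachment bytes for storage in the database."""
    return zlib.compress(data, 9) if COMPRESS_ATTACHMENTS else data


def unpack_attachment(stored: bytes) -> bytes:
    """Reverse pack_attachment() when reading the data back."""
    return zlib.decompress(stored) if COMPRESS_ATTACHMENTS else stored
```

Highly repetitive text such as preprocessed source compresses very well, which is the trade-off the comment above describes: smaller storage in exchange for data that cannot be full-text searched in the database.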

You can see the result (no changes, this is what running the script gives you)
at www.dberlin.org/bugzilla-2.14
Attached file Script in question (obsolete)
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.18
Attached file Rewrite in python (obsolete)
This is a python version of the rewrite, which is much cleaner and faster, and
has even more features.
The perl version of the rewrite here is so old compared to my current perl
version that I'm not sure these are *all* the improvements since then (i.e.,
there may be more).

In addition to the features of the first rewrite (which, as noted above, is
out of date), it does bug_activity creation, keyword handling, and optional
mime type guessing for attachments.  It has no memory leaks: memory usage stays
at roughly 4-5 MB regardless of the size of the gnats database, and peaks at
the size of the largest PR for the time it takes to read in that PR.  It also
has a large number of bug fixes and other minor additions I'm forgetting, and
it's 40% of the size of the original rewrite.
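The bounded memory use described above comes from handling one PR at a time. A minimal sketch of that idea as a generator, assuming the usual GNATS layout of one file per PR, named by its number, under per-category directories, with administrative directories prefixed "gnats-" (an assumption about the database, not code from gnatsparse.py). iter_prs() is a hypothetical name.

```python
import os


def iter_prs(db_root):
    """Yield (number, text) for each PR file, one PR in memory at a time."""
    for dirpath, dirnames, filenames in os.walk(db_root):
        # Skip GNATS administrative directories (assumed "gnats-" prefix)
        # and walk the category directories in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("gnats-"))
        for name in sorted((f for f in filenames if f.isdigit()), key=int):
            path = os.path.join(dirpath, name)
            # latin-1 maps every byte, so odd encodings never raise here.
            with open(path, encoding="latin-1") as fh:
                yield int(name), fh.read()
```

Because the caller processes and discards each PR before the next file is read, peak memory tracks the largest single PR rather than the whole database.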
Attachment #70116 - Attachment is obsolete: true
Needed because the standard python uu module's decoder doesn't give back the
filename.
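A sketch of such a decoder, built on binascii.a2b_uu(), which decodes one uuencoded line at a time. The stdlib uu.decode() writes its output to a file and returns nothing, so the name from the "begin" line is not handed back (the uu module was also removed from Python in 3.13); this version returns (mode, filename, data) directly. decode_uu() is a hypothetical name, not specialuu.py's API.

```python
import binascii


def decode_uu(lines):
    """Decode one uuencoded block, returning (mode, filename, data)."""
    it = iter(lines)
    for line in it:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "begin":
            mode, filename = int(parts[1], 8), parts[2].strip()
            break
    else:
        raise ValueError("no 'begin' line found")
    chunks = []
    for line in it:
        line = line.rstrip("\r\n")  # keep trailing spaces: they encode zeros
        if line.strip() == "end":
            return mode, filename, b"".join(chunks)
        if line:  # blank lines carry no data
            chunks.append(binascii.a2b_uu(line))
    raise ValueError("no 'end' line found")
```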
Basically file(1) in pure python (does magic file parsing and checking).
Used to determine mime types of attachments in gnatsparse.py.
Beware that most magic.mime files have an audio/mpeg magic bug that makes them
detect most files as audio/mpeg.
You can disable the use of this module by commenting out the lines referring to
it in gnatsparse.py, and everything will work just fine (it will set the
attachment mime-types to application/octet-stream).
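A much smaller stand-in for the same idea: instead of parsing a magic.mime file as magic.py does, this sketch checks a short, hard-coded table of well-known leading byte signatures, treats NUL-free ASCII as text/plain, and otherwise falls back to application/octet-stream, the same default used when the module is disabled. The table and guess_mime() are assumptions for illustration.

```python
# A small hard-coded table of leading "magic" bytes (assumed subset).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/x-gzip"),
]
FALLBACK = "application/octet-stream"


def guess_mime(data: bytes, sniff: bool = True) -> str:
    """Guess a MIME type from content; FALLBACK when sniffing is off."""
    if not sniff:
        return FALLBACK
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Treat NUL-free, pure-ASCII content as text (e.g. preprocessed source).
    sample = data[:4096]
    if b"\x00" not in sample:
        try:
            sample.decode("ascii")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return FALLBACK
```

Checking only a handful of exact prefixes avoids the kind of over-broad rule behind the audio/mpeg misdetection, at the cost of recognising far fewer formats.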
reassign to patch author
Assignee: zach → dan
I say we just check this in to contrib as-is. Dave?

dberlin: do we just need to check the above three files into the same directory,
or is there more to it than that? Are they self-documenting?

Gerv
Attached file Rewrite in python (obsolete)
Turned a few site-specific things into variables for easy customization
Attachment #120712 - Attachment is obsolete: true
Nope, nothing more to it.
Just need to check all three files into the same dir.
It's self-documenting, but I can comment it all if you guys want.
It requires python 2.2+; it won't work with 1.5.2 (Linux distributions and
whatnot all ship with 2.2+ these days; I haven't seen a computer with python
but not 2.2+ in a long time).

If you want it commented, it'll be a few days.
I have finals starting Monday.
Given that we don't officially support stuff in /contrib, all the commenting we
can get is good. We can easily wait; if we see someone needing it in the next
week, we can just point them to this bug.

Thanks for doing this :-)

Gerv
It's my enhancement, might as well assign it to myself. :)
Attached file gnatsparse.py
Commented as requested, should be ready to go into contrib now.
Attachment #121808 - Attachment is obsolete: true
Attachment #120713 - Attachment mime type: application/octet-stream → text/plain
Attachment #120714 - Attachment mime type: application/octet-stream → text/plain
dberlin: your gnatsparse.py file requires the two support files, right? Problem
is, they aren't MPL-licensed.

- Is there a known download location for the first one that we can document
instead? 
- Are you able to tell us who the author of the second one was, so we can
contact them?
- Are you happy for gnatsparse.py to be MPLed?

Gerv
The first one is a modified version of a file included with the standard python
distribution.
I rewrote about 10% of the function in question, and removed the encode function
which we do not need.

Ignoring the fact that someone put a copyright notice on top of it, it's
actually not copyrightable in either the current or original form. 
I speak as a 2nd year law student who has taken and passed IP Law, Copyright
Law, and Patent Law.  Though this is not legal advice, just an opinion.
There's just nothing protectable in there.
But, for the sake of simplicity, I'll just rewrite the 104 lines of code and put
them in gnatsparse.py, rather than have a separate file.

The second support file is based on code taken from a public mailing list (the
email was in Russian), and was specifically put into the public domain according
to the message.  I don't have a link handy.  However, I've fixed quite a few
bugs and made quite a few improvements to the code, so feel free to MPL my
contributions if you like (you can't get a copyright in the public domain
portions, only in my modifications of it).

Lastly, I'm happy to have gnatsparse.py MPL'd.
Daniel: if you can find a URL for that Russian message, it would make me feel a
lot better :-) Feel free to make the other changes you suggest.

Gerv
Daniel: are you able to make the final few changes here? I'm really snowed
under, otherwise I'd do it. It would be a shame for this to languish...

Gerv
clearing approval to get it out of my queue...  re-request it when we have the
licensing issue resolved.
Flags: approval+
Daniel: ping? :-)

Gerv
I can't find the original URL for the Russian message, so if you are worried
about the licensing of the support files, it can't go in.

Of course, you could just remove the mime type guesser, and install it, if you wanted to.
Enhancements which don't currently have patches on them and which are targeted
at 2.18 are being retargeted to 2.20 because we're about to freeze for 2.18.
Consideration will be given to moving items back to 2.18 on a case-by-case
basis (but this is unlikely for enhancements).
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Gerv thinks we're okay on licensing with this as is, let's go ahead and put it in.
Flags: approval+
Target Milestone: Bugzilla 2.20 → Bugzilla 2.18
Checking in README;
/cvsroot/mozilla/webtools/bugzilla/contrib/README,v  <--  README
new revision: 1.9; previous revision: 1.8
done

RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/README,v
done
Checking in gnatsparse/README;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/README,v  <--  README
initial revision: 1.1
done

RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/gnatsparse.py,v
done
Checking in gnatsparse/gnatsparse.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/gnatsparse.py,v  <--  gnatsparse.py
initial revision: 1.1
done

RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/magic.py,v
done
Checking in gnatsparse/magic.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/magic.py,v  <--  magic.py
initial revision: 1.1
done

RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/specialuu.py,v
done
Checking in gnatsparse/specialuu.py;
/cvsroot/mozilla/webtools/bugzilla/contrib/gnatsparse/specialuu.py,v  <--  specialuu.py
initial revision: 1.1
done

Thanks!
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
QA Contact: matty_is_a_geek → default-qa