Closed Bug 285614 Opened 19 years ago Closed 19 years ago

Rewrite importxml.pl to remove XML::Parser magic numbers

Categories

(Bugzilla :: Bug Import/Export & Moving, defect, P2)

2.19

Tracking

()

RESOLVED FIXED
Bugzilla 2.22

People

(Reporter: gregaryh, Assigned: gregaryh)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 12 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040913 Firefox/0.10
Build Identifier: 

Updated importxml.pl

Reproducible: Always
Attached patch importxml.pl patch (obsolete) — Splinter Review
This patch addresses a number of issues with importxml.pl, but first let me
explain the background of the changes we have made.

We have been migrating large numbers of bugs from bugzilla.ximian.com to
bugzilla.novell.com (on the order of thousands) so we needed to automate the
process a little. importxml.pl works great when you are moving from similar
versions of bugzilla, but ximian was running 2.10. Their xml output was not up
to date with 2.18 so we took it and transformed each bug's xml using a
stylesheet and then ran the import localy using the xml files that were
generated. For this we did not want to spam users mail boxes with thousands of
emails so I coded a bit that, when set to off, does not send mail. We also
wanted to run it as a script from the command line and see any error output
there so I added a second bit to do that. In addition to the everconfirmed
problem we found and fixed the following:

Group permissions on the migrated bugs were not being set correctly. We added
code to check which groups the bugs' product belonged to and added them to that
group.

We wanted to make use of aliases in migrated bugs so that we had a way to refer
to the migrated bugs using their original bug number, we did this by prepending
the string XIM to the bug number in our xsl stylesheet. We added the alias
field to the import code.

Ximian's bugzilla outputs all the long desc info as base64. I am not sure if
this was their own hack or if this was default behavior in 2.10. We added code
to decode it first if it's encoding was base64

Move.pl does not transfer attachments but the 2.10 xml.cgi includes attachment
info. We added code to import attachments if present. This would be a desirable
enhancement to move.pl.

Bugs that were marked duplicate cannot be put in the duplicates table since
there is no way of knowing if the duplicate bug has been moved as well, and if
so what number it now has. To prevent sanitycheck from complaining, we change
the resolution of these bugs from DUPLICATE to MOVED. This is likely only an
issue for groups migrating products wholsale as we are doing. 

As for the everconfirmed, I simply check to see if the status is unconfirmed
and if so leave the everconfirmed at 0, otherwise I set it to 1. Since no vote
information is included in the move and since the number of votes needed to
confirm a bug between the two bugzilla's may not be the same, this seemed the
most logical.
Attachment #177027 - Flags: review?(travis)
Attachment #177027 - Flags: review?(travis) → review?
Assignee: justdave → ghendricks
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Attached patch importxml.pl patch (obsolete) — Splinter Review
added fix for bug 198571
Attachment #177027 - Attachment is obsolete: true
Attachment #177033 - Flags: review?
Status: NEW → ASSIGNED
Attachment #177027 - Flags: review?
Comment on attachment 177033 [details] [diff] [review]
importxml.pl patch

  OK, I'm going to take a crack at this from a purely perl-based review
standpoint. I believe that we no longer have anybody who understands or uses
this code frequently. Just the fact that you have used it and understand it
means that it's probably much better than what we have now.

  If we do check this in, I hope that you will be able to stay around to be the
"maintainer" of this feature. :-)

  Also, I suspect that this may eventually become somewhat obsoleted by the
Bugzilla::RPC interface that we will work on for 2.22 (bug 224577). But we'll
see. :-)

  I believe that justdave is the only person who's made recent *functional*
changes to this code, so I'm also asking him for review. bbaetz also
understands it, I believe.
Attachment #177033 - Flags: review?(justdave)
Comment on attachment 177033 [details] [diff] [review]
importxml.pl patch

>+use MIME::Base64;

  Is this in perl 5.6?

>+# If you would like to see error messages on STDOUT set $debug = 1
>+my $debug = 0;
>+
>+# If you do not want to send emails on bug import set $mail = 0
>+my $mail = 1;

  Thse should be "use constant" constants, I think.

>+print "Reading xml\n" if ($debug == 1);

  I think we should just have a sub called debug_message for this, and you can
do the check inside there. (Which should be if $debug, without the == 1.)

>-  MailMessage ($subject, $message, @to);
>+  print "$message\n " if ($debug == 1);
>+  MailMessage ($subject, $message, @to) if ($mail == 1);

  The $mail check and the $debug statement can both go inside MailMessage, I
believe. Same for the other calls to MailMessage.

>+  # Because we are using XML::Parser as a tree, it is important that the xml
>+  # fields be in the right order in the source file. It would be much better
>+  # if we used some other XML package that would identify fields by name 
>+  # such as XML::TreeBuilder or XML::SimpleObject

  Email Jeff about bug 224577, and decide which XML class you want to use, and
then maybe we can include it as an optional Bugzilla module.

>+  # Sanity check will complain about having bugs marked duplicate but no 
>+  # entry in the dup table. Since we can't tell the bug ID of bugs 
>+  # that might not yet be in the database we have no way of populating 
>+  # this table. Change the resolution instead.
>+  if (defined ($bug_fields{'resolution'}) && 
>+    ($bug_fields{'resolution'} eq "DUPLICATE")){
>+        $bug_fields{'resolution'} = "MOVED";
>+  }
>+  

  Hrm... I wish there was some better way to handle that, but I suppose there
isn't. Don't you also have to handle the dupe_of field, here?

>+  # For the sake of sanitycheck.cgi we do this.
>+  # Update lastdiffed 
>+  push @query, "lastdiffed";
>+  push @values, "now()"; 

  You only want to set lastdiffed to now if you didn't send out email and you
want Bugzilla to think that you did. If you want something to not register as a
bug change during an update, you can set delta_ts = delta_ts. I'm pretty sure
that people want to be generally emailed about new bugs that are MOVED into
this installation.

>+    my $filename = SqlQuote($att->{'filename'});
>+    my $description = SqlQuote($att->{'desc'});

  We don't use SqlQuote anymore in modern Bugzilla code. Instead use Bugzilla;
my $dbh = Bugzilla->dbh, and then use DBI methods with placeholders.

>+    # For some stupid reason, if I assign $ispatch = $att->{'ispatch'} directly 
>+    # it corrupts the binary data in $thedata. 

  That will also solve this one.

>+    my $isprivate = 0;
>+    SendSQL("INSERT INTO attachments (bug_id, creation_ts, filename, description, mimetype, ispatch, isprivate, submitter_id, thedata) 
>+           VALUES ($id, $datestamp, $filename, $description, $contenttype, $ispatch, $isprivate, $submitter_id, $thedata)");

  These lines are longer than 80 characters. :-)

>+  # Add this bug to each group of which its product is a member.
>+  my $sth = Bugzilla->dbh->prepare_cached(
>+      "SELECT group_id FROM group_control_map " .

  Instead of this, create a Bugzilla::User object and get its ->groups.
($user->groups).
Attachment #177033 - Flags: review?(justdave)
Attachment #177033 - Flags: review?
Attachment #177033 - Flags: review-
Oh, and by the way, in case you're wondering how I figured out who changed
importxml recently:

<http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/webtools/bugzilla/importxml.pl>
Attached patch Updated patch (obsolete) — Splinter Review
Worked out the issues that Max raised.
Attachment #177033 - Attachment is obsolete: true
Attachment #177150 - Flags: review?(justdave)
Attached patch fixed and updated (obsolete) — Splinter Review
Forgot to address the SqlQuotes.
Attachment #177150 - Attachment is obsolete: true
Attachment #177159 - Flags: review?(justdave)
Attachment #177150 - Flags: review?(justdave)
The XML output from Bugzilla *does* tell us what version it came from. 
importxml.pl, in order to be really useful, should be able to know what to
expect from given versions of Bugzilla and deal with it appropriately.  (i.e. if
the schema has changed)

Not that anything needs to be done here about this, but just thinking out loud
so it's on the record somewhere.
Severity: normal → major
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.22
Version: unspecified → 2.19
Comment on attachment 177159 [details] [diff] [review]
fixed and updated

>Index: importxml.pl

>@@ -68,11 +70,25 @@
>+use MIME::Base64;
>+use Date::Parse;
>+use Date::Format;
> 
> require "CGI.pl";
> require "globals.pl";
> $::lockcount = 0;

Bitrot due to the removal of CGI.pl and of $::lockcount.


>@@ -131,6 +148,14 @@
>+sub Debug {
>+    return unless ($debug);
>+    my ($message, $level) = (@_);
>+    print "ERR: ". $message if ($debug == err_level);
>+    print $message if ($debug == debug_level);
>+    print "\n";
>+}
>+
> sub Lock {
>     if ($::lockcount <= 0) {
>         $::lockcount = 0;

Same here. Lock() has been removed.


ghendricks, please don't ask me to review the updated patch. I only noticed
that the patch is bitrotten, that's all. ;)
Attachment #177159 - Flags: review?(justdave) → review-
Blocks: 190596
Blocks: 307328
New version will use XML::Twig which still depends on XML::Parser but will allow
the processing of attachment data without hogging all system resources. Also
removes the dependence on field order in the XML document.
Summary: Rewrite of importxml.pl to address a number of bugs → Rewrite importxml.pl to remove XML::Parser magic numbers
Depends on: 308221
No longer blocks: 307328
Attached patch Rewrite v1 (obsolete) — Splinter Review
Attachment #177159 - Attachment is obsolete: true
Attachment #196225 - Flags: review?
This is a nearly complete rewrite of importxml.pl. You can use attachment
(id=196223) to test it. Easiest way is to run importxml.pl from the command line
with the file.

This uses XML::Twig which has a number of benefits. 
First it removes the magic number references that the previous version required,
thus removing the field order dependency and making it much more readable. 
Second, it allows the processing of much larger files. Since it parses the tree
piecemeal, it can discard the parts it is finished with and devote the memory to
the next chunk. This is very important since it now allows attachment data to be
included. 

Additionally, this patch addresses all the bugs on this bugs blocks list.
Attachment #196225 - Flags: review?(myk)
Attachment #196225 - Flags: review?(mkanat)
Attachment #196225 - Flags: review?
Comment on attachment 196225 [details] [diff] [review]
Rewrite v1

  So... this is basically a new script... so... it should run under taint mode.

  OK, below is a *really* fast review of this, just on a quick glance:

>+# If you would like to see error messages on STDOUT set $debug = 1
>+# Set $debug to 2 if you want more verbose messages
>+
>+my $debug = 2;
>+
>+use constant debug_level => 2;
>+use constant err_level => 1;

  1) Could we make the debug level a command-line switch?
  2) Constants should be ALL_UPPERCASE.

>+# If you do not want to send emails on bug import set $mail
>+my $mail;
>+$mail = $ARGV[0] if ($ARGV[0]);

  I think it would be better to use GetOpt::Long, if it's available by default
in perl 5.6.

>+our $bugqty;

  You don't have to abbreviate that name. :-) Spell it out, it would be easier
to understand.

>+my $xml;
>+Debug("Reading xml", debug_level);

  I'm usually happier if subroutine definitions are at the top of the file, and
the code that calls them is lower.

>+while (<>) {
>+ $xml .= $_;
>+}

  I think I'd prefer to do all of that with IO::File, it would be clearer and
cleaner. Or at least with a slurp (where you grab the whole file in at once).

  Also, as long as you do this this way, you're going to need memory equal to
the size of the entire XML file, which could be quite large including
attachments.

>+Debug("Parsing tree", debug_level);
>+my $twig = XML::Twig->new(twig_handlers=>{bug => \&process_bug,
>+                                          attachment => \&process_attachment},
>+                          start_tag_handlers => {bugzilla => \&init});
>+$twig->parse($xml);
>+my $root = $twig->root;
>+my $maintainer = $root->{'att'}->{'maintainer'};
>+my $exporter = $root->{'att'}->{'exporter'};
>+my $urlbase= $root->{'att'}->{'urlbase'};

  You know, looking at this (this is just a thought) it would be nice to
actually create some sort of *object* from the input XML, and then just insert
that into the DB. :-) Maybe Bugzilla::Bug::FromXML, or something like that.

> sub sillyness {

  This whole "sillyness" thing should be entirely unnecessary, if we're
re-writing this code anyhow.

>+sub Log {
>+    my ($str) = (@_);
>+    open(FID, ">>$datadir/maillog") || die "Can't write to $datadir/maillog";

  Being unable to write to the mail log isn't a fatal error, seems like we
could just "warn" about that.

>+    if ($date) {
>+        $date = str2time($date);
>+        $date = time2str("%Y-%m-%d %X", $date);

  Hrm... did you really mean to use sql_date_format, there (since you're
putting it into SQL...)?


>+    my %multiple_fields;

  I pretty much stop understanding the process_bug function from here on down.
Could you comment it a bit to help me out?

>+      # If we leave the attachemnt ID in the comment it will be made a link
>+      # to the wrong attachment. Since the new attachment ID is unkown yet
>+      # let's strip it out for now. We will make a comment with the right ID
>+      # later
>+      $data =~ s/Created an attachment \(id=\d+\)/Created an attachment/g;

  Why don't you instead leave it in, so that you can make a translation table
(old attach_id => new attach_id), and just fix the comments after you insert
the attachments?

>+    my @versions = Bugzilla::Version::get_versions_by_product($prod_id);

  I'm fairly sure you should be using $product->versions instead, that's the
standard way to do that.

>+    if (defined ($bug_fields{'priority'}) &&
>+         (my @priority = grep(lc($_) eq lc($bug_fields{'priority'}), @::legal_priority)) ){
>+      push (@values, $dbh->quote($priority[0]) );

  Um, what? Why $priority[0]?

  Also, you know, instead of doing all this $dbh->quote, you could also just
create placeholders in the SQL for each item in the arrays (since they're both
in the same order) and pass in the parameter array to the $dbh->do or whatever.

>+        my ($tm) = $dbh->selectrow_array("SELECT defaultmilestone FROM products " .
>+                "WHERE name = ". $dbh->quote($product));

  It would be nice to have a $product->default_milestone... or wait, don't we,
already? So you could just use that.

>+        ($qa_contact) = 
>+        $dbh->selectrow_array("SELECT initialqacontact ".
>+                "FROM components, products " .
>+                "WHERE components.product_id = products.id" .
>+                " AND products.name = " . $dbh->quote($product) .
>+                " AND components.name = " . $dbh->quote($component) );

  You should be able to get that out of $component instead.

>+    my $query  = "INSERT INTO bugs (\n" 

  It would be nice to have a debug level that printed that out somewhere, I
think. :-) (Well... maybe except longdescs. :-D)

>+        if (!$keywordseen{$i}) {
>+          $dbh->do("INSERT INTO keywords (bug_id, keywordid) VALUES ($id, $i)");

  Please use placeholders instead of inserting the values directly.

>+    # Clear the attachments array for the next bug
>+    @attachments = ();

  That looks fishy to me.

>+    $dbh->do("INSERT INTO longdescs (bug_id, who, bug_when, work_time, isprivate, thetext) VALUES " .
>+      "($id, $exporterid, now(), $worktime, $private, "  . $dbh->quote($long_description) . ")");

  Placeholders, please. :-)

>+    # Add this bug to each group of which its product is a member.
>+    my $sth = $dbh->prepare_cached(
>+        "SELECT group_id FROM group_control_map " .
>+        "WHERE product_id = ?");

  Isn't there a function you can call on $product to get that?

>+    Debug($log, err_level);
>+    Bugzilla::BugMail::Send($id, { 'changer' => $exporter }) if ($mail);
Attachment #196225 - Flags: review?(mkanat) → review-
(In reply to comment #13)
> (From update of attachment 196225 [details] [diff] [review] [edit])
>   So... this is basically a new script... so... it should run under taint mode.

Um, what does that mean exactly (Perl ignoramus in the house -.-)

> >+# If you do not want to send emails on bug import set $mail
> >+my $mail;
> >+$mail = $ARGV[0] if ($ARGV[0]);
> 
>   I think it would be better to use GetOpt::Long, if it's available by default
> in perl 5.6.
Is it? I will look into it.

> >+our $bugqty;
> 
>   You don't have to abbreviate that name. :-) Spell it out, it would be easier
> to understand.
> 
This was the name Dawn gave it in the original. Laziness shining through :-).
So I guess this isn't a TOTAL rewrite since I have reused a fair amount of the
code, if not the ideas used by Dawn in the original. I am willing to redo it if
it makes the code more readable, faster, etc. otherwise I don't see much point.

> >+while (<>) {
> >+ $xml .= $_;
> >+}
> 
>   I think I'd prefer to do all of that with IO::File, it would be clearer and
> cleaner. Or at least with a slurp (where you grab the whole file in at once).
> 
>   Also, as long as you do this this way, you're going to need memory equal to
> the size of the entire XML file, which could be quite large including
> attachments.

Is IO::File part of Perl 5.6? Is it more efficient? Since the parser needs the
whole file any way, I am not sure it can be avoided. Since the data is coming
from STDIN is there a way to have it only pull in part at a time? Twig does have
a parsfile command but it is expecing a filename and path

> 
> >+Debug("Parsing tree", debug_level);
> >+my $twig = XML::Twig->new(twig_handlers=>{bug => \&process_bug,
> >+                                          attachment => \&process_attachment},
> >+                          start_tag_handlers => {bugzilla => \&init});
> >+$twig->parse($xml);
> >+my $root = $twig->root;
> >+my $maintainer = $root->{'att'}->{'maintainer'};
> >+my $exporter = $root->{'att'}->{'exporter'};
> >+my $urlbase= $root->{'att'}->{'urlbase'};
> 
>   You know, looking at this (this is just a thought) it would be nice to
> actually create some sort of *object* from the input XML, and then just insert
> that into the DB. :-) Maybe Bugzilla::Bug::FromXML, or something like that.
> 

My thoughts exactly. We need a Bugzilla::Bug::Insert() to just create a bug
object and pop it into the database. I was looking through process_bug to see
how it could be done but that is for another day *sigh*

> > sub sillyness {
> 
>   This whole "sillyness" thing should be entirely unnecessary, if we're
> re-writing this code anyhow.
> 
It will be, once data/versioncache goes away ;-)  As it is, with the component,
milestone and product objects some of these values have departed.

> >+    if ($date) {
> >+        $date = str2time($date);
> >+        $date = time2str("%Y-%m-%d %X", $date);
> 
>   Hrm... did you really mean to use sql_date_format, there (since you're
> putting it into SQL...)?

My lack of DBI knowlege shining through. I will look into it.

> >+      # If we leave the attachemnt ID in the comment it will be made a link
> >+      # to the wrong attachment. Since the new attachment ID is unkown yet
> >+      # let's strip it out for now. We will make a comment with the right ID
> >+      # later
> >+      $data =~ s/Created an attachment \(id=\d+\)/Created an attachment/g;
> 
>   Why don't you instead leave it in, so that you can make a translation table
> (old attach_id => new attach_id), and just fix the comments after you insert
> the attachments?

Laziness again. For me at least, that is beyond my limited abilites at the
moment. I knew how to do this much so this is what I did.

> >+    if (defined ($bug_fields{'priority'}) &&
> >+         (my @priority = grep(lc($_) eq lc($bug_fields{'priority'}),
@::legal_priority)) ){
> >+      push (@values, $dbh->quote($priority[0]) );
> 
>   Um, what? Why $priority[0]?
> 
For some strange reason Dawn set these as arrays. I didn't really look into
changing it since it worked. 

> 
> >+    # Clear the attachments array for the next bug
> >+    @attachments = ();
> 
>   That looks fishy to me.

Hmm, how to explain O.-;
As the tree is parsed it calls subroutine handlers for each element for which
one is defined (i.e. attachment) Since attachments are the biggest chunk of data
and can potentially be many megabytes, and since XML trees in memory are
typically 10-30 times larger than the original data, we want to look at one
attachment shove it into a hash and dispose of it so we can free up the memory.
Since each bug is it's own twig all the attachments for a single bug will be
processed before process_bug will be called. We therefore know that the
attachments array only contains the hashes of this bugs attachments. Once we hav
e processed them we clear it out for the next bugs attachments. 

I know it is complicated, but this ensures that we have enough memory to handle
multiple bugs with multiple attachments. Trust me, it works ;-)

>   Also, you know, instead of doing all this $dbh->quote, you could also just
> create placeholders in the SQL for each item in the arrays (since they're both
> in the same order) and pass in the parameter array to the $dbh->do or whatever.

Can you provide an example of this? I am not sure how to go about doing so.
Depends on: 310111
Attached patch Rewrite v2 (obsolete) — Splinter Review
Attachment #196225 - Attachment is obsolete: true
Attachment #197770 - Flags: review?(mkanat)
Attachment #196225 - Flags: review?(myk)
Attachment #197770 - Flags: review?(mkanat) → review?(justdave)
No longer depends on: 310111
Flags: blocking2.22?
Comment on attachment 197770 [details] [diff] [review]
Rewrite v2

Partial review from a non-reviewer

>Index: importxml.pl
>-use XML::Parser;
>+use XML::Twig;

Suggest to add 'use Bugzilla' as well. It is done by globals.pl but that will go away.

>+# Since this script can be run remotely by an email alias, it is good to log
>+# what it did. $datadir will usually be data/ in the bugzilla root directory
>+sub Log {
>+    my ($str) = (@_);
>+    open(FID, ">>$datadir/maillog") || warn "Can't write to $datadir/maillog";
>+    print FID time2str("%D %H:%M", time()) . ": $str\n";
>+    close FID;
>+}

Please remove Log. This was removed from main Bugzilla, see bug 288170. 

>+  # Parse attachments.
>+  #
>+  # This subroutine is called once for eaach attachment in the xml file. 

Typo: eeach

>+sub process_bug {    
[..]
>+    if (defined $bug->{'att'}->{'error'}) {
>+      $log .= "\nError in bug ". $bug->field('bug_id') ."\@$urlbase:";
>+      $log .= " $bug->{'att'}->{'error'}\n";
>+      if ($bug->{'att'}->{'error'} =~ /NotFound/) {
>+        $log .= "$exporter tried to move bug ". $bug->field('bug_id') ."here";
>+        $log .= " but $urlbase reports that this bug does not exist.\n"; 
>+      } elsif ( $bug->{'att'}->{'error'} =~ /NotPermitted/) {
>+        $log .= "$exporter tried to move bug ". $bug->field ."('bug_id') here";
>+        $log .= " but $urlbase reports that $exporter does not have access";
>+        $log .= " to that bug.\n";
>       }

What does this do?

>+    # Now we process each of the fields in turn and make sure they contain 
>+    # valid data. We will create two parallel arrays, one for the query
>+    # and one for the values. For every field we need to push an entry onto
>+    # each array.
>+    my @query = ();
>+    my @values = ();

Do not push $dbh->quote('somestuff') on those values. Instead do something like:
my $query  = "INSERT INTO bugs (\n"
     . join (",\n", @query)
     . "\n) VALUES (\n";
$query .= '?,' foreach (@values);
# Remove the last comma.
chop($query);
$query .= ")";
$dbh->do($query, undef, @values);

>+    # Each of these fields does not need to be quoted so just shove them 
>+    # ont the array.

Typo: ont

>+    foreach my $unq_field ( qw(cclist_accessible reporter_accessible) ) {
>+        if ( (defined $bug_fields{$unq_field}) && ($bug_fields{$unq_field}) ){
>+          push (@query, "$unq_field");
>+          push (@values, $bug_fields{$unq_field});
>+        }
>+    }

Should still validate those fields are numeric, otherwise this is a security issue. With custom made SQL someone could insert whatever SQL they want. However, this wouldn't matter with the $dbh->quote replacement stuff (I think).

>+    push (@query, "delta_ts");
>+    if ( (defined $bug_fields{'delta_ts'}) && ($bug_fields{'delta_ts'}) ){
>+        push (@values, $dbh->quote($bug_fields{'delta_ts'}));
>+    }
>+    else {
>+        push (@values, "NOW()");

Will not working with quoting change suggested above. Suggest to do:
my ($timestamp) = Bugzilla->dbh->selectrow_array("SELECT NOW()");
push (@values, $timestamp);

Use $timestamp in all the code.

>+    # Here we check to see if the product and component that were sent 
>+    # are valid in our installation. If not, use the defaults set in
>+    # the parameters
[..]
>+    my $prod_id = $product->id;
>+    my $comp_id = $component->id;

This will fail if Param("moved-default-product") or Param("moved-default-component") is an invalid product/component or empty. $product->id is in those cases not allowed.

>-  my $comments;
>+      # Niether the imported bug's product and/or component nor the parameters 
>+      # are valid. Exit gracefully.

Typo: Niether

>+    # Process time fields
>+    if (Param("timetrackinggroup")) {
>+        my $date = $bug_fields{'deadline'};
>+        if ($date) {
>+            $date = str2time($date);
>+            $date = time2str("%Y-%m-%d %X", $date);
>+        }
>+        else { $date = "NULL";}

Would be undef with the quoting changes.

>+    my $query  = "INSERT INTO bugs (\n" 
>+                 . join (",\n", @query)
>+                 . "\n) VALUES (\n"
>+                 . join (",\n", @values)
>+                 . "\n)\n";
>+    Debug("Bug Query: ". $query, DEBUG_LEVEL);
>+    $dbh->do($query);

See comments above.

>+    # Handle CC's
[..]
>+          $dbh->do("INSERT INTO cc (bug_id, who) VALUES (?,?)", 
>+                                undef, $id, $dbh->quote($uid));

Replace that with $sth_cc and $sth_cc->execute. See below for example.

>+    # Handle keywords
[..]
>+        my $i = $::keywordsbyname{$keyword};
>+        if (!$i) {
>+          $err .= "Skipping unknown keyword: $keyword.\n";
>+          next;
>+        }

The keywords are stored in two places.
1. In the keywords table
2. In the keywords column of the bugs table (as text).

These should be kept synchronized (check the unknown keywords earlier and update $bug_fields{'keywords'} when required). The keywords column of the bugs table is a sorted (a to z) list of the keywords joined by ', '.

>+        if (!$keywordseen{$i}) {
>+          $dbh->do("INSERT INTO keywords (bug_id, keywordid) VALUES (?,?)", 
>+                                                          undef, $id, $i);
>+          $keywordseen{$i} = 1;

It is better to prepare the INSERT statement out of the loop and then execute it specifying the arguments, like:
my $sth_keywords = $dbh->prepare('INSERT INTO keywords (bug_id, keywordid) VALUES (?,?)');

before the foreach loop and:
$sth_keywords->execute(($id, $i));

>+    # Parse bug flags

I understand nothing about flags, skipping.

>+    
>+  # Insert Attachments for the bug
>+  foreach my $att (@attachments){
>+      $dbh->do("INSERT INTO attachments 
>+          (bug_id, creation_ts, filename, description, mimetype, ispatch, 
>+          isprivate, isobsolete, submitter_id) 
>+          VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)" , 
>+          undef, $id, $dbh->sql_date_format($att->{'date'}), $att->{'filename'}, 

I think you said before sql_date_format didn't work. Why does it work now?

>+      # Process attachment flags

Skipping that again.

>+    my $worktime = $bug_fields{'actual_time'} || 0.0;
>+    $dbh->do("INSERT INTO longdescs 
>+                   (bug_id, who, bug_when, work_time, isprivate, thetext) 
>+            VALUES (?,?,?,?,?,?)", undef, 
>+            $id, $exporterid, "now()", $worktime, $private, $dbh->quote($long_description) );

You are trying to push the text 'NOW()' into a field instead of the SQL function 'NOW()'. Use $timestamp instead (commented on that somewhere above). Suggest to retrieve that during init() (as is it always needed during import).

>+    # Add this bug to each group of which its product is a member.

Don't know anything about groups, skipping :)

>+=head1 NAME

Also don't know much about perldoc.
Comment on attachment 197770 [details] [diff] [review]
Rewrite v2

I only had a quick look at your patch and I only have a few comments so far:

0) I really like the way you wrote it! :) At first sight, it looked scary, but it's finally not that hard. ;)

1) If the initial assignee and/or QA contact of a bug doesn't exist in the new DB, reassign the bug to the default assignee and/or QA contact of the product where the imported bug is going into. This choice makes much more sense than reassigning to the exporter of the bug.

2) For pending requests, if the requestee doesn't exist, you should request from the wind instead of requesting the exporter. I doubt the exporter will set the flag himself. ;)

3) About checking products and components, you shouldn't use check_product() and check_component() as they will throw an error if they don't exist. You should instead do the following (we use it in editproducts.cgi already):
my $product = new Bugzilla::Product({name => $product_name});
unless ($product) {
  # use the default product
}

same for the component. This solution is *much* cleaner.

4) I haven't check all validations, but be sure to never trust anything a priori. For instance, be sure that the summary of the bug has no newline, see bug 101380.


I will test an updated patch, probably next week (if nobody else wants to).
Attachment #197770 - Flags: review?(justdave) → review-
Attached patch Rewrite v3 (obsolete) — Splinter Review
Attachment #197770 - Attachment is obsolete: true
Attachment #201880 - Flags: review?
Comment on attachment 201880 [details] [diff] [review]
Rewrite v3

r- per my comments on IRC
Attachment #201880 - Flags: review? → review-
*** Bug 233005 has been marked as a duplicate of this bug. ***
Blocks: 275386
Depends on: 315605
Depends on: 316096
Attached file text xml
This is an xml file that can be used to test this patch
Attached patch Rewrite v4 (obsolete) — Splinter Review
<ghendricks>  Invalid parameter passed to Bugzilla::Product::_init.
<ghendricks>  It must be numeric.
<ghendricks> that is what I get when I try $product = new Bugzilla::Product('product name')
<LpSolit> ghendricks: of course, a product ID is expected
<LpSolit> ghendricks: $product = new Bugzilla::Product(name => 'product name'})
<ghendricks> ah
<LpSolit> err.. $product = new Bugzilla::Product({name => 'product name'})
<LpSolit> you need a hash if you want to build the object from its name
<LpSolit> same for versions, components, milestones, etc..

<LpSolit> ghendricks: why do you need Pod::Usage?
<ghendricks> makes for easy commandline usage message
<ghendricks> try running ./importxml.pl --help
<LpSolit> and Getopt::Long ?
<bkor> importxml.pl: Unknown option: help
* LpSolit installs missing librairies
<ghendricks> oops
<ghendricks> try -?
<bkor> the -? is nice
<ghendricks> k, I fixed the --help
<ghendricks> mispelled it in the options

<LpSolit> don't forget to fix XML::Twig occurences in the docs too
<LpSolit> errr... XML::Parser
<ghendricks> hmm
<ghendricks> ok
<LpSolit> unless they refer to Bugzilla 2.20 or earlier
<LpSolit> 'XML::Parser' => ['importxml.pl'], also appears in t/Support/Files.pm
<LpSolit> this should be fixed
<LpSolit> else I fear that runtests.pl will fail
<ghendricks> still passes, though I have updated it

<LpSolit> ghendricks: could you send me an email with an XML file in it? importxml.pl says it will remove what is before <?xml
<LpSolit> # This script is used import bugs from another installation of bugzilla.
<LpSolit> nit: used to import
<ghendricks> I can try. I have not tried setting up the email settings 
<ghendricks> but I have left that untouched from the earlier version of this file
<bkor> LpSolit: that stripping part hasn't changed since 2.16, I know that works
<LpSolit> no idea, it's the first time I look at importxml.pl ;)
<LpSolit> importxml.pl don't care about bugzilla.dtd, right?
<bkor> nope
<ghendricks> not directly
<LpSolit> not at all you mean ;)
<LpSolit> ghendricks: $Data::Dumper::Useqq = 1; <---- why this?
<ghendricks> That was already there
<LpSolit> the docs say it will slow down Dump()
<LpSolit> ok
<ghendricks> I don't use it though it might have been left over
<LpSolit> the docs say it unprintable characters will be quoted
<LpSolit> s/it//
<LpSolit> quoted octal integers
<LpSolit> ghendricks: should be: "help|?"        => \$help); right?
<LpSolit> instead of $help ?
<ghendricks> yes, already fixed it

<LpSolit>   my $subject = shift @_;
<LpSolit>   my $message = shift @_;
<LpSolit> nit: "shift" alone
<ghendricks> again, not my doing
<ghendricks> but I can change it
<LpSolit> oh sorry, but I have applied your patch and now I'm reading importxml.pl directly
<LpSolit> so I have no idea what's old, what's new ;)
<ghendricks> even just looking at the patch I have moved things around a lot so it appears new even if it isn't
<ghendricks> let me explain the basic workings of XML::Twig
<ghendricks> bkor: this will answer your question
<ghendricks> the file xml file gets slurped in and parsed
<ghendricks> as it begins parsing it looks for certain tags in the xml that have defined triggers
<ghendricks> start_tag_handlers => {bugzilla   => \&init});
<ghendricks> thus the init sub gets called as soon as the <bugzilla> tag is parsed
<ghendricks> all init does is check that certain attributes are set in the bugzilla tag
<ghendricks> and makes sure the parameters are set to allow moving etc.
<ghendricks> it then returns and the parsing continues
<ghendricks> as it goes along it will find one or more <bug> tags
<ghendricks> once it has read the entire contents of a bug tag it calls process bug
<ghendricks> process_bug then takes the twig object that was passed to it and begins handling the data
<ghendricks> if there are attachments in the xml, they are always found inside <bug> tags
<ghendricks> once an attachment tag is fully parsed process_attachment gets called
<ghendricks> this slurps the Twig data and shoves it into a hash which is pushed onto an array and which is held until process_bug can deal with it
<ghendricks> one of the major drawbacks of the old importxml is that it had to hold the entire xml tree in memory
<LpSolit> and here?
<LpSolit> one bug at a time?
<ghendricks> the nature of xml is that it takes up about 10-30 time the size of the file in memory in order to handle it
<LpSolit> wow
<ghendricks> XML::Twig allows us to hold not only one bug, but parts of a bug (ie attachments) at a time
<ghendricks> since attachments are usually the biggest part of the bug xml data we can handle them and then dump them out of memory
<LpSolit> nice
<ghendricks> this should allow us to parse much larger files than in the past
<LpSolit> # Read STDIN in slurp mode. VERY dangerous, but we live on the wild side ;-)
<LpSolit> local($/);
<LpSolit> $xml = <>;
<LpSolit> o_O
<LpSolit> what's that?
<ghendricks> # Read STDIN in slurp mode. VERY dangerous, but we live on the wild side ;-)
<LpSolit> :-p
<ghendricks> as per mkanat's requirements in an earlier review
<LpSolit> # remove everything in file before xml header (i.e. remove the mail header)
<LpSolit> $xml =~ s/^.+(<\?xml version.+)$/$1/s;
<LpSolit> ah ok, I don't understand it anyway ;)
<LpSolit> should be .* instead of .+
<LpSolit> else your force to have something before <?xml
<ghendricks> again, I havn't touched that 
<ghendricks> but it seems to work
<LpSolit> I'm surprised
<bkor> if it doesn't find a match, it will not replace and <xml would still be at the start
<bkor> (that is how I see it)
<ghendricks> bkor: true
<LpSolit> hum... all right
<LpSolit> ghendricks: at this point, what is $twig->root ?
<LpSolit> it's <bugzilla> ?
<ghendricks> which point?
<LpSolit> your parse the XML file
<LpSolit> and then call ->root
<LpSolit> you then read its attributes
<ghendricks> yes, bugzilla is the root
<LpSolit> ok
<LpSolit> $log = "Imported $bugtotal bug(s) from $urlbase,\n  sent by $exporter.\n\n";
<LpSolit> ghendricks: $bugtotal is undefined...
<ghendricks> our $bugtotal;
<ghendricks> line 111
<ghendricks>     $bugtotal++;
<ghendricks> line 269
<LpSolit> oh...
<LpSolit> I missed something crucial here
<LpSolit> my $root = $twig->root;
<LpSolit> my $maintainer = $root->{'att'}->{'maintainer'};
<ghendricks> by the time the log line is called it has processed all the bugs
<LpSolit> there is a lot of stuff running between these two lines, right?
<ghendricks> yes
<ghendricks> see discussion above
<LpSolit> calling ->root will start the process?
<ghendricks> actually the process starts with $twig->parse($xml);
<LpSolit> yeah... but I thought that you had to call ->root first, and then pass $root to process_bug or something like that
<ghendricks> no, the root in this area is only used to get the maintaner to pass to the log and mail message
<LpSolit> you mean that init() is called when we parse the tree?
<ghendricks> yes
<ghendricks> and process_bug and process_attachment
<ghendricks> it is very non linear :)
<LpSolit> so ->root is the last thing being processed?
<LpSolit> indeed
<LpSolit> I see XML only briefly when learning Java and XML Path, that's all
<ghendricks> no, root is just a way to access the already processed portion of the tree that i am interested in, namely the <bugzilla> tag
<ghendricks> http://xmltwig.com/
<ghendricks> all documented here of course
<LpSolit> ok
<LpSolit> ghendricks: even if I grant review, I think a second review would be fine
<LpSolit> unless I become a XML expert in a few days ;)
<LpSolit> sub init(){
<LpSolit>     my ($twig, $bugzilla) = @_;
<LpSolit> $bugzilla == "bugzilla" ?
<ghendricks> that is the reason for doing this live review
<LpSolit> sure, else I wouldn't review it, I would have too many questions
<ghendricks> that and there don't seem to be many experts in this field that also happen to be reviewers
<LpSolit> $bugzilla in init() gets the name of the tag, which we know is already "bugzilla", right?
<ghendricks> from cpan:
<LpSolit> wow, I didn't know the exporter had to be in both DB
<LpSolit> on the other had, you have no clue whether the email really comes from the right person
<LpSolit> s/had/hand/
<ghendricks> A hash { expression = \&handler}>. Sets element handlers that are called when the element is open (at the end of the XML::Parser Start handler). The handlers are called with 2 params: the twig and the element. The element is empty at that point, its attributes are created though.
<ghendricks> yes, this was the case before i mucked with it
<ghendricks> it makes sense that at least the person moving the bugs needs to have an account in the bugzilla you are moving to
<LpSolit> I'm fine with that, yes
<LpSolit> but assuming someone is spamming bugzilla-import, you are lost
<ghendricks> never said it wasn't dangerous :)
<LpSolit> the email should contain exporter's crypted password
<ghendricks> in my case, we always run it from the command line
<LpSolit> so that the DB receiving the email can unencrypt it and makes sure it's a valid email
<ghendricks> I agree, that would be a good enhancment
<LpSolit> we should keep it in mind, and open it when your patch is approved
<LpSolit> err.. open a bug for it
<ghendricks> understood :)
<LpSolit> heh :)
<LpSolit> ghendricks: where is the bug going if there is no default product/component?
<LpSolit> shouldn't this be checked in init() too?
<ghendricks> probably wouldn't hurt
<LpSolit> what does your patch actually?
<LpSolit> assuming the old product doesn't exist in this DB
<ghendricks> if the product in the xml is invalid it tries the default
<ghendricks> and makes note of it
<LpSolit> but if the default is missing?
<LpSolit> crash? ignore? random?
<ghendricks> it will exit gracefully
<LpSolit> looks bad
<ghendricks> line 449
<ghendricks> or there abouts
<LpSolit> this should definitely be in init()
<ghendricks> already added :)
<LpSolit> added or moved? ;)
<ghendricks> added, we still should leave the checks in below
<LpSolit> why??
<LpSolit> this check comes far too late anyway
<ghendricks> well, rather I still need to create the objects in the process_bug
<LpSolit> ok, so this is not exactly the same test
<LpSolit> in init(), you only make sure a default product and component are given
<LpSolit> but you don't make sure they really exist?
<ghendricks> so I have added a check in init to see that there is a default param set
<ghendricks> but we need to check below to make sure they are valid product
<LpSolit> indeed
<LpSolit> ok, nice catch
<LpSolit> on the other hand, it doesn't hurt to build these objects in init
<LpSolit> it's only two SQL queries
<LpSolit> hum, only one even
<LpSolit> try to build the component object
<ghendricks> well, I would rather not build them in init and use them in process_bug
<ghendricks> unless i build them in both places
<LpSolit> if the product doesn't exist, it will fail
<LpSolit> that's what I mean, yes
<ghendricks> meaning I don't want to persist the object 
<ghendricks> from init
<LpSolit> if the component doesn't exist for the product, it will fail too
<ghendricks> yes
<LpSolit> that's what I mean too
<LpSolit> this way, you are *really* sure you can fall back to the default values
* LpSolit adds this note to check in the next patch
<ghendricks> added to init:
<ghendricks> my $def_product = new Bugzilla::Product({name => Param("moved-default-product")}); 
<ghendricks>     my $def_component = new Bugzilla::Component({product_id => $def_product->id, 
<ghendricks>                             name => Param("moved-default-component")}) if ($def_product);
<ghendricks>     unless ($def_product && $def_component){
<ghendricks>         Error("no default Product in parameters", "REOPEN"); 
<ghendricks>         Error("no default Component in parameters", "REOPEN"); 
<ghendricks>     }
<ghendricks> oops, only the first Eroor( will get called
<LpSolit> this won't work if the product doesn't exist
<ghendricks> how so?
<bkor> I already commented on that
<ghendricks> not even with the if ($def_product)?
<bkor> ahh.. missed that :)
<LpSolit> $def_product will be undefined and the script will fail because ->id is not a valid method on an undef object
<bkor> rather have a $def_product ? blablabla : undef;
<LpSolit> write: $def_product || Error() before building the component object
<ghendricks> won't the if ($def_product) kick in before it tries to get product_id?
<LpSolit> oh... I didn't see the if() at the end
<LpSolit> not a good idea
<ghendricks> why?
<LpSolit> because I missed it :)
<ghendricks> lol
<LpSolit> a nice
<LpSolit> if ($defproduct) {
<LpSolit>    fooo
<LpSolit> }
<LpSolit> is clearer IMO
<ghendricks> TMTOWTDI
<LpSolit> ?
<LpSolit> or even better, throw the error before calling the component
<LpSolit> this way your error message will really reflect what is missing
<LpSolit> $def_product || Error()
<LpSolit>     my $date = $attach->field('date');
<LpSolit> what's date?
<LpSolit> bzqabot bug 2701
<bzqabot> LpSolit: Bug http://landfill.bugzilla.org/qa220/show_bug.cgi?id=2701 nor, P2, 2.20rc2, LpSolit@gmail.com, ASSI, QA members
<ghendricks> you mean $attach->field('date')?
<LpSolit> yes
<ghendricks> <attachment>
<ghendricks>             <attachid>47644</attachid>
<ghendricks>             <date>2005-09-08 11:55 MDT</date>
<ghendricks>             <desc>diff</desc>
<LpSolit> oh wait, I was looking at <bug> :(
<LpSolit>     # $dbh->sql_date_format does not seem to work for inserting dates
<LpSolit> you can remove "seem to" :-D
<LpSolit>     if (!$date) {
<LpSolit>         my ($timestamp) = Bugzilla->dbh->selectrow_array("SELECT NOW()");
<LpSolit>         $date = $timestamp;
<LpSolit>     }
<LpSolit> write ($date) = Bugzilla->dbh->foo directly
<LpSolit> or even better:
<LpSolit> $date ||= Bugzilla->dbh->foo
<LpSolit> selectrow_array("SELECT NOW()") will return the time in a scalar context, so it's fine without parens
* LpSolit notes this point too
<LpSolit>     $attachment{'ctype'} = $attach->field('type') || "unknown/unknown";
<LpSolit>             <ctype>text/plain</ctype>        
<bkor> LpSolit: mkanat disagrees in bug 282132 on selectrow_array... see last comment
<ghendricks> hmmm
<LpSolit> I don't care, mkanat should read the POD too
<bkor> hehe ok (I was planning on adding it)
* LpSolit notes #3
<LpSolit> ... about ctype vs type
<LpSolit> ghendricks: unless I missed something?
<ghendricks> #3 where?
* ghendricks is confunded
<LpSolit> (01:15:02) LpSolit:     $attachment{'ctype'} = $attach->field('type') || "unknown/unknown";
<LpSolit> (01:15:11) LpSolit:             <ctype>text/plain</ctype>        
<LpSolit> ghendricks: the field is ctype
<ghendricks> oh, sorry, yes, that is from the dtd
<ghendricks> or are you suggesting text/plain be the default?
<LpSolit> no, I just show you which tag was used in my XML file ;)
<ghendricks> well, that was a historical change
<ghendricks> it used to be called <ctype>
<ghendricks> so I continue to use that as the variable name
<LpSolit> for backward compatibility, thank you :)
* LpSolit doesn't see <data> in his XML file
<ghendricks> where did you get this xml file?
<LpSolit> <data> has not been exported ???
<LpSolit> from my own installation
<ghendricks> tip?
<LpSolit> I sent myself a mail
<LpSolit> yes
<ghendricks> does the bug you exported have an attachment?
<LpSolit> ghendricks: I can see it from here though show_bug.cgi?ctype=xml&id=64
<LpSolit> but not when clicking "move this bug"
<ghendricks> did you click the xml link in the bug?
<ghendricks> oh,
<LpSolit> 2 attachments
<LpSolit> am I wrong?
<ghendricks> that is probably becasue we havn't changed move this bug to include attachments yet
<LpSolit> oops
<ghendricks> that is a different bug
<LpSolit> and yes, tip
<LpSolit> sure
<LpSolit>     my @fieldlist = (Bugzilla::Bug::fields(), 'group', 'long_desc', 'attachment');
<LpSolit> from process_bug, line 695
<ghendricks> oooh
<ghendricks> very nice
<LpSolit> attach_data missing
<LpSolit> or something like that
<ghendricks> hmmm, I could use Bugzilla::Bug::fields() in place of the long list of nastiness in process_bug
<LpSolit> wow, sure
* LpSolit notes #4: missing <data>
<LpSolit> \me notes #5: use Bugzilla::Bug::fields
<LpSolit> #4 is a separate bug
<ghendricks> oh, except we have to treat fields that have multiple values differently :(
<ghendricks> hmmm
<ghendricks> wonder how we can work around this
<LpSolit> which fields have multiple values??
<LpSolit> oh, blockers
<LpSolit> cc list
<ghendricks> foreach my $field (qw (dependson cc keywords long_desc blocked attachment flag)) {
<LpSolit> but those are not in Bugzilla::Bug::fields I think 
* LpSolit checks
<ghendricks> cc is
<LpSolit> yes, they are both
<ghendricks> and blocked and depends on
<LpSolit> yes
<ghendricks> wife is calling me for dinner :(
<LpSolit>     $attachment{'filename'} = $attach->field('filename') || "file";
<LpSolit> <filename> missing
<LpSolit> ok
<ghendricks> in the xml?
<LpSolit> yes
<LpSolit>     if (defined ($attach->first_child('data')->{'att'}->{'encoding'}) &&
<LpSolit> why first_child here?
<LpSolit> you have only one <data>?!?!
<ghendricks> umm
<ghendricks> first_child ($optional_condition)
<ghendricks> 
<ghendricks>     Return the first child of the element, or the first child matching the $optional_condition
<ghendricks> from cpan
<LpSolit> yes, I read it too
<ghendricks> I am not sure why they called it first_child
<-- mick_home has quit (Ping timeout)
<ghendricks> it is a misnomer in my mind
<LpSolit> but why not $attach->field('data')
<ghendricks> field is supposed to be the same thing
<ghendricks> but for some reason it does not work for the attribute gathering
<ghendricks> all I know is it works 
* ghendricks shrugs
<LpSolit> ok
<ghendricks> hmmm, filename is difined in dtd
<ghendricks> guess we need to add it to show_bug
<ghendricks> and move
<ghendricks> i try to make it support any xml file that complies with the dtd
<ghendricks> since the xml can be coming from any source, not just move.pl
<LpSolit> I think first_child is an object and can have attributes, field is ... only a field ;)
<ghendricks> that would explain it
<LpSolit>          }
<LpSolit>     else {
<LpSolit> nit:
<LpSolit> }
<LpSolit> else {
<ghendricks> what line?
<LpSolit> 230
<ghendricks> gonna go grab some food
<LpSolit>     $attachment{'flags'} = \@aflags if (@aflags);
<LpSolit> if () is useless IMO
<LpSolit> ok
<LpSolit> ok, so process_attachment() is fine
<LpSolit> now process_bug()
<LpSolit> almost done ;)
<ghendricks> in fact, if you want to continue this later now might be a good place to pause
<LpSolit> ghendricks: how long the pause?
* LpSolit won't be here this week-end
<ghendricks> neither will I
<ghendricks> I will go eat and if you are still on I will keep going
<ghendricks> please feel free to continue to make notes :)
<LpSolit> I will still be here in 30-40 minutes ;)
<ghendricks> k, ttyl
<LpSolit> note to self: there is no direct bug - attachment connection
* LpSolit notes #6

--> ghendrick1 (ghendricks@694A834D.EA3D6700.5FEC0F75.IP) has joined #qa-bugzilla
<ghendrick1> back :)
<LpSolit> cool
<LpSolit> I just posted a patch about missing <data>
<ghendrick1> cool
<LpSolit> the 'encoding' attribute is missing too
<LpSolit> as in 2.22, all data are encoded
<LpSolit> ghendrick1: tell me if you agree with my fix
<LpSolit> bug 315157
<ghendrick1> yes, that looks good
<ghendrick1> did I miss anymore suggestions while I was gone?
<LpSolit> why ghendricks and ghendrick1?
<ghendrick1> logged in at home now :)
<LpSolit> (01:51:41) LpSolit: note to self: there is no direct bug - attachment connection
<LpSolit> (01:52:15) ***LpSolit notes #6
<LpSolit> ok
<ghendrick1> left it up at the office to keep logging
<LpSolit> so, let's go with process_bug
<ghendrick1> k
<LpSolit>       $log .= " $bug->{'att'}->{'error'}\n";
<LpSolit> this doesn't work
<LpSolit> you have to write $log .= $bug->{}->{} . "\n";
<ghendrick1> ???
<LpSolit> myk did this mistake already in Attachment.pm
--> justdave (dave@moz-960424AE.dhcp.aldl.mi.charter.com) has joined #qa-bugzilla
<ghendrick1> ah, gotcha
<ghendrick1> in the double quotes
<LpSolit> yes
<LpSolit> you send only one log at the end of importxml.pl, right?
<LpSolit> when the bug is inacessible, you don't send separate email
<LpSolit> I'm fine with that though
<LpSolit> with my patch applied, I now see attachment data
<LpSolit> ... actually encoded :)
<ghendrick1> yay
<ghendrick1> yes, the log gets sent with the email
<ghendrick1> if debug mode is set you see it on the screen too
* ghendrick1 welcomes justdave
<LpSolit> what's the problem with fields having several values??
<LpSolit> you could use childs are loop on them
<LpSolit> if there is only one value, that's fine too
<ghendrick1> which are you referring to?
<LpSolit> this way, you could use Bugzilla::Bug::fields
<ghendrick1> the cc, and dependson junk?
<LpSolit>     # This list contains all the fields that may have more than one value
<LpSolit>     # Each instance of one of these fields will be concatenated together 
<LpSolit>     # and processed later
<LpSolit>     my %multiple_fields;
<LpSolit>     foreach my $field (qw (dependson cc keywords long_desc blocked attachment flag)) {
<LpSolit>       $multiple_fields{$field} = "x"; 
<LpSolit>     }
<ghendrick1> hmm, this is something I basically left alone from the original code
<LpSolit> we should change it
<LpSolit> well... *you* should ;)
<ghendrick1> The thing is we want to support both the case where i have <cc>name2, name4, name21</cc> and the case where we have <cc>name</cc><ccname</cc>
<ghendrick1> I will have to play with it
<ghendrick1> that would be a good one to note
<LpSolit> ghendrick1: easy, split the content of the field
<justdave> shouldn't it be a <cclist> element that contains a bunch of user objects?
<LpSolit> split /[\s,]+/
<ghendrick1> justdave: this is in the xml
<justdave> of course I'm probably just dreaming :)
<ghendrick1> justdave: I think the dtd allows for both ways that I mentioned
<LpSolit> but not show.xml.tmpl ;)
<ghendrick1> I am not sure which is more kosher in xml
<ghendrick1> seems to me though that each member of the CC list is its own element
<LpSolit> that's the way show.xml works
<ghendrick1> we need to make sure we are consistent
<LpSolit> we are
<ghendrick1> but importxml should handle either way
<LpSolit> is importxml.pl supposed to work between two DBs having different version of bugzilla??
<LpSolit> justdave: ^^^
<ghendrick1> no reason why not
<ghendrick1> you may have to transform the xml in between though
<ghendrick1> justdave: made a comment to that effect in the bug
* LpSolit read none
<LpSolit> ghendrick1: do you want to add a warning in the log if the two versions differ?
<ghendrick1> comment #8
<ghendrick1> wouldn't be a bad idea
<LpSolit> should be done
<LpSolit> as an enhancement... :)
<LpSolit> first let make it work between two installations having the same version :)
<ghendrick1> good idea
<LpSolit> didn't we have a first bug to open already?
<LpSolit> maybe the one justdave is reviewing
<ghendrick1> yes, we had one noted already
<LpSolit> correct, it's the bug I opened a few minutes ago
<LpSolit> there is still <filename> missing though
<LpSolit>       if (defined $multiple_fields{$bugchild->name}) {
<LpSolit>         if (defined $bug_fields{$bugchild->name}) {
<LpSolit>           $bug_fields{$bugchild->name} .= " " .  $bugchild->field;
<LpSolit>         } else {
<LpSolit>           $bug_fields{$bugchild->name} = $bugchild->field;
<LpSolit>         }
<LpSolit>       } elsif (defined $all_fields{$bugchild->name}) {
<LpSolit>         $bug_fields{$bugchild->name} = $bugchild->field;
<LpSolit> this is ugly :-D
<LpSolit> $bug_fields{$bugchild->name} = join(" ", @$bugchild->field) or something equivalent
<LpSolit> I know how to do that
<LpSolit> you should do a loop over each field
<LpSolit> I mean by giving a field name
<LpSolit> this way, my line above will do what you want as you will only get fields you really want to concatene
* LpSolit isn't sure about the spelling
<LpSolit> justdave: you mean in show_bug.cgi?
<LpSolit> justdave: process_bug.cgi requires it
<ghendrick1> hmmm
<LpSolit> ghendrick1: this would prevent a lot of trouble 
<LpSolit> and would be a nice cleanup
<ghendrick1> yes, I think I see what you mean
<ghendrick1> something like:
<ghendrick1> foreach bugchild->field {
<ghendrick1>    foreach bugchild->name 
<ghendrick1>      join field
<LpSolit> no
<ghendrick1> k, then I am not following
<LpSolit> let's say all field names are in @fields
<ghendrick1> ok
<LpSolit> foreach my $field_name (@fields) {
<LpSolit>     $bug_fields{$field_name} = join(" ", @$bugchild->field($field_name));
<LpSolit> }
<justdave> LpSolit: do I mean show_bug.cgi about what?
<LpSolit> justdave: oh sorry, there is another justdave who is sending me emails :-p
<LpSolit> justdave: I thought you were unique ;)
<LpSolit> ghendrick1: this will work, right?
<LpSolit> ghendrick1: with this change s/bugchild/bug/
<LpSolit> ------- Comment #2 from justdave@bugzilla.org  2005-11-04 17:51 PST ------- 
<LpSolit> I'm not sure the attachment data should be included in the default fieldset... 
<LpSolit> I'm sure the bug moving code can specify that it wants it included, right? 
<ghendrick1> LpSolit: I am not sure if bugchild->field($fild_name) will work as an array like that
<LpSolit> ghendrick1: so $bug->children_text()
<LpSolit> ghendrick1: the docs says it returns an array containing the text of children
<ghendrick1> ok, that might work
<ghendrick1> LpSolit: the attachment data should be included on a single bug I think but not on multiple bugs (like from buglist) unless they are all being moved
<ghendrick1> I had a patch for that somewhere
<LpSolit> ghendrick1: I think you shouldn't display the content of unknown fields in the error log
<LpSolit> ghendrick1: you have no idea what it contains
<LpSolit> it's untrustable
<LpSolit> the field name only is fine IMO
* LpSolit has to wake up in 3.5 hours :-/
<ghendrick1> O.O
<LpSolit> :-D
<ghendrick1> why you still here man
<ghendrick1> go sleep
<LpSolit> because I'm in the middle of a review
<ghendrick1> die hard
<ghendrick1> when I whined about not getting reviews faster I didn't mean for anyone to lose sleep :(
<LpSolit> hum..
<LpSolit> or we stop here for now
<LpSolit> there is now all this stuff about comments
<LpSolit> we could try again on Monday if you have time
<LpSolit> I should have some in the evening (European Time)
<LpSolit> heh :)
<ghendrick1> I will be on all day monday :)
* LpSolit now goes to bed
<ghendrick1> thanks for the help
<LpSolit> fine :)
<LpSolit> you welcome
<LpSolit> there is still a lot to review
<LpSolit> it should go faster now
<ghendrick1> yes
--- LpSolit is now known as LpSolit_zzz
* ghendrick1 is impressed with LpSolit_zzz's devotion to the project
* LpSolit_zzz too
<LpSolit_zzz> night!
<ghendricks> Morning
<LpSolit> afternoon
<ghendricks> LpSolit: along with bug 315157 see bug 307328 as well
* LpSolit doesn't know bug number by heart
* ghendricks grumbles at bzqabot
<LpSolit> ssdbot doesn't want to join
<LpSolit> this option has been restricted :(
<ghendricks> Along with attachment data in the bug move code, importxml now handles flags
<ghendricks> bug 307328 exports these
<ghendricks> at least in show_bug
--> karl (akkornel@moz-5EF7C0B4.columbus.res.rr.com) has joined #qa-bugzilla
<LpSolit> ghendricks: I can see them in the email I got when moving bugs
<ghendricks> oh?
<ghendricks> maybe I missed them since we don't use flags in our install
<LpSolit> ghendricks: show.xml.tmpl already has a section about flags
<LpSolit> oh...
<LpSolit> ghendricks: only attachment flags are exported; bug flags are not
<ghendricks> yes, the patch on https://bugzilla.mozilla.org/show_bug.cgi?id=307328 remedies this
<LpSolit> correct
<ghendricks> as well as adds filename to the the attachments
<LpSolit> looks like I will be the one who will review it ;)
<LpSolit> as well as encoding="base64" to attachment data :)
<LpSolit> hum... we are going to conflict
* ghendricks sighs
<ghendricks> story of my life
<LpSolit> lol
<LpSolit> except that my patch adds attachmentdata to the list of exported fields
<LpSolit> ghendricks: we can fix importxml.pl and the show.xml.tmpl independently
<LpSolit> s/the//
<ghendricks> yes

<LpSolit> ghendricks: do we continue our review about importxml.pl?
* LpSolit needs a break
<ghendricks> I am ready when you are
<LpSolit> I am... the review will be my break
<ghendricks> lol
<ghendricks> ok
<LpSolit> but I need the bug number
<ghendricks> 285614
<-- karl has quit (Ping timeout)
* LpSolit takes his notes
<LpSolit> ok
<LpSolit> our last discussion was about laoding data from the XML file
<LpSolit> as I said, it should be rewritten
<ghendricks> I have been trying to rework ikt
<ghendricks> it*
<-- mick_home has quit (Ping timeout)
<LpSolit> ok, myk is reviewing my patch
--> mick_home (chatzilla@moz-1D089401.sip.bct.bellsouth.net) has joined #qa-bugzilla
--> karl (akkornel@moz-5EF7C0B4.columbus.res.rr.com) has joined #qa-bugzilla
* LpSolit has to convince myk to take them for 2.22
<LpSolit> he thinks this is a 2.24 thing
<karl> ?
<karl> (getting dropped off like that is really annoying)
<LpSolit> justdave: ping?
<LpSolit> ghendricks: is importxml.pl actually broken?
<ghendricks> If you mean, does it work? Yes, it works
<ghendricks> we use it every day here
<ghendricks> to import thousands of bugs
<LpSolit> that's not a good argument to have it in 2.22 ;)
<LpSolit> if it's not broken, myk may disagree to take it for 2.22
<ghendricks> oh, you mean does it work as is in 2.20?
<ghendricks> NO
<LpSolit> not 2.20, 2.21.1
<ghendricks> they are the same
<ghendricks> importxml is anyway
<LpSolit> ah :)
<ghendricks> it will not return any errors, but it leaves a lot of data out
<ghendricks> I meant my patch, it works
<ghendricks> or so far it has anyway :)
<LpSolit> I meant without your patches being applied
* LpSolit still tries to convince myk to get it in 2.22
<ghendricks> considering it has not been updated really since before 2.16 I think this is a good reason to get it in 2.20
<LpSolit> the discussion with myk is one of the reason why I'm slow in reviewing your patch ;)
<ghendricks> it has not kept pace with the Schema changes
<LpSolit> 2.20 or 2.22??
<LpSolit> this won't work in 2.20
<ghendricks> I meant 2.22
<LpSolit> Attachment.pm has been rewritten for 2.22 only
<LpSolit> ok
<ghendricks> yes, I know
<justdave> pong
<ghendricks> though we have installed the pm files ontop of our 2.20 installation to get this to work and it seems to do OK
<justdave> It's not a chunk of code that's used by a lot of people, but the few people that due use it will probably be annoyed if it doesn't work.
<LpSolit> justdave: do you agree to consider the rewrite of importxml.pl for 2.22?
<ghendricks> I did request blocking on 2.22 on the bug :)
<justdave> because of how few people use it, even semi-major changes to it would be low-impact
<ghendricks> in our case it is critical for our business
<LpSolit> justdave: we need approval on bugs 307328 and 315157 to have it work correctly
<LpSolit> both patches have r+ on them
<LpSolit> justdave: why cannot I invite ssdbot in the channel? the message says I have to ask an admin?!
--> ssdbot (pid-2866@moz-F54056DC.mozilla.org) has joined #qa-bugzilla
* LpSolit wonders why there is a restriction on who can invite it now
<LpSolit> I could do is freely before
<LpSolit> s/is/it/
<LpSolit> ok, ghendricks, let me do the checkins.. it will be easier to test importxml.pl ;)
<ghendricks> ok
<LpSolit> ghendricks: I wonder if %bug_fields is really useful
<ghendricks> LpSolit: the only reason it is there is to allow us to know whether or not there are additional xml elements that are not designated bug fields in the DTD
<ghendricks> this makes it more friendly for people importing bugs from other systems besides bugzilla
<LpSolit> o_O
<LpSolit> on Friday, I suggested to loop over each field name
<LpSolit> coming from Bugzilla::Bug::fields
<LpSolit> unknown fields would be ignored by doing so
<ghendricks> importxml.pl is much more than just a way to move bugs from one bugzilla install to another
<LpSolit> ah?
<ghendricks> we don't want to ignore them necessarily, we want to make note of them
<ghendricks> you will notice a bit lower in the code that unkonwn fields are appended to the comments in the bug
<ghendricks> that way the data is still captured
<ghendricks> Unknown bug field "foundby" encountered while moving bug
<ghendricks>    <foundby>Other</foundby>
<ghendricks> looks like this ^^^
<ghendricks> we use this to import bugs from our old commercial issue tracking system
<ghendricks> they include a large number of fields that do not exist in bugzilla
<ghendricks> we would be lynched if that data was not captured in the bug somehow during the import :)
<LpSolit> ok, both patches checked in
<LpSolit> the last step is importxml.pl itself
--- karl is now known as karl-afk
<LpSolit> ghendricks: ok, I understand your view point
<ghendricks> I have tried to make this as friendly as possible to importing any bug data that complies with the DTD
<LpSolit> back to the review
<LpSolit> I can wake up later tomorrow ;)
--- LpSolit gives voice to ssdbot
<-- LpSolit has quit (Ping timeout)
--> LpSolit (LpSolit@moz-172A88F9.cust.bluewin.ch) has joined #qa-bugzilla
--- ChanServ sets mode +q #qa-bugzilla LpSolit
--- ChanServ gives channel operator status to LpSolit
<LpSolit> ghendricks: I think you want Debug("Parsing Long Description",DEBUG_LEVEL); out of the foreach my $comment ($bug->children('long_desc')) loop
<ghendricks> This actually allows me to see how many comments are being parsed
<ghendricks> useful for when a problem arises in one of the comments text which causes things to blow up
<ghendricks> not that that should ever happen ;)
<LpSolit> you could use $bug->children_count('long_desc')
<ghendricks> besides, DEBUG_LEVEL is rather verbose anyway
<LpSolit> ok, that's a nit anyway
<ghendricks> that tells me how many comments are in the xml, not which one it choked on
<LpSolit> I don't see where in the code you could have problems ;)
<LpSolit> but ok
<LpSolit>     sub by_date {my @a; my @b; $a->{'bug_when'} cmp $b->{'bug_when'}; }
<LpSolit>     my @sorted_descs = sort by_date @long_descs;
<LpSolit> you could simplify this a bit
<LpSolit> @sorted_descs = grep { $a->{'bug_when'} cmp $b->{'bug_when'} } @long_descs; should work
<LpSolit> hum...
<LpSolit> hashes
<LpSolit> should be $a{'bug_when'} cmp $b{'bug_when'}
<ghendricks> ok
<LpSolit> ghendricks: what's the difference between $long_description and $comments ????
<ghendricks> $long_description is what comes in from the xml, $comments is stuff we are appending as a result of running the script
<ghendricks> in the end $comments will be appended to $long_description
<LpSolit>     for (my $z=0 ; $z <= $#sorted_descs ; $z++) {
<LpSolit>       unless ( $z==0 ) {
<LpSolit>         $long_description .= "$sorted_descs[$z]->{'who'} "; 
<LpSolit>         $long_description .= "$sorted_descs[$z]->{'bug_when'}";
<LpSolit>         $long_description .= " ----";
<LpSolit>         $long_description .= "\n THIS COMMENT IS PRIVATE" if 
<LpSolit>                                             ($sorted_descs[$z]->{'isprivate'});
<LpSolit> ghendricks: these informations are not set for comment 0
<LpSolit> ghendricks: and shouldn't $comments also include the login of the original reporter?
* LpSolit notes # n+1
<LpSolit>     # Each of these fields we will assume is valid. Just quote them and 
<LpSolit>     # shove them onto the array
<LpSolit>     foreach my $field (qw(alias creation_ts status_whiteboard bug_file_loc 
<LpSolit>                            short_desc cclist_accessible reporter_accessible)) {
<LpSolit>         if ( (defined $bug_fields{$field}) && ($bug_fields{$field}) ){
<LpSolit>           push (@query, "$field");
<LpSolit>           push (@values, $bug_fields{$field});
<LpSolit>         }
<LpSolit>     }
<LpSolit> I disagree with this
<LpSolit> you shouldn't trust these values
<LpSolit> ghendricks: please add tests on them
<ghendricks> both these section are untouched from the original
<ghendricks> except the comments
<ghendricks> but I will add tests

<LpSolit> ghendricks: also, if a value is 0 or '', you should include it as well
<LpSolit> ghendricks: some fields in the DB have no default value and PostgreSQL will fail (MySQL falls back to standard default values when none is given; PostgreSQL doesn't)
<ghendricks> oi

<LpSolit> all dates should have a correct format
<ghendricks> LpSolit: yes, how do I do that exactly?
<ghendricks> LpSolit: I have been struggling with that since mysql has a very strict date requirement for timestamp fields
<ghendricks> I am sure pgsql will be different too
<LpSolit> ghendricks: use Bugzilla::Util::format_time()

<ghendricks>         $long_description .= "\n\n\n---- Additional Comments From ";
<ghendricks>         $long_description .= "$sorted_descs[$z]->{'who'} "; 
<ghendricks>         $long_description .= "$sorted_descs[$z]->{'bug_when'}";
<ghendricks>         $long_description .= " ----";
<LpSolit>     eval {
<LpSolit>         $version = Bugzilla::Version::check_version($product, 
<LpSolit>                                                    $bug_fields{'version'});
<LpSolit>     };
<ghendricks> this part does not apply to the intital comment
<ghendricks> I will add the check for private

<LpSolit> replace eval {} by $version = new Bugzilla::Version and check whether the object is undefined instead
<LpSolit> ghendricks: it applies as comment 0 will be attributed to $exporter instead of the original reporter
<ghendricks> hmmm
<ghendricks> description doesn't have a 'who' displayed on the page
<LpSolit>     my @versions = @{$product->versions};
<LpSolit>     if (!$@){
<LpSolit>       push (@values, $version->name );
<LpSolit>       push (@query, "version");
<LpSolit>     } else {
<LpSolit>       my $v = $versions[0];
<LpSolit>       push (@query, "version");
<LpSolit>       push (@values, $v->name);
<LpSolit> push (@query, "version"); is common to both blocks => move it out of them
<LpSolit> and my @versions = @{$product->versions}; is only used in the second block => move it there

<LpSolit> ghendricks: for severity, priority, OS and platform, use Bugzilla::Field::check_form_field()
<ghendricks> ooh, we can get rid of more @legal_shiznit?

<LpSolit> ghendricks: severity is missing!!
<LpSolit> ghendricks: you only checked and included the other three
<ghendricks> if (defined ($bug_fields{'bug_severity'}) &&
<ghendricks>          (my @severity= grep(lc($_) eq lc($bug_fields{'bug_severity'}), 
<ghendricks>                              @::legal_severity)) ){
<ghendricks>       push (@values, $severity[0]) ;
<ghendricks>       push (@query, "bug_severity");
<LpSolit> where is that?
<ghendricks> further down
<LpSolit> oh I see
<LpSolit> move it with other default fields
<ghendricks> hmm, I never noticed that they were seperated like that 
<LpSolit> that's pretty confusing
<LpSolit> there is no reason to let it separate

<LpSolit>     if (Param("timetrackinggroup")) {
<LpSolit>         my $date = $bug_fields{'deadline'};
<LpSolit>         if ($date) {
<LpSolit>             $date = str2time($date);
<LpSolit>             $date = time2str("%Y-%m-%d %X", $date);
<LpSolit>         }
<LpSolit> ghendricks: again, use Bugzilla::Util::format_time()
<LpSolit>         }
<LpSolit>         else { $date = "NULL";}
<LpSolit> arghh....
<LpSolit> ghendricks: 'datetime' + undef please
<LpSolit> estimated_time + remaining_time have to be checked too
<ghendricks> you mean date = undef?
<LpSolit>         push(@values, undef);
<LpSolit>         push(@query, "deadline");
<LpSolit> ghendricks: the answer is 'no'

<LpSolit> also, write
<LpSolit> }
<LpSolit> else (foo) {
<LpSolit>     ...
<LpSolit> }
<LpSolit> instead of all this on a single line
<ghendricks> ok, I was just using the style that was in the file
<LpSolit> some reviewers could r- due to that
<ghendricks> yes, I have met a few :)

<LpSolit> heh
<LpSolit> remove *all* eval {} !!
<LpSolit>       eval{
<LpSolit>           $milestone = Bugzilla::Milestone::check_milestone($product, 
<LpSolit>                                           $bug_fields{'target_milestone'});
<LpSolit>       };
<LpSolit> if you don't want an error to be thrown, don't call a function which throws one ;)

<LpSolit>         $err .= "\" in product \"". $product->name ."\".\n";
<LpSolit> you could use single quotes 'foo' instead of \"
<LpSolit> this would make these lines easier to read

<LpSolit>     my $reporterid = login_to_id($bug_fields{'reporter'});
<LpSolit>     if ( ($bug_fields{'reporter'}) && ( $reporterid ) ) {
<LpSolit> if $bug_fields{'reporter'} is undefined, login_to_id() will complain
<LpSolit> ghendricks: and this last line can be written: if ($reporterid)
<LpSolit> ghendricks: if $reporter is is not null, then we already know $bug_fields{'reporter'} is defined ;)
<LpSolit> ghendricks: so need to check both
<LpSolit> ghendricks: even better: if (($bug_fields{'reporter'}) && login_to_id(...))
<LpSolit> hum... maybe not that good
<LpSolit> you would have to call login_to_id twice
<LpSolit> (problem seen for the assignee)

<LpSolit> ghendricks: move the QA contact section close to the assignee one
<LpSolit> ghendricks: same thing for resolution and status: use Bugzilla::Field::check_form_field()
<ghendricks> @sorted_descs = grep { $a->{'bug_when'} cmp $b->{'bug_when'} } @long_descs;
<ghendricks> ^^^ this didn't seem to work :(
<ghendricks> lose all comments
<LpSolit> that's what I said
<LpSolit> $a{'bug_when'} instead of $a->{'bug_when'}
<ghendricks> oh
<ghendricks> woops
<ghendricks> hmmm, Global symbol "%a" requires explicit package name at ./importxml.pl line 390
<LpSolit> ghendricks: wrong
<LpSolit> ghendricks: s/grep/sort/
<ghendricks> lol
<ghendricks> I should have caught that one ;)
<ghendricks> still get the error though

<LpSolit>     if ( ($changed_owner) && (!$resolution[0]) ) {
<LpSolit>       push (@values, "NEW");
<LpSolit> I don't understand !$resolution[0] not why you confirm UNCONFIRMED bugs
<LpSolit> s/not/nor/
<LpSolit> ghendricks: the %a error doesn't make sense
<LpSolit> ghendricks: it's understood by _sort_
<ghendricks> that is what I thought
<LpSolit> the part which handles bug status looks awful
<ghendricks> but you can try for yourself
<LpSolit> needs some work
<LpSolit> especially because all UNCO bugs are now mark as NEW
<LpSolit>     unless ($mail){
<LpSolit>         push @query, "lastdiffed";
<LpSolit>         push @values, "now()"; 
<LpSolit>     }
<LpSolit> you should use the same time as delta_ts
<LpSolit>     if ($bug_fields{'bug_status'} eq 'UNCONFIRMED'){
<LpSolit>         push @query, "everconfirmed";
<LpSolit>         push @values, 0;
<LpSolit>     }
<LpSolit>     else{
<LpSolit>         push @query, "everconfirmed";
<LpSolit>         push @values, 1;
<LpSolit>     }
<LpSolit> ghendricks: so all closed bugs are now confirmed???
<LpSolit> ghendricks: even INVALID ones?? :(
<LpSolit> ghendricks: everconfirmed should be exported
<ghendricks> LpSolit: sounds good to me
<ghendricks> let's get it into the dtd
<ghendricks> I don't really understand unconfirmed bugs since we do not use votes or UNCONFIRMED status here
<ghendricks> all our bugs come in as new
<LpSolit> that's something we certainly don't want on b.m.o
<ghendricks> I thought though, that if a bug has a resolution it would be confirmed as INVALID
<ghendricks> or whatever the resolution is

<LpSolit>     my $query  = "INSERT INTO bugs (\n" 
<LpSolit>                  . join (",\n", @query)
<LpSolit>                  . "\n) VALUES (\n";
<LpSolit>                  $query .= '?,' foreach (@values);
<LpSolit> I'm not a fan of \n in a SQL query
<ghendricks> again, already there
<ghendricks> it does come in handy for printing :)
<ghendricks> so explain something to me,
<LpSolit> ghendricks: chop($query); should not be used
<LpSolit> ghendricks: use chomp() instead

<ghendricks> if a bug gets enough votes it is automatically moved into the confirmed state, correct?
<LpSolit> correct
<ghendricks> and everconfirmed gets set to 1 in the DB?
<LpSolit> correct
<ghendricks> and if it is then manually moved to the UNCONFIRMED state does everconfirmed go back to 0?
<bkor> LpSolit: chop is used to strip the extra ","
<ghendricks> I thought chomp only removed white space
<LpSolit> ghendricks: no about NEW back to UNCO
<ghendricks> so everconfirmed, once set to 1 is always 1
<ghendricks> why do we care?

<LpSolit> bkor: I see. this query should be rewritten by using map() instead. r- due to that ;)
<LpSolit> ghendricks: a confirmed bug can never go back to the UNCO state
<LpSolit> ghendricks: everconfirmed is used so that when you reopen a bug, you know if you should mark it as REOPENED or UNCO

<bkor> LpSolit: ahh, you saw I suggested it, I don't care if it isn't perfect.. I stole it from ./contrib/bzdbcopy.pl ;-)
* ghendricks is still just a budding perl hack and does not yet know all the great depths of perl wisdom
* ghendricks has an account on perl monks
* ghendricks is listed as a novice
* ghendricks likes Perl because Perl's mantra is "TMTOWTDI"

<LpSolit>       foreach my $person (split(/[ ,]/, $bug_fields{'cc'})) {
<LpSolit>         my $uid;
<LpSolit>         if ( ($person ne "") && ($uid = login_to_id($person)) ) {
<LpSolit>           $sth_cc->execute($id, $uid);
<LpSolit> be sure you never have duplicates
<ghendricks> ???
<LpSolit> try to insert twice the same users
<ghendricks> no, I know, but I thought I had written a code block to handle that
<LpSolit> remember that you have no clue whether the email is valid
<LpSolit>       foreach my $i (sort(@keywords)){
<LpSolit>         $keywordstring .= "$i, ";
<LpSolit>       }
<LpSolit>       chop($keywordstring);
<LpSolit>       chop($keywordstring);
<LpSolit>       $dbh->do("UPDATE bugs SET keywords = ? WHERE bug_id = ?", 
<LpSolit>                 undef, $keywordstring, $id)
<LpSolit>     }
<ghendricks> shouldn't login_to_id take care of that for me?
<LpSolit> why chop() twice??
<LpSolit> anyway, ugly hack
<ghendricks> $keywordstring .= "$i, ";
<ghendricks> I agree it is ugly
<LpSolit> ugly means r- in most cases :-p
<ghendricks> but this is the best way I could come up with to make sure that keyword cache is correct
<ghendricks> I think that keyword cache is ugly
<ghendricks> any time you have data stored in two separate locations is bad
<LpSolit> use join(), map(), grep(), sort(), but not such way of writing queries ;)
<ghendricks> If you have a better suggestion, I am very open to it :)

<LpSolit>     my ($flagid) = $dbh->selectrow_array("SELECT id FROM flagtypes
<LpSolit>                                           WHERE name = ?", undef,
<LpSolit>                                           $bflag->{'att'}->{'name'});
<LpSolit> this doesn't work;
<LpSolit> flag type names are not unique
<LpSolit> there are at least 5-6 different "review" flag types on b.m.o
<ghendricks> I didn't realize that was allowed
<LpSolit> moreover, you don't check whether the flag applies for the given product/component
<LpSolit> you should check flagexclusions and flaginusions
<ghendricks> ok
<LpSolit> flaginclusions*
<ghendricks> flags are another thing we have never used 
<ghendricks> though we are starting to get requests for them
<LpSolit> also, you have to make sure the requester and requestee, if they exist, are in the right groups
<ghendricks> guess I have to do more homework :P
<LpSolit> i.e. flagtypes.request_group and flagtypes.grant_group

<LpSolit> for security, you should use attachment IDs when inserting them, instead of adding everything which is in @attachments
<LpSolit>       my $att_data = $att->{'data'};
<LpSolit>       $dbh->do("INSERT INTO attach_data (id, thedata) 
<LpSolit>                 VALUES (?, ?)", undef, $att_id, $att_data);
<LpSolit> I think this doesn't work, at least on PostgreSQL
<LpSolit> you should use something like (taken from attachment.cgi):
<LpSolit>   $sth = $dbh->prepare("INSERT INTO attach_data
<LpSolit>                            (id, thedata) VALUES ($attachid, ?)");
<LpSolit>   trick_taint($data);
<LpSolit>   $sth->bind_param(1, $data, $dbh->BLOB_TYPE);
<LpSolit>   $sth->execute();

<LpSolit> attachment flags are also incorrectly validated
<ghendricks> how do you know what $attachid is before you insert it?
<ghendricks> oh, this is the attachment data table
<LpSolit> yes
<LpSolit> the attach ID will change when inserting it, but at least you know it was really in the bug
<ghendricks> but I do use the ID
<ghendricks> you mean use the attachment ID from the xml in the insert?
<ghendricks> will that work?
<LpSolit> no, I mean use the attachment ID from the XML to make sure the attachment you are going to attach is really in the bug you are considering
<LpSolit> of course, it will receive a new ID when being inserted into the DB

<LpSolit> done! review complete! :)
<ghendricks> so you mean parse the long description for the created an attachment text and look for the ID?
<ghendricks> I don't understand how you can know whether the attachemnt was part of the original bug
<LpSolit> use <bug><attachment><attachid>
<ghendricks> oh, I think I see
<-- LpSolit has quit (Ping timeout)
--> LpSolit (LpSolit@moz-172A88F9.cust.bluewin.ch) has joined #qa-bugzilla
--- ChanServ sets mode +q #qa-bugzilla LpSolit
--- ChanServ gives channel operator status to LpSolit
<-- mick_home has quit (Ping timeout)
<ghendricks> LpSolit: thanks for taking the time
<ghendricks> I'll get on these
<LpSolit> ghendricks: you welcome :)
<LpSolit> ok
<LpSolit> ghendricks: IMO you are lost about flags
Attachment #201880 - Attachment is obsolete: true
Attachment #203077 - Flags: review?
Attachment #203077 - Flags: review? → review?(LpSolit)
Blocks: 303691
Yes, this is a blocker. As far as I know, importxml.pl is totally broken and it should be fixed for 2.22.
Flags: blocking2.22? → blocking2.22+
Comment on attachment 203077 [details] [diff] [review]
Rewrite v4

>Index: importxml.pl

>@@ -1,4 +1,4 @@
>-#!/usr/bin/perl -w 
>+#!/usr/bin/perl -w -T

Nit: write -wT


>@@ -58,26 +69,61 @@

>+use Data::Dumper;
>+$Data::Dumper::Useqq = 1;
>+use warnings;

'use warnings' is useless as you already have set the -w flag.


>+my $dbh = Bugzilla->dbh;
>+my ($timestamp) = Bugzilla->dbh->selectrow_array("SELECT NOW()");

Nit: you have defined $dbh; use it! my ($timestamp) = $dbh->selectrow_array("SELECT NOW()");



>@@ -85,25 +131,12 @@

> sub MailMessage {

> my $to = join (", ", @recipients);

Don't you have to use $to =~ s/@/\@/g; here too?


>@@ -115,517 +148,950 @@

>+sub flag_handler {
>+    my ($type, $name, $status, $setter_login, $requestee_login, 
>+        $exporterid, $bugid, $productid, $componentid, $attachid) = @_;

If $attachid is defined, then it's an attachment flag, else a bug flag. So having both $type and $attachid doesn't seem required to me. But from what I see, you use it later to make sure than an attachment has an ID, else you reject it. Maybe it worths a comment explaining why you have both parameters. Else you could use something like:

my $type = ($attachid) ? "attachment" : "bug";


>+    my $setter = new Bugzilla::User(login_to_id($setter_login));

Write: Bugzilla::User->new_from_login($setter_login);


>+    if (defined($requestee_login)){
>+        $requestee = new Bugzilla::User(login_to_id($requestee_login));

Same remark here.


>+        if ($requestee->id == 0){
>+            $err  = "Invalid requestee $requestee_login on $type flag $name\n";
>+            $err .= "   Requesting from the wind.\n";
>+        }
>+        $requestee_id = $requestee->id;
>+    }

Two things:
1) the test must be ($requestee && $requestee->id) in case the user object is undefined.
2) you forgot an "else" enclosing "$requestee_id = $requestee->id;". You only set $requestee_id *if* $requestee->id exists.


>+    if ($attachid){
>+        $flag_types = Bugzilla::FlagType::match(
>+            {'target_type'  => 'attachment',
>+             'product_id'   => $productid, 
>+             'component_id' => $componentid });
>+        
>+    }

What happens if the attachment is in the insidergroup but $exporter is not in this group?


>+    else {
>+        my $bug = new Bugzilla::Bug($bugid, $exporterid);
>+        $flag_types = $bug->flag_types;
>+                        
>+    }

If the bug is not accessible or has no flag types available, you should return from here.


>+    my @flag_types = @{$flag_types};
>+    my @type_names;
>+    my $j;
>+    foreach my $type (@{$flag_types}){
>+        push @type_names, $type->{'name'};
>+    }
>+    $j = lsearch(\@type_names, $name);

@type_names and $j are useless. Extract the correct type from the loop directly and then use $type->{'foo'} instead of $flag_types[$j]->{'foo'}.


>+        if ($grant_gid){
>+            unless ($setter->in_group_id($grant_gid)){

Wrong! Check grant_id *only* if the flag is set to "+" or "-".


>+        if ($requestee){
>+            my $request_gid = $flag_types[$j]->{'request_gid'};
>+            if ($request_gid){
>+                unless ($requestee->in_group_id($request_gid)){

Wrong here too! You have to check the setter of the flag, not the requestee. Moreover, do it only if the flag is set to "?". If the bug is restricted to some groups, you also have to make sure that the requestee is allowed to access this bug: $requestee->can_see_bug($bugid).

If the attachment was in the insidergroup, you also have to make sure the requestee can access this attachment (and that the setter is allowed to add a private attachment).


>+        # Check that we are not trying to insert a bug flag into an attchment type
>+        # and visa versa
>+        my $t_type = $flag_types[$j]->{'target_type'};
>+        if ($t_type eq $type){

This test doesn't make sense. $flag_types has been built correctly already, see lines 203 and 210. So we already know it's ok.


>+            $dbh->do("INSERT INTO flags 
>+                    (id, type_id, status, bug_id, attach_id, creation_date, 
>+                    setter_id, requestee_id, is_active)
>+                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", undef, 
>+                    ++$fid, $ftypeid, $status, $bugid, $attachid, $timestamp,
>+                    $setter_id, $requestee_id, $is_active);

Wrong! $flag_type->{'is_active'} has nothing to do with $flag->{'is_active'}. If you insert a new flag, then this flag is active. Set it to "1" for all new flags.


>+sub init(){
>+    my ($twig, $bugzilla) = @_;
>+    my $root = $twig->root;
>+    my $maintainer = $root->{'att'}->{'maintainer'};
>+    my $exporter = $root->{'att'}->{'exporter'};
>+    my $urlbase= $root->{'att'}->{'urlbase'};

Nit: I think that a warning about the version of Bugzilla this XML is coming from would be fine.


>+sub process_attachment()  {

>+    $attachment{'ispatch'} = $attach->{'att'}->{'ispatch'};
>+    $attachment{'isobsolete'} = $attach->{'att'}->{'isobsolete'} || 0;
>+    $attachment{'isprivate'} = $attach->{'att'}->{'isprivate'} || 0; 

Nit: || 0 for 'ispatch' too.


>+sub process_bug {    

>+      } elsif ( $bug->{'att'}->{'error'} =~ /NotPermitted/) {
>+        $log .= "$exporter tried to move bug ". $bug->field ."('bug_id') here";

Must be: $bug->field('bug_id') . " here";


>+    foreach my $field (qw (long_desc attachment flag everconfirmed), 
>+                           Bugzilla::Bug::fields()) {
>+      $all_fields{$field} = "x"; 
>+    }

Nit: we usually write 1 instead of "x".


>+    foreach my $bugchild ($bug->children()){
>+      Debug("Parsing field: ". $bugchild->name, DEBUG_LEVEL);
>+      if (defined $all_fields{$bugchild->name}) {
>+        if ($bug->children_count($bugchild->name) > 1){
>+          $bug_fields{$bugchild->name} = join(" ", $bug->children_text($bugchild->name));
>+        }
>+        else{
>+          $bug_fields{$bugchild->name} = $bug->children_text($bugchild->name);
>+        }
>+      } 

Use join() in all cases, even if the field has only one child. This makes the code easier to read.


>+    Debug($err, ERR_LEVEL);
>+    my @long_descs;
>+    my $private = 0;
>+    # Parse long descriptions

You call Debug() even if no error is returned??? (line 446)


>+    sub by_date {my @a; my @b; $a->{'bug_when'} cmp $b->{'bug_when'}; }
>+    my @sorted_descs = sort by_date @long_descs;

Nit: Don't use the function 'by_date'. Write:
my @sorted_descs = sort {$a->{'bug_when'} cmp $b->{'bug_when'}} @long_descs;


>+    my $long_description = "";
>+    for (my $z=0 ; $z <= $#sorted_descs ; $z++) {
>+      if ( $z==0 ) {
>+        $long_description .= "\n\n\n---- Reported by ";
>+        $long_description .= "$sorted_descs[$z]->{'who'} "; 
>+        $long_description .= "$sorted_descs[$z]->{'bug_when'}";
>+        $long_description .= " ----";
>+        $long_description .= "\n\n";
>       }
>+      else {
>+        $long_description .= "\n\n\n---- Additional Comments From ";
>+        $long_description .= "$sorted_descs[$z]->{'who'} "; 
>+        $long_description .= "$sorted_descs[$z]->{'bug_when'}";
>+        $long_description .= " ----";
>+        $long_description .= "\n\n";
>+      }

There is a lot of code in common. Only keep "Reported by" and "Additional Comments From" in the test and move the rest outside.


>+    if (defined $bug_fields{'dependson'}) {
>+      $comments .= "Thiss bug depended on bug(s) $bug_fields{'dependson'}.\n";
>+    }

Nit: s/Thiss/This/


>+    foreach my $field (qw(status_whiteboard bug_file_loc short_desc)) {
>+        if ( (defined $bug_fields{$field}) && ($bug_fields{$field}) ){
>+          $bug_fields{$field} =~ s/\n/ /g;

Use Bugzilla::Util::clean_text(), see bug 238780.


>+          push (@query, "$field");
>+          push (@values, $bug_fields{$field});

Nit: no need to add quotes.


>+    # Bug Acess
>+    push (@query, "cclist_accessible");
>+    push (@values, $bug_fields{'cclist_accessible'} == 1 ? 1 : 0);

Nit: s/Acess/Access/


>+    if (defined $bug_fields{'component'}){
>+      $component = new Bugzilla::Component({product_id => $product->id, 
>+                                            name => $bug_fields{'component'}});
>+      unless ($component){ 
>+        $component = $def_component; 
>+        $product = $def_product;
>+        $err .= "Unknown Component ". $bug_fields{'component'} ."\n";
>+        $err .= "   Using default product and component set in Parameters \n";
>+      }
>+    } else{
>+      $component = $def_component; 
>+    }    

If the product given in the XML file exists but not the component, both the default product and component should be used, instead of the component alone. Else you could have the wrong component associated with your product.


>+    # Milestone
>+    if (Param("usetargetmilestone")) {
>+      my $milestone = new Bugzilla::Milestone($product->id, $bug_fields{'target_milestone'});
>+      if ($milestone){
>+        push (@values, $milestone->name) ;
>+        push (@query, "target_milestone");
>+      }
>+      else{
>+        push (@values, $product->default_milestone);
>+        push (@query, "target_milestone");

Nit: move push (@query, "target_milestone"); out of the IF block. You have it in all cases.


>+    # For priority, severity, opsys and platform we check that the one being
>+    # imported is valid. If it is not we use the defaults set in the parameters.
>+    # We do this by greping the value in the list of legal_priorities and 
>+    # pushing that onto an array.
>+    # If it is valid it will be the first one on the array otherwise we defult
>+    # to the first valid one in the system

I really don't understand this comment. It's either obsolete now that you use check_field() or it needs to be clearer.


>+    if (defined ($bug_fields{'bug_severity'}) 
>+        && check_field('bug_severity', scalar $bug_fields{'bug_severity'}, 
>+           \@::legal_severity, ERR_LEVEL)){

Note that Bugzilla::Field::check_field() has not been rewritten yet (my patch is still awaiting review). Maybe should you copy this routine in importxml.pl for now, and we will remove it as soon as bug 315605 lands.


>+    # Process time fields
>+    if (Param("timetrackinggroup")) {
>+        my $date = format_time($bug_fields{'deadline'}, "%Y-%m-%d") || undef;
>+        push(@values, $bug_fields{'estimated_time'});
>+        push(@query, "estimated_time");
>+        push(@values, $bug_fields{'remaining_time'});

You have to validate these fields. Use Bugzilla::Bug::ValidateTime(). 'actual_time' should be checked here too (see line 988).


>+    my @resolution;
>+    if (defined ($bug_fields{'resolution'}) &&
>+         (@resolution= grep(lc($_) eq lc($bug_fields{'resolution'}), 
>+             @::legal_resolution)) ){

Use check_field().


>+    # if the bug's assignee changed, mark the bug NEW, unless a valid 
>+    # resolution is set, which indicates that the bug should be closed.
>+    #
>+    if ( ($changed_owner) && (!$resolution[0]) ) {
>+      push (@values, "NEW");
>+      push (@query, "bug_status");
>+      $err .= "Bug reassigned, setting status to \"NEW\".\n";

Wrong! The bug could be UNCONFIRMED too. You should first check 'everconfirmed' and then use IsOpenenedState() to see if the bug is opened or not. And based on it, you set the resolution accordingly. If the resolution is unknown or invalid, set it to MOVED.


>+    } elsif (defined ($bug_fields{'bug_status'}) &&
>+         (my @status = grep(lc($_) eq lc($bug_fields{'bug_status'}), 
>+             @::legal_bug_status)) ){

Use check_field(). This whole block needs to be rewritten, see my previous comment.


>+    # Voting and everconfirmed
>+    my $everconfirmed;
>+    if (Param("usevotes")){
>+        $everconfirmed = $bug_fields{'everconfirmed'} || 0;
>+        if ($bug_fields{'bug_status'} eq 'UNCONFIRMED' && $everconfirmed ){
>+            $bug_fields{'bug_status'} = "NEW"; 
>+            $err .= "Bug Status was UNCONFIRMED but everconfirmed was true\n";
>+            $err .= "   Setting status to NEW\n";
>+            $err .= "Resetting votes to 0\n" if ($bug_fields{'votes'});
>+        }
>+    }
>+    else{    
>+        $everconfirmed = 1;
>+    }

Wrong! We don't care about 'usevotes'. The important parameter is 'votestoconfirm' (see post_bug.cgi).


>+    # Handle CC's
>+    if (defined $bug_fields{'cc'}) {
>+      my %ccseen;
>+      my $sth_cc = $dbh->prepare("INSERT INTO cc (bug_id, who) VALUES (?,?)"); 
>+      foreach my $person (split(/[ ,]/, $bug_fields{'cc'})) {
>+        my $uid;
>+        if ( ($person ne "") && ($uid = login_to_id($person)) ) {

The regexp should be /[\s,]+/. And you should add "next unless $person;" right after the "foreach" line. This way, you don't have to care about this case anymore and this simplify the loop.


>+      foreach my $keyword (split(/[\s,]+/, $bug_fields{'keywords'})) {
>+        if ($keyword eq '') {
>+          next;
>+        }
>+        my $i = $::keywordsbyname{lc($keyword)};

Use GetKeywordIdFromName().


>+      my ($keywordarray) = $dbh->selectcol_arrayref(
>+                 "SELECT d.name FROM keyworddefs d, keywords k 
>+                         WHERE d.id = k.keywordid 
>+                         AND k.bug_id = ? 
>+                         ORDER BY d.name", undef, $id); 

Write FROM foo INNER JOIN bar ON foo.a = bar.b WHERE baz


>+    # Parse bug flags
>+    foreach my $bflag ($bug->children('flag')){
>+        next unless (defined ($bflag));
>+        $err .= flag_handler("bug", $bflag->{'att'}->{'name'}, $bflag->{'att'}->{'status'}, 
>+                      $bflag->{'att'}->{'setter'}, $bflag->{'att'}->{'requestee'}, 
>+                      $exporterid, $id, $comp_id, $prod_id, undef);
>+    }

As I said, if $attachid is undefined, we already know that $type eq "bug". IMO, you should make this test *before* calling flag_handler(), i.e. if a bug has an $attachid, there is a problem and don't call flag_handler(); if an attachment has no $attachid, there is a problem too and don't call flag_handler().


>+    # Insert Attachments for the bug
>+    foreach my $att (@attachments){
>+      if ($att->{'attachid'}){

This check should be done *before* populating @attachments! This way, you are sure that all attachments in @attachments are valid.

Nit: also, please use an indentation of 4.


>+    my $worktime = $bug_fields{'actual_time'} || 0.0;
>+    $dbh->do("INSERT INTO longdescs 
>+                   (bug_id, who, bug_when, work_time, isprivate, thetext) 
>+            VALUES (?,?,?,?,?,?)", undef, 
>+            $id, $exporterid, $timestamp, $worktime, $private, $long_description );

'actual_time' should be validated at line 725, see above.


>+    # Add this bug to each group of which its product is a member.
>+    foreach my $group_id (keys %{$product->group_controls}) {
>+      $dbh->do(
>+        "INSERT INTO bug_group_map (bug_id, group_id) " .
>+        "VALUES (?, ?)",
>+        undef, $id, $group_id);
>+    }

Nit: you should $dbh->prepare() the SQL query outside the loop and then use $dbh->execute().


>+$log .= "Imported $bugtotal bug(s) from $urlbase,\n  sent by $exporter.\n\n";
>+my $subject = "$bugtotal Bug(s) successfully moved from $urlbase to " 
>                . Param("urlbase") ;

Nit: are you sure that all $bugtotal bugs have been successfully moved? ;)


At some point, probably in 2.24, we should move the code from post_bug.cgi into Bug.pm and use it here. A lot of code is duplicated in post_bug.cgi and importxml.pl and some validations are less rigorous than in post_bug.cgi, especially about flags. But at least, importxml.pl should now work, even with duplicated code. Let's go step by step. ;)
Attachment #203077 - Flags: review?(LpSolit) → review-
Attached patch Rewrite v5 (obsolete) — Splinter Review
(In reply to comment #25)

> >+#!/usr/bin/perl -w -T
> 
> Nit: write -wT
> 
For some reason this always produces a "Too late for -T" error and won't let me continue.

> > sub MailMessage {
> 
> > my $to = join (", ", @recipients);
> 
> Don't you have to use $to =~ s/@/\@/g; here too?
> 
No, since this regexp is being use to prevent Perl from thinking it is an array

> >+    if ($attachid){
> >+        $flag_types = Bugzilla::FlagType::match(
> >+            {'target_type'  => 'attachment',
> >+             'product_id'   => $productid, 
> >+             'component_id' => $componentid });
> >+        
> >+    }
> 
> What happens if the attachment is in the insidergroup but $exporter is not in
> this group?
> 
That can never happen since the exporter would not even export the attachment in the xml since he can't see it on his installation. see show.xml.tmpl

> >+    sub by_date {my @a; my @b; $a->{'bug_when'} cmp $b->{'bug_when'}; }
> >+    my @sorted_descs = sort by_date @long_descs;
> 
> Nit: Don't use the function 'by_date'. Write:
> my @sorted_descs = sort {$a->{'bug_when'} cmp $b->{'bug_when'}} @long_descs;
> 
I tried this and it keeps giving me an error saying %a is not defined or some such. See comment 23.


> >+    if (defined $bug_fields{'component'}){
> >+      $component = new Bugzilla::Component({product_id => $product->id, 
> >+                                            name => $bug_fields{'component'}});
> >+      unless ($component){ 
> >+        $component = $def_component; 
> >+        $product = $def_product;
> >+        $err .= "Unknown Component ". $bug_fields{'component'} ."\n";
> >+        $err .= "   Using default product and component set in Parameters \n";
> >+      }
> >+    } else{
> >+      $component = $def_component; 
> >+    }    
> 
> If the product given in the XML file exists but not the component, both the
> default product and component should be used, instead of the component alone.
> Else you could have the wrong component associated with your product.
> 
Ummm, it does ;) ($product = $def_product;)

> >+    # Parse bug flags
> >+    foreach my $bflag ($bug->children('flag')){
> >+        next unless (defined ($bflag));
> >+        $err .= flag_handler("bug", $bflag->{'att'}->{'name'}, $bflag->{'att'}->{'status'}, 
> >+                      $bflag->{'att'}->{'setter'}, $bflag->{'att'}->{'requestee'}, 
> >+                      $exporterid, $id, $comp_id, $prod_id, undef);
> >+    }
> 
> As I said, if $attachid is undefined, we already know that $type eq "bug". IMO,
> you should make this test *before* calling flag_handler(), i.e. if a bug has an
> $attachid, there is a problem and don't call flag_handler(); if an attachment
> has no $attachid, there is a problem too and don't call flag_handler().
> 
Well, since $attachid is the id coming from the database after we just inserted it I don't see the point in checking that it exists, since it should always. I did add checks above to make sure there is an attach_id in the xml and only do this if there is.
Attachment #203077 - Attachment is obsolete: true
Attachment #206153 - Flags: review?(LpSolit)
>> For some reason this always produces a "Too late for -T" error and won't let me
continue.

This usually happens on Windows. You need to give the -T flag in the perl command line as well (as an argument). In Linux, the first line of the script is actually the command line, but there are Windows configurations that hard-code the command line to "perl.exe", without reading the first line of the file.

http://www.bugzilla.org/docs/win32install.html has more resources; practically you need to add -T to the perl command call that gets executed when your server needs to serve .cgi files.
(In reply to comment #27)

> This usually happens on Windows. You need to give the -T flag in the perl
> command line as well (as an argument). In Linux, the first line of the script
> is actually the command line, but there are Windows configurations that
> hard-code the command line to "perl.exe", without reading the first line of the
> file.

This _IS_ in linux. The strange thing is that it only recently started doing this. 
The original code was -w -T and that worked. When I switched it to -wT it quit compiling (perl -c produced the too late warning and would quit) putting it back the way it was (-w -T) didn't work either. Only removing the -T would the compiler actually continue compiling. I am not sure what is up with that.
(In reply to comment #28)
> This _IS_ in linux. The strange thing is that it only recently started doing
> this. 
> The original code was -w -T and that worked. When I switched it to -wT it quit
> compiling (perl -c produced the too late warning and would quit) putting it
> back the way it was (-w -T) didn't work either. Only removing the -T would the
> compiler actually continue compiling. I am not sure what is up with that.

Need to specify the -T (and the -w) on the command line when you test-compile from the command line as well.

perl -cwT filename
(In reply to comment #29)

Just to clarify, the #! line in a perl script (even on Linux) is only used to invoke the script it you call it directly on the command line (i.e. ./filename) instead of putting "perl" in front of it.  If you pass the filename as an argument to perl, you need to pass all the parameters yourself.
The only way I'm taking this for 2.22 is if we can pull it off without requiring the changes to check_form_field.  I want the check_form_field changes on the trunk after we branch, but it's too invasive of a change to make to a core process this close to a release.
Comment on attachment 206153 [details] [diff] [review]
Rewrite v5

>Index: importxml.pl

>#!/usr/bin/perl -w

I tried writing -wT and it works! You have to compile it with "perl -cwT":
perl -cwT importxml.pl
importxml.pl syntax OK

So please add the T argument to get warnings about tainted values.


>@@ -49,583 +58,1178 @@
>+    $::path = $0;
>+    $::path =~ m#(.*)/[^/]+#;
>+    $::path = $1;
>+    $::path ||= '.';    # $0 is empty at compile time.  This line will
>+
>+    # have no effect on this script at runtime.

Nit: please correctly indent the comment as follows:
>+    $::path ||= '.';    # $0 is empty at compile time.  This line will
>+                        # have no effect on this script at runtime.


>+use Data::Dumper;
>+$Data::Dumper::Useqq = 1;

Nit: do we really use Data::Dumper somewhere?? I suggest to remove it as this looks like old code.


> use Bugzilla;
>+use Bugzilla::Bug;
> use Bugzilla::Config qw(:DEFAULT $datadir);
> use Bugzilla::BugMail;

Bugzilla::Foo, where Foo = {FlagType, Product, Component, Version, Milestone} are missing.


>+use Date::Parse;

Nit: we never use str2time() so Date::Parse is useless in this script.


> sub sillyness {
>     my $zz;
>     $zz = $Data::Dumper::Useqq;

Nit: as said above, we never use Data::Dumper. Remove it!


>+sub flag_handler {

>+    my $setter       = new Bugzilla::User->new_from_login($setter_login);

Wrong! 'new' and 'new_from_login' are two distinct methods. Write:
my $setter = Bugzilla::User->new_from_login($setter_login);


>+    if ( defined($requestee_login) ) {
>+        $requestee = new Bugzilla::User->new_from_login($requestee_login);

Same remark here. Remove 'new'.


>+        if ( !$requestee ) {
>+            $err = "Invalid requestee $requestee_login on $type flag $name\n";
>+            $err .= "   Requesting from the wind.\n";
>+            $requestee_id = 0;

Wrong! If the requestee doesn't exist, its ID must be 'undef' (NULL), not 0. 0 is not a valid user account. 'undef' means there is none.


>+        if ( $requestee && !$requestee->can_see_bug($bugid) ) {

Nit: $requestee is useless as you already checked it above. If we are here, then we already know it exists. So only keep !$requestee->can_see_bug($bugid) here.


>+            $err .= "Requestee is not a member of bug group\n";
>+            $err .= "   Requesting from the wind\n";
>+            $requestee_id = 0;

Same remark as above. It must be undefined, not 0.


>+    if ($attachid) {
>+        $flag_types = Bugzilla::FlagType::match(
>+            {
>+                'target_type'  => 'attachment',
>+                'product_id'   => $productid,
>+                'component_id' => $componentid
>+            } );
>+    }

You still don't make sure that the exporter is allowed to create private attachments. The fact that he was able to export the attachment only means he is allowed to see private attachment on *his* installation, but doesn't mean that he is allowed to see them on *this* installation. Here you assume that he is allowed to. You should check whether the attachment was initially private. If yes, then if the exporter is allowed to see private attachments on *this* installation, then make this attachment private, else public. If the attachment was initially public, leave it public.


>+    # It is possible for two flags on the same bug have the same name
>+    # If this is the case, we will only match the first one.
>+    my @flag_types = @{$flag_types};

Nit: @flag_types is never used. Removed it (you only use @$flag_types).


>+        if ($grant_gid) {
>+            unless ( ( $status eq '+' || $status eq '-' )
>+                && $setter->in_group_id($grant_gid) ) {
>+                $err = "Setter $setter_login on $type flag $name ";
>+                $err .= "is not in the Grant Group\n";
>+                $err .= "   Dropping flag $name\n";
>+                return $err;
>+            }
>+        }

Wrong! You are doing things the wrong way. Here, you say that if grant_gid exists, then the flag must be either "+" or "-", else it complains. But if the flag is "?", we don't care about grant_gid. So the correct test is:

if (($status eq '+' || $status eq '-') && grant_type && !$setter->in_group_id($grant_gid)) {
    ERROR
}


>+        if ($request_gid) {
>+            unless ( $status eq '?' && $setter->in_group_id($request_gid) ) {
>+                $err = "Setter $setter_login on $type flag $name ";
>+                $err .= "is not in the Request Group\n";
>+                $err .= "   Dropping flag $name\n";
>+                return $err;
>+            }
>+        }

Same error here. You only care about request_gid if the flag is set to "?". So you have to test:

if ($status eq "?" && request_gid && !$setter->in_group_id($request_gid)) {
    ERROR
}


>+    if (!($xmlversion eq $Bugzilla::Config::VERSION)) {

Nit: instead of !($foo eq $bar), you can write ($foo ne $bar). ;)


>+sub process_bug {

>+    foreach my $field ( 
>+        qw (long_desc attachment flag everconfirmed), Bugzilla::Bug::fields() )
>+    {
>+        $all_fields{$field} = "1";
>+    }

Nit: quotes are useless.


>+    foreach my $bugchild ( $bug->children() ) {
>+        Debug( "Parsing field: " . $bugchild->name, DEBUG_LEVEL );
>+        if ( defined $all_fields{ $bugchild->name } ) {
>+              $bug_fields{ $bugchild->name } =
>+                  join( " ", $bug->children_text( $bugchild->name ) );
>+        }

Nit: Are we sure that we never have whitespaces in a field having several values? Else how do you parse it back? But I think this never happen so maybe it's fine for now.


>+    for ( my $z = 0 ; $z <= $#sorted_descs ; $z++ ) {
>+        if ( $z == 0 ) {
>+            $long_description .= "\n\n\n---- Reported by ";
>+        }

Unless I miss something, it should be "$z < $#sorted_descs" instead of "<=" as we are starting from 0 instead of 1.


>+    if ( defined $bug_fields{'component'} ) {
>+        $component = new Bugzilla::Component(
>+            {
>+                product_id => $product->id,
>+                name       => $bug_fields{'component'}
>+            }
>+        );
>+        unless ($component) {
>+            $component = $def_component;
>+            $product   = $def_product;
>+            $err .= "Unknown Component " . $bug_fields{'component'} . "\n";
>+            $err .= "   Using default product and component set ";
>+            $err .= "in Parameters \n";
>+        }
>+    }
>+    else {
>+        $component = $def_component;
>+    }

If the component is defined but doesn't exist, you fall back to both the default product and component, which is good. But if you give no component at all (is undefined), you only fall back to the default component, but leave the product alone. In the case where the product given in the XML file is valid, you would end with the product given in the XML with the default component given by your Bugzilla installation, which probably won't work. In this case, you also have to fall back to the default component.


>+    if ( Param("timetrackinggroup") ) {
>+        my $date = format_time( $bug_fields{'deadline'}, "%Y-%m-%d" )
>+          || undef;

Nit: where do you push this fields into the arrays?? /me searches...

>+        push( @values, $date );
>+        push( @query,  "deadline" );

... oh, they are here... :) To avoid this kind of questions, could you please group these lines together? ;)


>+    if ( defined($bug_fields{'resolution'}) 
>+          && $valid_res 
>+            && !IsOpenedState($bug_fields{'bug_status'})){

I disagree with the way you handle the bug status and resolution. Here, you use IsOpenedState($bug_fields{'bug_status'}) but you haven't checked whether the bug status is a valid one or not and what to do in case it's not.

My opinion is that you should first worry about the bug status, and based on it, set up the correct resolution. Please rewrite this block.


>+    elsif ( defined $bug_fields{'resolution'} && !$valid_res ) {
>+        $err .= "Unknown resolution \"$bug_fields{'resolution'}\".\n";

The bug resolution could be undefined, which you have to catch too. So you should write:
elsif (!defined $bug_fields{'resolution'} || !$valid_res )


>+    if ( Param("usevotes") && $product->votes_to_confirm) {
>+        $everconfirmed = $bug_fields{'everconfirmed'} || 0;

grr... as I said several times, everconfirmed *doesn't* depend on Param("usevotes"). It only depends on $product->votes_to_confirm!


>+        if ( $bug_fields{'bug_status'} eq 'UNCONFIRMED' && $everconfirmed ){
>+            $bug_fields{'bug_status'} = "NEW";

What about the opposite case, i.e. everconfirmed = 0 but bug_status = NEW or ASSIGNED?


>+    # First we check everconfirmed. Since it will only be 0 if usevotes is true
>+    # we do not need to check usevotes again.

Wrong! everconfirmed *doesn't* depend on usevotes!


>+    if ( $everconfirmed == 0){
>+        push( @values, "UNCONFIRMED" );
>+    }

Wrong! If the bug has been closed but has never been confirmed (e.g. RESOLVED INVALID), you shouldn't reopen the bug!
What you tried to do here partially answers my question above, but you have done it in an incorrect way.


>+    elsif (( defined( $bug_fields{'resolution'} ))  && ( !$valid_res )){
>+
>+        #if the resolution was illegal then set status to NEW
>+        push( @values, "NEW" );
>+        $err .= "Resolution was invalid. Setting status to \"NEW\".\n";
>+        $err .= "   Previous status was \"";
>+        $err .=
>+          ( defined $bug_fields{'bug_status'} )
>+          ? $bug_fields{'bug_status'}
>+          : "unknown";
>+        $err .= "\".\n";
>+    }

I'm really not convinced by all these part of the code where you handle the bug status and resolution, as said above. I'm pretty sure I can corrupt your data, i.e. find inconsistencies. ;) As I said, check the status first and update the resolution accordingly.


>+    eval {
>+        no warnings 'uninitialized';
>+        Debug(
>+            "Bug Query: INSERT INTO bugs (\n"
>+              . join( ",\n", @query )
>+              . "\n) VALUES (\n"
>+              . join( ",\n", @values ),
>+            DEBUG_LEVEL
>+        );
>+    };

I see no reason to use eval {} and "no warnings 'uninitialized';" here. As you don't use binary operators, it should never complain. Joining undefined values is fine.


>+        foreach my $person ( split( /[\s,]+/, $bug_fields{'cc'} ) ) {
>+            next unless $person;
>+            my $uid;
>+            if ( ( $person ne "" ) && ( $uid = login_to_id($person) ) ) {
>+                if ( !$ccseen{$uid} ) {
>+                    $sth_cc->execute( $id, $uid );
>+                    $ccseen{$uid} = 1;
>+                }
>+            }
>+            elsif ( $person ne "" ) {
>+                $err .= "CC member $person does not have an account here\n";
>+            }
>+        }

Nit: You already return if $person eq "", so there is no need to repeat this check again and again. So your test reduces to a simple if ... else... .


>+        foreach my $keyword ( split( /[\s,]+/, $bug_fields{'keywords'} )) {
>+            if ( $keyword eq '' ) {
>+                next;
>+            }

Nit: even cleaner: "next unless $keyword;".


>+            my $i = GetKeywordIdFromName(lc($keyword));

Nit: lc() is already used internally by GetKeywordIdFromName(). So there is no need to use it here.


>+        my $att_id   = $dbh->bz_last_key( 'attachments', 'attach_id' );

>+        foreach my $aflag (@{ $att->{'flags'} }) {
>+            next unless ( defined($aflag) && $att_id );

Nit: as you define $att_id yourself above, you already know $att_id exists. You can remove it from the test: "next unless $aflag;".


>+    my $worktime = $bug_fields{'actual_time'} || 0.0;
>+    $dbh->do("INSERT INTO longdescs 

Nit: you shouldn't rely on 'actual_time' if the user is not in the timetracking group, because this field hasn't been validated in this case. Probably you should fall back to 0.0 in this case.


>+    # Add this bug to each group of which its product is a member.
>+    my $sth_group = $dbh->prepare("INSERT INTO bug_group_map (bug_id, group_id) 
>+                         VALUES (?, ?)");
>+    foreach my $group_id ( keys %{ $product->group_controls } ) {
>+            $sth_group->execute( $id, $group_id );
>+    }

IIRC, even groups which do *not* apply to the product (those having NA/NA) are included in $product->group_controls. So you should make sure they don't have NA/NA on them before adding the bug to them.


>+$log .=  "Imported $bugtotal bug(s) from $urlbase,\n  sent by $exporter.\n\n";
>+my $subject =  "$bugtotal Bug(s) successfully moved from $urlbase to " 
>+   . Param("urlbase");
> my @to = ($exporter);
>+MailMessage( $subject, $log, @to );

Nit: I think you should also send an email to the maintainer of the installation so that he knows what has been inserted into his DB too.


Note that I haven't tested your patch yet, so I may find other bugs later. But I first want a patch which is visually OK.
Attachment #206153 - Flags: review?(LpSolit) → review-
(In reply to comment #31)
> The only way I'm taking this for 2.22 is if we can pull it off without
> requiring the changes to check_form_field.  I want the check_form_field changes
> on the trunk after we branch, but it's too invasive of a change to make to a
> core process this close to a release.
> 

This is why we have copied the check_form_field code into importxml directly. This will make a good test bed for this function and it will not affect any other code.
Attached patch Rewrite v6 (obsolete) — Splinter Review
We really need to discuss what level of permissions the exporter of a bug should have. Whether or not they should be allowed to export private attachments/comments/etc.
Attachment #206153 - Attachment is obsolete: true
Attachment #207629 - Flags: review?(LpSolit)
Attached patch Rewrite v7 (obsolete) — Splinter Review
Fixed nits as per IRC comments:
<LpSolit> ghendricks: visually, your patch looks good; only a few nits left
<ghendricks> LpSolit: :D
<LpSolit> but if the patch works correctly, I will r+ anyway
<LpSolit> ghendricks: just one thing though...
<LpSolit> ghendricks: if the attachment is private but the exporter is not allowed to create private attachment, we should make it public instead of deleting it
<ghendricks> ok
<LpSolit> that's the most serious point I have noticed, everything else is minor
<ghendricks> easy fix
<LpSolit> ghendricks: also, at the end, when you send an email to the exporter, shouldn't it be $exporter_login instead of $exporter now?
<ghendricks> hmmm
<LpSolit> I don't remember if Send() takes a user login or a user object or a user ID
<LpSolit> I will check later
<ghendricks> no because $exporter in that space is in a different name space
<LpSolit> ah ok... it's not the user object defined earlier?
<ghendricks> $exporter_login is scoped to process_bug only
<LpSolit> ok
<ghendricks> correct
<LpSolit> this leaves me with 7 nits only :)
<LpSolit> maybe more after my set of tests ;)
<ghendricks> feel free to hammer it
<LpSolit> here?
<LpSolit> do you want me to mention the nits here?
<LpSolit> or in a comment?
<LpSolit> hum... maybe in a comment, so that I can add other points too if I find some new while testing your patch....
* LpSolit is hungry :)
<ghendricks> LpSolit: whichever
<ghendricks> I have the file open and ready to make changes
<LpSolit> ok, so let's try to be quick
<LpSolit> around line 220
<LpSolit> I think $requestee_id = undef; is useless as it's already undefined
* LpSolit uses interdiff so I haven't the whole context in mind
<ghendricks> you said that it should not be 0 but undef
<LpSolit> right, but if you don't say anything, it will remain undefined
<LpSolit> when you declare it, you write "my $requestee_id;"
<LpSolit> at this level, it's undefined
<LpSolit> so if you drop the line above, it will remain in this state
<LpSolit> i.e. undefined
<LpSolit> my $foo;
<LpSolit> $foo = undef;
<LpSolit> reduces to: my $foo;
<LpSolit> :)
<LpSolit> unless you change $requestee_id meanwhile which is not the case here IIRC
<LpSolit> as I said, that's really a nit ;)
<ghendricks> ok, there are tow places where I say $requestee_id = undef, I can remove both?
<LpSolit> yes
<LpSolit> I will check when I will apply your patch locally so that I can see the context
<ghendricks> done
<LpSolit> just above the second $requestee_id = undef, you left if ( $requestee && !$requestee->can_see_bug($bugid) ) 
<LpSolit> as I said, we already know that $requestee exists
<LpSolit> else we wouldn't be here
<ghendricks> oops
<ghendricks> missed that one
<LpSolit> nit! :)
<ghendricks> gone
<LpSolit> now I haven't the line but it's when you check the status of the bug...
<LpSolit> when you check if($changed_owner){
<LpSolit> and you have ...
<LpSolit> $err .= "Bug reassigned, setting status to \"$status\".\n";
<LpSolit> $err .= "   Previous status was \"";
<LpSolit> $err .=  $bug_fields{'bug_status'} . "\".\n";
<ghendricks> found it
<ghendricks> ok?
<LpSolit> the fact that the owner changed doesn't mean the status changed too, maybe the status was UNCO, respectively NEW already
<LpSolit> in which case the warning would be confusing
<ghendricks> ah
<LpSolit> previous status NEW; new status NEW
<LpSolit> :-D
<ghendricks> so only change if it wasn't already NEW or UNCONFIRMED respectivley
<LpSolit> what you did is fine, but only display the warning if the status really changed
<LpSolit> that's a simple if ($status ne $bug_fields{'bug_status'})
<ghendricks> done
<LpSolit> this if only applies to the warning, which can remain where it is already
<LpSolit> ok
<LpSolit> a few lines below...
<LpSolit> if($status ne "UNCONFIRMED" && $is_open){
<LpSolit> we are in the $is_open block already, so we already know that $is_open = 1
<ghendricks> oops
<LpSolit> else we wouldn't be here
<ghendricks> I caught one of those
<ghendricks> missed that one
<ghendricks> I wrote them all out to help me see the logic better
<ghendricks> but forgot to remove that one
<LpSolit> now around line 1020...
<LpSolit> you removed $person ne "" from the elsif condition
<LpSolit> but you forgot to remove it from the else condition too
<LpSolit> if $person eq "", you return already
<LpSolit> so you wouldn't be here
<ghendricks> this is in the CC block?
<LpSolit> don't know, I don't see the context
<LpSolit> probably
<ghendricks> if ( ( $person ne "" ) &&( $uid = login_to_id($person) ) ) {?
<LpSolit> yes
<LpSolit> $err .= "CC member $person does not have an account here\n";
<LpSolit> so yes, it's probably in this block :)
<ghendricks> gone
<LpSolit> around line 1080
<LpSolit> if (!$exporter->in_group(Param('insidergroup')) && $att->{'isprivate'}){
<LpSolit> as I said, don't throw the attachment, just make it public
<ghendricks> already fixed
<LpSolit> don't know where you have to set this attribute though
<LpSolit> cool
<ghendricks> $att->{'isprivate'} = 0;
<ghendricks> instead of next;
<LpSolit> last nit, around line 1130..
<LpSolit> ah ok :)
<LpSolit> easy indeed :)
<LpSolit> if ($product->group_controls->{$group_id}->{'membercontrol'} != 0
<LpSolit>     && $product->group_controls->{$group_id}->{'othercontrol'} != 0){
<ghendricks> don't need the !=0?
<LpSolit> you should write CONTROLMAPNA instead of 0
<LpSolit> in case someone altered Constants.pm
<ghendricks> ah, gotcha
<LpSolit> no more nits :)
<LpSolit> now if I don't find bugs while testing your patch, it's a r+
<ghendricks> is that used for both othercontrol and membercontrol?
<LpSolit> yes
<ghendricks> done
<LpSolit> maybe you could post a new patch with these nits fixed and I will test this one directly
<LpSolit> to make sure we didn't introduce regressions
<LpSolit> my girlfriend is waiting for me to eat now ;)
<ghendricks> [Thu Jan  5 11:42:04 2006] importxml.pl: Bareword "CONTROLMAPNA" not allowed while "strict subs" in use at ./importxml.pl line 1160.
<ghendricks> better run then ;)
<LpSolit> you have to "use Bugzilla::Constants;"
<ghendricks> nope
<ghendricks> don't have it
<ghendricks> ok
Attachment #207629 - Attachment is obsolete: true
Attachment #207639 - Flags: review?(LpSolit)
Attachment #207629 - Flags: review?(LpSolit)
Attached patch Rewrite with dtd changes (obsolete) — Splinter Review
Talking with Colin he feels the dtd may need to be reworked to come into sync with Bug.pm's output. He told me to attach the dtd changes to this bug. Included are changes to Bug.pm to include everconfirmed.
Attachment #207639 - Attachment is obsolete: true
Attachment #207779 - Flags: review?(LpSolit)
Attachment #207639 - Flags: review?(LpSolit)
Comment on attachment 207779 [details] [diff] [review]
Rewrite with dtd changes

>Index: bugzilla.dtd

>+<!ELEMENT bug (bug_id, (alias?, creation_ts, short_desc, delta_ts, reporter_accessible, cclist_accessible, classification_id, classification, product, component, version, rep_platform, op_sys, bug_status, resolution?, bug_file_loc?, status_whiteboard?, keywords*, priority, bug_severity, target_milestone?, dependson*, blocked*, (votes, everconfirmed)?, reporter, assigned_to, qa_contact?, cc*, (estimated_time, remaining_time, actual_time, deadline)?, group*, flag*, long_desc*, attachment*)?)>

Nit: why setting everconfirmed as optional? If this parameter is missing, then all open bugs will be marked as UNCO. On the other hand, if everconfirmed = 0, show.xml.tmpl will not include it in the XML file. So maybe it's fine like this.



>Index: importxml.pl

> sub MailMessage {
>+    return unless ($mail);
>+    my $subject    = shift;
>+    my $message    = shift;
>+    my @recipients = @_;
>+
>+    my $to     = join( ", ", @recipients );

Mail::Mailer cannot handle multiple addressees in the To: field at once. You have to proceed one addressee at a time.


>+sub Error {

>+    $message .= "For more info, contact " . Param("maintainer") . ".\n";
>+    my @to = ( Param("maintainer") );

Asking the maintainer to contact himself doesn't make sense.


>+# This subroutine handles flags for process_bugs. It is generic in that

Nit: the routine name is process_bug with no trailing 's'.


>+sub process_bug {

>+    # This list contains all other bug fields that we want to process.
>+    # If it is not in this list it will not be included.
>+    my %all_fields;
>+    foreach my $field ( 
>+        qw(long_desc attachment flag everconfirmed), Bugzilla::Bug::fields() )
>+    {

First of all, everconfirmed is already included in Bug::fields(). Secondly, we are exporting groups via <group> </group>, but importxml.pl is ignoring them. Is it so by design? In all cases, "group" should be added to the list of known field so that importxml.pl doesn't complain about these pseudo-unknown fields.


>+    # Parse long descriptions
>+    foreach my $comment ( $bug->children('long_desc') ) {

Nit: why don't we import/export the "hours worked" data (longdescs.work_time)? As we are processing all comments, it would be easy for importxml.pl to get this value. I think we should do it.


>+        $long_desc{'isprivate'} = $comment->{'att'}->{'isprivate'} || 0;

If the comment is private but the exporter is not in the Param('insidergroup') group, the comment should stay public. This is especially important if the insidergroup is not used (i.e. nobody could see the comment, including admins). That what we did for attachments already.

I wonder if we should post two comments, one with all public comments and another one with all private comments, so that not everything is "hidden" if there is only one comment private (for instance). But that's another bug.


>+        # If we leave the attachemnt ID in the comment it will be made a link
>+        # to the wrong attachment. Since the new attachment ID is unkown yet
>+        # let's strip it out for now. We will make a comment with the right ID
>+        # later
>+        $data =~ s/Created an attachment \(id=\d+\)/Created an attachment/g;

Nit: is there really no way to replace them with the new attachment IDs? After all, attachments will be added into the DB *before* comments in the longdescs table, so those IDs are known on time. This would mean that this regexp should be executed right before adding the comment into the DB. We could as well fix it in another bug if required.


>+    for ( my $z = 0 ; $z < $#sorted_descs ; $z++ ) {

Oops, sorry. I was wrong! You indeed have to write "<=" else the last comment will be omitted. :(


>+    $comments .= "This bug previously known as _bug_ $bug_fields{'bug_id'} at ";

Nit: This bug *was* previously known as ...


>+    $comments .= $urlbase . "\n";
>+    $comments .= $urlbase . "show_bug.cgi?";
>+    $comments .= "id=" . $bug_fields{'bug_id'} . "\n";

The URL base is repeated twice!
Nit: moreover, you could let the last two lines on a single one.


>+    $comments .= "Originally filed under the $bug_fields{'product'} ";
>+    $comments .= "product and $bug_fields{'component'} component.\n";

Nit: I think we should display the original product and component only if those values changed. Else it's rather confusing.


>+    # Alias
>+    if ( $bug_fields{'alias'} ) {
>+        my ($alias) = $dbh->selectrow_arrayref("SELECT COUNT(*) FROM bugs 
>+                                                WHERE alias = ?", undef,
>+                                                $bug_fields{'alias'} );

It must be selectrow_array, as ($alias) is an array, not a reference to an array (and this makes $alias to be always true, i.e. you can never import aliases).


>+        if ($alias) {
>+            $alias = undef;
>+            $err .= "Dropping conflicting bug alias ";
>+            $err .= $bug_fields{'alias'} . "\n";
>+        }
>+        else {
>+            $alias = $bug_fields{'alias'};
>+        }
>+        push @query,  'alias';
>+        push @values, $alias;

Nit: we could as well add 'alias' to the list only when one is given.


>+        my @versions = @{ $product->versions };
>+        my $v        = $versions[0];
>+        push( @values, $v->name );
>+        $err .= "Unknown version $bug_fields{'version'} in product ";
>+        $err .= $product->name. " Setting version to \"" . $v->name . "\".\n";

Nit: please add a period after the product name and start "Setting version..." on a new line to be consistent with the other messages.


>+        push( @values, $component->default_assignee->id );
>+        $changed_owner = 1;
>+        $err .= "The original assignee of this bug does not have\n";
>+        $err .= "   an account here. Reassigning to the default assignee\n";
>+        $err .= "   for the component, $component->default_assignee->id.\n";

First, you want to display the login name, not the user ID. Secondly, $component->default_assignee->login has to be out of the quotes, else Perl sees $component only, which is a hash: HASH(...) is displayed instead.


>+            push( @values, $component->default_qa_contact->id || undef );
>+            $err .= "Setting qa contact to the default for this product.\n";
>+            $err .= "   This bug either had no qa contact or an invalid one.\n";

Nit: please add this message only if the previous QA contact doesn't exist in the new installation *or* if there was none and you are adding one. But if there was no QA contact and there is no default QA contact in the product we are moving into, don't display it.


>+        $dbh->do("INSERT INTO attachments 
>+                 (bug_id, creation_ts, filename, description, mimetype, 
>+                 ispatch, isprivate, isobsolete, submitter_id) 
>+                 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
>+            undef, $id, $att->{'date'}, $att->{'filename'},
>+            $att->{'desc'}, $att->{'ctype'}, $att->{'ispatch'},
>+            $att->{'isprivate'}, $att->{'isobsolete'}, $exporterid);

If the original creator of the attachment exists, you should use it instead of the exporter. And so if the creator is in the insidergroup, you can leave the attachment private if it was already. We keep the original setters for flags, there is no reason to not do it here for attachments.


>+    $log .= "Bug $urlbase/show_bug.cgi?id=$bug_fields{'bug_id'} ";
>+    $log .= "imported as bug $id.\n";
>+    $log .= Param("urlbase") . "/show_bug.cgi?id=$id\n\n";

$urlbase already contains the trailing "/", so don't repeat it here (else you get foo.com//bar). You can write ${urlbase}show_bug.cgi for instance.


>+    if ($err) {
>+        $log .= "The following problems were encountered creating bug $id.\n";

Nit: encountered *while* creating ...


>+        $log .= "You may have to set certain fields in the new bug by hand.\n\n";
>+        $log .= $err;
>+        $log .= "\n\n\n";

Nit: could you move "You may have to set ..." *after* $err? This would make more sense, and the output is nicer. ;)


>+    Debug( $log, ERR_LEVEL );
>+    Bugzilla::BugMail::Send( $id, { 'changer' => $exporter_login } ) if ($mail);

Nit: most of the time, the import is successful, so having "ERR:" at the beginning of this line is confusing. I needed some time to realize there was no error. Maybe should we add an OK_LEVEL = 3 and replace "ERR:" by "OK:" in this (unique) case.


>+# Read STDIN in slurp mode. VERY dangerous, but we live on the wild side ;-)
>+local ($/);
>+$xml = <>;
>+
>+# remove everything in file before xml header (i.e. remove the mail header)
>+$xml =~ s/^.+(<\?xml version.+)$/$1/s;

One of the reason of my r- : after reading the XML file but before processing it, you have to decode the file. Since Bugzilla 2.22, when Param('utf8') is turned on and when the email contains non ASCII characters, the email is encoded using either Base64 or Quoted-Printable, see BugMail::MessageToMTA(). Per discussion on IRC with karl, we found the following solution to get the decoded body of the email:

----- start here -----
# If the email was encoded (BugMail::MessageToMTA() does it when using UTF-8),
# we have to decode it first, else the XML parsing will fail.
my $parser = MIME::Parser->new;
$parser->output_to_core(1);
$parser->tmp_to_core(1);
my $entity = $parser->parse_data($xml);
my $bodyhandle = $entity->bodyhandle;
$xml = $bodyhandle->as_string;
----- end here -----

For an email containing two 700 Kb attachments, importxml.pl needed 1:30 minute and required > 95% CPU to execute the whole process, including decoding the email + adding bugs and attachments to the DB. Maybe is the MIME::Parser decoding slow and very CPU + memory consuming. I wonder if setting one if the params above to '0' would help, either output_to_core() or tmp_to_core() or both. We could optimize the MIME decoding in another bug.


>+my @to = ($exporter, $maintainer);
>+MailMessage( $subject, $log, @to );

As said earlier, you cannot pass several addressees at once (due to a limitation in the implementation of SMTP in Mail::Mailer). So either call MailMessage() twice, or do a loop internally in MailMessage() to proceed one addressee at once.



>Index: Bugzilla/Bug.pm

>@@ -172,7 +172,7 @@

>-      delta_ts, COALESCE(SUM(votes.vote_count), 0),
>+      delta_ts, COALESCE(SUM(votes.vote_count), 0), everconfirmed,

Adding everconfirmed in the SELECT clause of the SQL query is not enough. PostgreSQL requires that 'everconfirmed' also appears in the GROUP BY clause, else it crashes (that's what happened to me while testing your patch). That's the main reason why I deny review.


We are very close now. Globally, your patch works fine. :)
Attachment #207779 - Flags: review?(LpSolit) → review-
Attached patch Rewrite FINALSplinter Review
> Mail::Mailer cannot handle multiple addressees in the To: field at once. You
> have to proceed one addressee at a time.

Done.

> >+sub Error {
> 
> >+    $message .= "For more info, contact " . Param("maintainer") . ".\n";
> >+    my @to = ( Param("maintainer") );
> 
> Asking the maintainer to contact himself doesn't make sense.

Oops. I have added the exporter to this but feel it is good to leave the maintainer as well so (s)he knows when someone is attempting to import bugs.

> Nit: the routine name is process_bug with no trailing 's'.

Done
 
> >+sub process_bug {
> 
> >+    # This list contains all other bug fields that we want to process.
> >+    # If it is not in this list it will not be included.
> >+    my %all_fields;
> >+    foreach my $field ( 
> >+        qw(long_desc attachment flag everconfirmed), Bugzilla::Bug::fields() )
> >+    {
> 
> First of all, everconfirmed is already included in Bug::fields(). Secondly, we
> are exporting groups via <group> </group>, but importxml.pl is ignoring them.
> Is it so by design? In all cases, "group" should be added to the list of known
> field so that importxml.pl doesn't complain about these pseudo-unknown fields.

Removed everconfirmed from this list as it will now be included in Bug::fields
As for the group tags, we do not currenly export groups inside <group> tags. As such it is intentional that we leave these out. See bug 275386 for my recent comments there (the filer of this bug had inserted these tags manually). If at some point in the future we want to handle <group> tags lets make that a separate bug.

> Nit: why don't we import/export the "hours worked" data (longdescs.work_time)?
> As we are processing all comments, it would be easy for importxml.pl to get
> this value. I think we should do it.

This is included in actual_time as a sum of these values. I do handle that later on.

> >+        $long_desc{'isprivate'} = $comment->{'att'}->{'isprivate'} || 0;
> 
> If the comment is private but the exporter is not in the Param('insidergroup')
> group, the comment should stay public. This is especially important if the
> insidergroup is not used (i.e. nobody could see the comment, including admins).
> That what we did for attachments already.
> 
> I wonder if we should post two comments, one with all public comments and
> another one with all private comments, so that not everything is "hidden" if
> there is only one comment private (for instance). But that's another bug.

Feel free to file it. I now check that the exporter is in the insidergroup before flagging it private. 

> 
> >+        # If we leave the attachemnt ID in the comment it will be made a link
> >+        # to the wrong attachment. Since the new attachment ID is unkown yet
> >+        # let's strip it out for now. We will make a comment with the right ID
> >+        # later
> >+        $data =~ s/Created an attachment \(id=\d+\)/Created an attachment/g;
> 
> Nit: is there really no way to replace them with the new attachment IDs? After
> all, attachments will be added into the DB *before* comments in the longdescs
> table, so those IDs are known on time. This would mean that this regexp should
> be executed right before adding the comment into the DB. We could as well fix
> it in another bug if required.

Good idea. This will require some reworking of the long description and attachment sections which I would like to hold off on till later. Lets file a bug. 

> 
> >+    for ( my $z = 0 ; $z < $#sorted_descs ; $z++ ) {
> 
> Oops, sorry. I was wrong! You indeed have to write "<=" else the last comment
> will be omitted. :(
> 
> 
> >+    $comments .= "This bug previously known as _bug_ $bug_fields{'bug_id'} at ";
> 
> Nit: This bug *was* previously known as ...
> 
> 
> >+    $comments .= $urlbase . "\n";
> >+    $comments .= $urlbase . "show_bug.cgi?";
> >+    $comments .= "id=" . $bug_fields{'bug_id'} . "\n";
> 
> The URL base is repeated twice!
> Nit: moreover, you could let the last two lines on a single one.

I had used the URL to the generic bugzilla and then a link to the bug itself, hence the two instances, but I agree it is redundant.

> >+    $comments .= "Originally filed under the $bug_fields{'product'} ";
> >+    $comments .= "product and $bug_fields{'component'} component.\n";
> 
> Nit: I think we should display the original product and component only if those
> values changed. Else it's rather confusing.

Done

> >+    # Alias
> >+    if ( $bug_fields{'alias'} ) {
> >+        my ($alias) = $dbh->selectrow_arrayref("SELECT COUNT(*) FROM bugs 
> >+                                                WHERE alias = ?", undef,
> >+                                                $bug_fields{'alias'} );
> 
> It must be selectrow_array, as ($alias) is an array, not a reference to an
> array (and this makes $alias to be always true, i.e. you can never import
> aliases).
> 
> Nit: we could as well add 'alias' to the list only when one is given.

Done and done.
 
> 
> >+        my @versions = @{ $product->versions };
> >+        my $v        = $versions[0];
> >+        push( @values, $v->name );
> >+        $err .= "Unknown version $bug_fields{'version'} in product ";
> >+        $err .= $product->name. " Setting version to \"" . $v->name . "\".\n";
> 
> Nit: please add a period after the product name and start "Setting version..."
> on a new line to be consistent with the other messages.

Done

> 
> >+        push( @values, $component->default_assignee->id );
> >+        $changed_owner = 1;
> >+        $err .= "The original assignee of this bug does not have\n";
> >+        $err .= "   an account here. Reassigning to the default assignee\n";
> >+        $err .= "   for the component, $component->default_assignee->id.\n";
> 
> First, you want to display the login name, not the user ID. Secondly,
> $component->default_assignee->login has to be out of the quotes, else Perl sees
> $component only, which is a hash: HASH(...) is displayed instead.

Done

> 
> >+            push( @values, $component->default_qa_contact->id || undef );
> >+            $err .= "Setting qa contact to the default for this product.\n";
> >+            $err .= "   This bug either had no qa contact or an invalid one.\n";
> 
> Nit: please add this message only if the previous QA contact doesn't exist in
> the new installation *or* if there was none and you are adding one. But if
> there was no QA contact and there is no default QA contact in the product we
> are moving into, don't display it.

Done
 
> >+        $dbh->do("INSERT INTO attachments 
> >+                 (bug_id, creation_ts, filename, description, mimetype, 
> >+                 ispatch, isprivate, isobsolete, submitter_id) 
> >+                 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
> >+            undef, $id, $att->{'date'}, $att->{'filename'},
> >+            $att->{'desc'}, $att->{'ctype'}, $att->{'ispatch'},
> >+            $att->{'isprivate'}, $att->{'isobsolete'}, $exporterid);
> 
> If the original creator of the attachment exists, you should use it instead of
> the exporter. And so if the creator is in the insidergroup, you can leave the
> attachment private if it was already. We keep the original setters for flags,
> there is no reason to not do it here for attachments.

Currently the creator of an attachment is not exported.  This would require work to Bug.pm and likely Attachment.pm. I think this merits its own bug. 

The way I see exporter is as the person creating a new bug in the new system. Importxml treats him the same as if he were entering a new bug and copying the bug data (comments, attachments etc.) by hand. As I have said earlier, I think a discussion about the exporter role should be warrented and once settled upon we can act differently in this regard.

> 
> >+    $log .= "Bug $urlbase/show_bug.cgi?id=$bug_fields{'bug_id'} ";
> >+    $log .= "imported as bug $id.\n";
> >+    $log .= Param("urlbase") . "/show_bug.cgi?id=$id\n\n";
> 
> $urlbase already contains the trailing "/", so don't repeat it here (else you
> get foo.com//bar). You can write ${urlbase}show_bug.cgi for instance.

Done

> >+    if ($err) {
> >+        $log .= "The following problems were encountered creating bug $id.\n";
> 
> Nit: encountered *while* creating ...
> 
> 
> >+        $log .= "You may have to set certain fields in the new bug by hand.\n\n";
> >+        $log .= $err;
> >+        $log .= "\n\n\n";
> 
> Nit: could you move "You may have to set ..." *after* $err? This would make
> more sense, and the output is nicer. ;)

Done and done.

> 
> >+    Debug( $log, ERR_LEVEL );
> >+    Bugzilla::BugMail::Send( $id, { 'changer' => $exporter_login } ) if ($mail);
> 
> Nit: most of the time, the import is successful, so having "ERR:" at the
> beginning of this line is confusing. I needed some time to realize there was no
> error. Maybe should we add an OK_LEVEL = 3 and replace "ERR:" by "OK:" in this
> (unique) case.

Done

> 
> >+# Read STDIN in slurp mode. VERY dangerous, but we live on the wild side ;-)
> >+local ($/);
> >+$xml = <>;
> >+
> >+# remove everything in file before xml header (i.e. remove the mail header)
> >+$xml =~ s/^.+(<\?xml version.+)$/$1/s;
> 
> One of the reason of my r- : after reading the XML file but before processing
> it, you have to decode the file. Since Bugzilla 2.22, when Param('utf8') is
> turned on and when the email contains non ASCII characters, the email is
> encoded using either Base64 or Quoted-Printable, see BugMail::MessageToMTA().
> Per discussion on IRC with karl, we found the following solution to get the
> decoded body of the email:
> 
> ----- start here -----
> # If the email was encoded (BugMail::MessageToMTA() does it when using UTF-8),
> # we have to decode it first, else the XML parsing will fail.
> my $parser = MIME::Parser->new;
> $parser->output_to_core(1);
> $parser->tmp_to_core(1);
> my $entity = $parser->parse_data($xml);
> my $bodyhandle = $entity->bodyhandle;
> $xml = $bodyhandle->as_string;
> ----- end here -----

Done. Thanks for catching this. It is something I would never have thought of. Also it would have taken me hours of searching to come up with a solution.

> 
> For an email containing two 700 Kb attachments, importxml.pl needed 1:30 minute
> and required > 95% CPU to execute the whole process, including decoding the
> email + adding bugs and attachments to the DB. Maybe is the MIME::Parser
> decoding slow and very CPU + memory consuming. I wonder if setting one if the
> params above to '0' would help, either output_to_core() or tmp_to_core() or
> both. We could optimize the MIME decoding in another bug.

This is likely caused by the xml itself. XML::Twig relies on XML::Parser which in turn relies on libxml.so. All but the most recent versions of libxml.so have a bug which causes a xml file to take on an order of O^2 while parsing. 
Try updating libxml and see if it improves. My tests have shown it to be more speedy and require less resources.

However, processing xml is a heavy task even for the fastest and most optimized parsers. One of the reasons for using XML::Twig is that it is possible to process one chunk at a time and dispose of each chunk when finished. Using XML::Parser, bugs > ~2MB would take on the order of 10 min to process. XML::Twig allows the processing of much larger attachments.
> 
> 
> We are very close now. Globally, your patch works fine. :)
> 

YAY :D
Attachment #207779 - Attachment is obsolete: true
Attachment #208008 - Flags: review?(LpSolit)
Comment on attachment 208008 [details] [diff] [review]
Rewrite FINAL

>Index: importxml.pl

>+use MIME::Base64;

Nit: maybe should we add MIME::Parser too as we use it to decode the email.


>+    foreach my $field ( 
>+        qw(long_desc attachment flag), Bugzilla::Bug::fields() )
>+    {
>+        $all_fields{$field} = 1;
>+    }

Nit: add 'group' to the list as it's really exported by show.xml.tmpl. Bug related to groups: bug 275386.


>+    if ( defined $bug_fields{'product'} ) {
>+        $product = new Bugzilla::Product( { name => $bug_fields{'product'} } );
>+        unless ($product) {
>+            $product = $def_product;
>+            $err .= "Unknown Product " . $bug_fields{'product'} . "\n";
>+            $err .= "   Using default product set in Parameters \n";
>+        }
>+    }

Nit: specify the name of the default product.


>+        unless ($component) {
>+            $component = $def_component;
>+            $product   = $def_product;
>+            $err .= "Unknown Component " . $bug_fields{'component'} . "\n";
>+            $err .= "   Using default product and component set ";
>+            $err .= "in Parameters \n";
>+        }

Nit: specify the name of the default component too.


>+        $err .= "Unknown version \"";
>+        $err .= ( defined $bug_fields{'version'} )
>+            ? $bug_fields{'version'}
>+            : "unknown";
>+        $err .= " in product " . $product->name . ". \n";

Nit: there is a missing \" after the version.


>+            $err .= "Unknown milestone \"";
>+            $err .= ( defined $bug_fields{'target_milestone'} )
>+                ? $bug_fields{'target_milestone'}
>+                : "unknown";
>+            $err .= " in product " . $product->name . ". \n";

Nit: here too, there is a missing \" after the target milestone.


>+    if ($err) {
>+        $log .= "The following problems were encountered while creating bug $id.\n";
>+        $log .= $err;
>+        $log .= "You may have to set certain fields in the new bug by hand.\n\n";
>+        $log .= "\n\n\n";
>+    }

Nit: are you sure there are enough newlines at the end of the message (5)? ;) We could remove the last "\n\n\n".


Your patch works great. That was a huge work; thanks! r=LpSolit
Attachment #208008 - Flags: review?(LpSolit) → review+
Flags: approval?
Flags: approval? → approval+
YAY!!! :)


Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v  <--  checksetup.pl
new revision: 1.468; previous revision: 1.467
done
Checking in importxml.pl;
/cvsroot/mozilla/webtools/bugzilla/importxml.pl,v  <--  importxml.pl
new revision: 1.47; previous revision: 1.46
done
Checking in t/Support/Files.pm;
/cvsroot/mozilla/webtools/bugzilla/t/Support/Files.pm,v  <--  Files.pm
new revision: 1.21; previous revision: 1.20
done
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Keywords: relnote
Added to the Bugzilla 2.22 Release Notes in bug 322960.
Keywords: relnote
Blocks: 658905
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: