Problem with the regex in to find duplicates

RESOLVED FIXED in Bugzilla 2.18



Installation & Upgrading
16 years ago
6 years ago


(Reporter: davef, Assigned: bbaetz)


Bugzilla 2.18


(Whiteboard: [fixed in 2.16.1])


(1 attachment, 1 obsolete attachment)

801 bytes, patch
Jouni Heikniemi
: review+
Joel Peshkin
: review+
Details | Diff | Splinter Review


16 years ago
When running to convert an old installation (in particular, the site), I got sql errors on line 2709 because 's weren't being
escaped. Upon further investigation (by bbaetz), we figured out that the regex
on 2708 that was trying to filter out the bugids was broken, as it wasn't
getting rid of the text after the * * * This bug is a duplicate... * * * line.
The suggested fix, change the regex to:

                $dupes{$key} =~ /.*\*\*\* This bug has been marked as a
duplicate of (\d{1,5}) \*\*\*/s;
                $dupes{$key} = $1;
Created attachment 87588 [details] [diff] [review]

davef's suggested regexp (from bbaetz?) with minor changes.

.* at the beginning is redundant if there's no ^, since the lack of a ^ matches
anything before the pattern anyway.  However, to ensure that it snags the first
duplicate text in the comment (in case someone was stupid and provided their
own (and yes I've actually seen it done on bmo), I changed the beginning of the
regexp to ^.*? so it would only get the closest dupe text to the beginning.

.*? is a non-greedy match (the shortest possible string that matches) instead
of a greedy match like .* (which would get the longest possible match).

Comment 2

16 years ago
Comment on attachment 87588 [details] [diff] [review]

No, we want the last duplicate marking - if a bug is duped, reopened, then
duped, its the last bug which is important.

.*? has the problem the current regexp has, in that it appears not to be greedy
(which is why the .*? doesn't capture stuff at the end) Did this change from
perl5.00x to 5.6?

Remember, the text we're going over is the _entire_ set of comments, not just a
single comment:

$ perl -w
my $foo = "abc xxx 1 xxx 2 asdas";
$foo =~ /^.*xxx (\d)/;
print $1;

gives '2', but using .*? gives '1'.
Attachment #87588 - Flags: review-
We are not operating on the entire comment text.  The regexp is operating on a
single comment.  That's why the silly hash trick. :-)  It starts with the oldest
comment in each bug and works its way to the newest, replacing the duplicate bug
number every time it finds a new matching comment.

Comment 4

16 years ago
OK, justdave is right (bgo has this cause they took the movers patch out of
order, and so have all the move comments here.)

We still want the last one in the commment, though - consider:

This bug is a duplicate of bug 123. Bugzilla should say:

*** This bug has been marked as a duplicate of bug 123 ***


*** This bug has been marked as a duplicate of bug 123 ***

The s/ will change this to:

123 See:

*** This bug has been marked as a duplicate of bug 123 ***

which isn't a valid bug number....

Comment 5

16 years ago
Created attachment 90584 [details] [diff] [review]

OK, try this.

davef: Can you try this patch for the conversion, and then do a sanity check
afterwards, checking that there aren't any bugs marked as dupes withouth
entries in the dupes table? If there are, then if you tell me the bug #s I'll
try to check where they are.
Attachment #87588 - Attachment is obsolete: true

Comment 6

16 years ago
-> me for 2.18, although I could maybe be talked into doing this for 2.16.
Assignee: zach → bbaetz
Keywords: patch, review
Target Milestone: --- → Bugzilla 2.18

Comment 7

16 years ago
Comment on attachment 90584 [details] [diff] [review]

r=jouni, but needs a second-review preferably with test results from a real
installation. I did run the regexp against some simulated comments and it
worked right, though.
Attachment #90584 - Flags: review+
I think the second r= on this patch needs to come from someone upgrading b.g.o,
to confirm it fixes their problem.


Comment 9

16 years ago
Sorry, I've been out of town this last week and haven't been able to try out 
the patch (and in fact, I'm still out of town). I'll be back on friday, and 
have an answer for you as to if it works this weekend. Thank you for the fix, 
btw - it's always preferrable for me to get the right fix, rather than 
something that seems to work.

Comment 10

16 years ago
okay, tried the patch - it seems to work. I will keep checking to make sure it
didn't miss any duplicates, but the error I was getting that prompted this bug
is no longer there. Thanks...

Comment 11

16 years ago
gerv: does davef's comment count as an r2? (And should this go onto the 2.16
I'd say so. I don't know about 2.16 - the patch author is the best person to
assess the risk.


Comment 13

16 years ago
Comment on attachment 90584 [details] [diff] [review]

While I don't have a db with a lot of dupes, this is pretty clearly correct,
does no harm, and has been confirmed to fix the problem for others.

2xr from joel
Attachment #90584 - Flags: review+

Comment 14

16 years ago
OK, this is checked in on the trunk and 2.16 branch, so that people upgrading
get the fix (which is sort of the point of it...)
Last Resolved: 16 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in 2.16.1]
Summary: Problem with the regex in to find dependancies. → Problem with the regex in to find duplicates
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.