DiffStrings needs to manage fields with duplicate values

RESOLVED FIXED in Bugzilla 2.18

Status

()

P1
blocker
RESOLVED FIXED
15 years ago
6 years ago

People

(Reporter: goobix, Assigned: jouni)

Tracking

({dataloss})

unspecified
Bugzilla 2.18
dataloss
Bug Flags:
approval +
blocking2.18 +

Details

Attachments

(1 attachment)

(Reporter)

Description

15 years ago
In bug 225703, the patch called "Version 8" was r+ -ed by Gerv, but both the
email notification and the activity log show that Gerv actually canceled the
request.

Jouni managed to reproduce it on the CVS tip. It seems it happens on secondary
review requests, which show correctly next to the patch but get logged/reported
in a wrong way.

Justdave said this might be already reported somewhere but Jouni and I couldn't
find it.
Yes, I noticed this - and was going to get around to filing a bug. Nice one.

Gerv
Blocks: 171553
Flags: blocking2.18+
Target Milestone: --- → Bugzilla 2.18
flagging dataloss as this records incorrect data in the activity log.
Severity: normal → critical
Keywords: dataloss
Severity: critical → blocker
(Reporter)

Comment 3

15 years ago
The code in question that causes this is located in globals.pl, in the
DiffStrings sub:

    my (@remove, @add) = ();

    # Find values that were removed
    foreach my $value (@old) {
        push (@remove, $value) if !grep($_ eq $value, @new);
    }

    # Find values that were added
    foreach my $value (@new) {
        push (@add, $value) if !grep($_ eq $value, @old);
    }


This gets it all wrong. For example, if we have already a review+ flag, and
someone comes and does a second review+ flag on it, then the code will not
detect any new values, because review+ already existed, in the @old array.

The code works perfectly as long as every flag appears one time at most. When a
flag appears twice or more, then we start to have problems due to the algorithm
that we use above, which is conceptually wrong.

The current complexity of the current algorithm is O(n^2), because grep is O(n)
and we iterate though each value.

I propose a different algorithm which is O(n^2) as well: iterate though each
pair of values (x, y) where x is a flag from the @old array and y is a flag from
the @new array. If those are the same, then mark both x = "" and y = "" in the
@old and @new arrays respectively.

After this ends, @old will contain correctly only the values that got removed,
and @new only the values that got added.
Summary: Secondary flags are reported wrong in email notification and activity log → The DiffStrings sub from globals.pl doesn't always work correctly for multiple values
(Reporter)

Comment 4

15 years ago
CCing jake who contributed to the original code. This code is also related to
Myk's first checkin to Bugzilla :-)
I believe DiffStrings was intended for use with CC and dependency lists, which
*should* only have one of each item in it (and should probably weed duplicates).

So we should probably special-case this for flags?

Or is this at a point in the code where we know duplicates have already been
stripped when dealing with CC and dependencies, and thus it shouldn't need to
test for and remove them here?
(Assignee)

Comment 6

15 years ago
Ugh, Joel talked me into this... I have this mostly figured out, I'll try to
turn it into a patch tomorrow.
Assignee: myk → jouni
(Assignee)

Comment 7

15 years ago
DiffStrings is used for keywords, dependency data and flags. Keywords and
dependency data have their own systems for weeding out duplicates; keywords use
special "keywordsseen" et al. hashes in process_bug, while dependency data is
based on db snapshots (which naturally eliminate duplicates). Cc field has its
own handling mechanism. It could probably be written to use DiffStrings, but I
don't view that as a priority.

Regardless of that, I don't think it's worth the effort to special case flags
anyway. In the future (custom fields?) we'll probably hit this again; it's just
easier to make DiffStrings manage duplicates. The patch I'm about to attach does
exactly that.
Status: NEW → ASSIGNED
Summary: The DiffStrings sub from globals.pl doesn't always work correctly for multiple values → DiffStrings needs to manage fields with duplicate values
(Assignee)

Comment 8

15 years ago
Created attachment 148179 [details] [diff] [review]
v1
(Assignee)

Comment 9

15 years ago
Comment on attachment 148179 [details] [diff] [review]
v1

Joel, might you be able to take look? I tested flag, dep. and keyword changes,
all worked ok.
Attachment #148179 - Flags: review?(bugreport)

Comment 10

15 years ago
Comment on attachment 148179 [details] [diff] [review]
v1

r=joel

Can anybody think of a reason that we would need DiffStrings to ignore changes
that impact only  duplicates?
Attachment #148179 - Flags: review?(myk)
Attachment #148179 - Flags: review?(bugreport)
Attachment #148179 - Flags: review+

Updated

15 years ago
Priority: -- → P1
(Assignee)

Comment 11

15 years ago
Oops... Long break from developing Bugzilla can be spotted there. My patch
contains 2-space indents, but I'll fix them to 4-space pre-commit or at the next
patch, depending on the 2nd review result. 
Comment on attachment 148179 [details] [diff] [review]
v1

>Index: globals.pl

>+    foreach my $oldv (@old) {
>+      foreach my $newv (@new) {
>+        next if ($newv eq '');
>+        if ($oldv eq $newv) {
>+          $newv = $oldv = '';
>+        }
>+      }

This sets only the local variables to an empty string--it doesn't change the
array items at all--so it's not going to do what you expect.
Attachment #148179 - Flags: review?(myk) → review-

Comment 13

15 years ago
I thought the same thing and almost rejected it as well.  It turns out that
Jouni is correct.
(In reply to comment #12)
> This sets only the local variables to an empty string--it doesn't change the
> array items at all--so it's not going to do what you expect.

That's not true, it does exactly what he expects.  I've been bitten by this
other places where I didn't want it to update the array and it did. :)  Editing
the loop variable in a foreach loop does indeed update the array it's looping
through.
> That's not true, it does exactly what he expects.  I've been bitten by this
> other places where I didn't want it to update the array and it did. :)  Editing
> the loop variable in a foreach loop does indeed update the array it's looping
> through.

Is that documented anywhere?  If so, I'm happy to rescind my negative review. 
Otherwise let's not rely on it.

Comment on attachment 148179 [details] [diff] [review]
v1

Silly me, of course it's documented (man perlsyn). r=myk
Attachment #148179 - Flags: review-
Flags: approval+

Comment 17

15 years ago
Checking in globals.pl;
/cvsroot/mozilla/webtools/bugzilla/globals.pl,v  <--  globals.pl
new revision: 1.267; previous revision: 1.266
done
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
(Assignee)

Comment 18

15 years ago
Well, you guys figured this out while I was sleeping. That's fine, thanks. :-)
Here's the fix for indentation issue mentioned in comment 11 but which Joel
forgot when checking in:

Checking in globals.pl;
/cvsroot/mozilla/webtools/bugzilla/globals.pl,v  <--  globals.pl
new revision: 1.268; previous revision: 1.267
done
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.