Open Bug 724881 Opened 13 years ago Updated 5 years ago

Bugzilla should detect duplicate attachments

Categories

(Bugzilla :: Attachments & Requests, enhancement, P4)

enhancement

Tracking

()

People

(Reporter: kats, Unassigned)

Details

I've seen this happen a number of times: 1) Somebody uploads a patch for review 2) Reviewer points out some problems, gives r- 3) Author fixes patch, but mistakenly re-uploads initial patch for review 4) Reviewer points out mistake 5) Author uploads correct patch 6) Review proceeds It seems like steps 3 and 4, which consist of an entire round-trip between author and reviewer, could be eliminated by some simple duplicate-checking code in bugzilla. Something like "if an attachment is uploaded, and it is the same as another attachment on the same bug, notify the uploader". You could even limit it's scope further if this is too risky - "if an attachment is uploaded while obsoleting another attachment, and the new attachment is the same as the obsoleted one, notify the uploader".
This happens so rarely that it IMO doesn't worth it. Also, some attachments can be very large, and having to load them from the DB to compare their content may be slow and generate useless I/O with the DB. Marking as P4 for now, but it may be resolved as wontfix.
Severity: normal → enhancement
OS: Mac OS X → All
Priority: -- → P4
Hardware: x86 → All
Something similar to this just happened to me, but in my case, my patch was reviewed with nits fixed, so the reviewer wasn't expecting to see another patch, and the "new" patch could easily have got landed by mistake :-(
(In reply to Frédéric Buclin from comment #1) > This happens so rarely that it IMO doesn't worth it. Also, some attachments > can be very large, and having to load them from the DB to compare their > content may be slow and generate useless I/O with the DB. I don't know what % of attachment uploads have this problem but I've definitely seen it a number of times since I reported this issue. It might still not be worth it, I don't know. As for attachment size, storing a hash of the attachment and just comparing those would be good enough, I think. Since it would be a warning rather than an error it would be fine to have a few false positives (which should be astronomically rare anyway).
Here's a thought: What about adding a "checksum" field to attach_data? One could calculate the sum on the incoming attachment. If the sum doesn't match any existing entry in attach_data, one would create entries in attach_data, attachments, etc. just like the existing implementation (except now with the new checksum field in attach_data). If the sum DOES match an existing entry in attach_data AND if there exists at least one other row in the attachments table that shares the same bug_id as the one for which the attachment is being submitted, a warning could be presented (e.g., are you sure?). If the submitter proceeds, a new entry in the attachments table could be created which references the existing entry in attach_data. That seems like it would alleviate at least part of the brute force expense identified in comment 1. This is kind of like what git does.
Oops...when I made comment 4, I didn't realize: 1) That's actually what kats was suggesting (in less detail) in comment 3. 2) In the current implementation attachments.attach_id always equals attach_data.id, which is unfortunate. My suggestion in comment 4 would likely require adding a "attach_data_id" or something to the attachments table. The good news is that backfilling for prior versions would be straightforward: UPDATE attachments SET attach_data_id = attach_id WHERE attach_id IS NULL (or something similar).

I'd like to bring this back into focus. we are using bugzilla as ticketing system with clients as well as for internal development use. Since we support submission via mail, we add tons of attachments to the database. Everybody has logos, vcards, ads or legal declarations attached to each and every mail.
I'm currently trying to filter on the mail server and am uncertain if I should patch email_in.pl. A even more general approach would be the best solution.

You need to log in before you can comment on or make changes to this bug.