Open Bug 577532 Opened 14 years ago Updated 12 years ago

Only store text attachments in the database

Categories

(Bugzilla :: Attachments & Requests, enhancement, P2)

3.7.2
enhancement

Tracking

()

People

(Reporter: mkanat, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [relnote that data/attachments will now generally take more space])

Attachments

(1 file, 8 obsolete files)

So, the only good reason to store attachments in the database is so that you can search through their data, really. However, for binary attachments, you don't really want to be searching through their data. So, by default, Bugzilla should always store binary attachments outside of the database.

The advantage here is two things:

1) The database will be smaller, which makes doing a mysqldump much easier.
2) Due to the way that MySQL works, it's theoretically possible for attachment data to fill up all of the memory that MySQL is using (the innodb_buffer_pool) to cache other data. This cache makes queries much faster--it allows MySQL to return all data from memory. However, it will also attempt to cache attachment data if that data is in the database, which is so enormous that it is likely to force out all other data from the cache. There are some protections in MySQL 5.1 to help prevent this, but it still would probably be nice to store binary attachments outside the DB, always.

Here's basically how we determine if an attachment is text:

1) If it has a text/ content-type, then we should always consider it text, even if it isn't really.

2) Otherwise, if it has no null bytes (\0), it's text. This allows us to catch attachments uploaded by IE or by people who specify the wrong content type.

If an attachment "becomes" text (by the above test) on an update, it should be moved into the database. If it "stops" being text on an update, it should be moved out of the database.
Priority: -- → P2
Okay, there's actually something even better we can do as far as a test goes, which is to use perl's "-T" test on a file handle. We can just always write out uploads to file handles. This would also have the advantage that we'd never have to load the attachment into memory when it was being uploaded.

We can probably create a new data/upload directory to put files in.

Then, when an attachment is updated, we can decide whether or not to move it into the DB by checking the content-type.

We should still always insert things into the DB if somebody specifies a text/ type for them, though.
I'm going to try my hand at this, since I think it would be such a big DB performance win.
Assignee: attach-and-request → mkanat
Target Milestone: --- → Bugzilla 4.2
Depends on: 618443
Attached patch Work In Progress (obsolete) — Splinter Review
Status: NEW → ASSIGNED
Comment on attachment 496993 [details] [diff] [review]
Work In Progress

>=== modified file 'Bugzilla/Attachment.pm'

>-    if (length($self->{data}) == 0) {
>+    if (!defined $self->{data}) {

$self->{data} is always defined. It cannot be NULL.


>+    if (!ref $data) {
>+        my $file = new File::Temp();
>+        binmode $file;
>+        print $file $data
>+            or die $file->filename . ": $!";
>+        $data = $file;
>+        seek $data, 0, SEEK_SET;
>+    }

Why do you copy the data into a temporary file? CGI already does it for us.
(In reply to comment #4)
> $self->{data} is always defined. It cannot be NULL.

  No, it can be now. (See the rest of the patch, and the Install::DB code that I haven't written yet. :-) )

> Why do you copy the data into a temporary file? CGI already does it for us.

  I only copy it into a file if it's not already a file handle. (The WebService, at least, files attachments as strings.)
Attached patch Work In Progress 2 (obsolete) — Splinter Review
Okay, this one works for everything surrounding creating and updating attachments. All that's missing now is the Install::DB code.
Attachment #496993 - Attachment is obsolete: true
Attached patch v1 (obsolete) — Splinter Review
Okay, here we go! In testing, this reduces the size of the bugs_tip mysqldump on landfill to about half its size. I haven't tested it on a bmo dump yet, but I suspect I'd see an even better result there.

The UPDATE_VALIDATORS change is pretty much unrelated to this bug, but it was a harmless change that will help make it simpler to do VALIDATOR_DEPENDENCIES if we need to, so I just went ahead and did it. I'm happy to revert it if you'd like.
Attachment #497186 - Attachment is obsolete: true
Attachment #497314 - Flags: review?(LpSolit)
Whiteboard: [relnote that data/attachments will now generally take more space]
No longer depends on: 618443
Attached patch v2 (obsolete) — Splinter Review
I realized that BmpConvert also needed to be updated. I updated it, but it's not working. I think there's a bug in ImageMagick when dealing with IO::Handle instances, or something. I get "Exception 450: WriteBlob Failed `' @ error/png.c/PNGErrorHandler/1483"

In any case, I don't really want to waste any more time trying to figure out a bug in ImageMagick when this patch is totally ready otherwise, so here's the patch.
Attachment #497314 - Attachment is obsolete: true
Attachment #497352 - Flags: review?(LpSolit)
Attachment #497314 - Flags: review?(LpSolit)
Attached patch v3 (obsolete) — Splinter Review
Okay, I switched from ImageMagick to a different library called "Imager" that actually works with Perl file handles. Another nice thing about Imager is that it can actually be installed by install-module.pl, unlike ImageMagick. Anyhow, I've now tested BmpConvert and it works.
Attachment #497352 - Attachment is obsolete: true
Attachment #497366 - Flags: review?(LpSolit)
Attachment #497352 - Flags: review?(LpSolit)
Attached patch v4 (obsolete) — Splinter Review
I had to make sure to do $data->seek(0, SEEK_SET) at the end of BmpConvert, or the file would always be stored in the database. (-T wasn't working unless the file pointer was at the start of the file.)
Attachment #497366 - Attachment is obsolete: true
Attachment #497375 - Flags: review?(LpSolit)
Attachment #497366 - Flags: review?(LpSolit)
What about if an attachment "is patch"?
(In reply to comment #11)
> What about if an attachment "is patch"?

  Then its contents will be text, so this code will figure that out.
Does this impact deleting attachments at all? In some cases, it's possible that an attachment will be both on file and in the DB. Does the delete code handle that possibility?
(In reply to comment #13)
> Does this impact deleting attachments at all? In some cases, it's possible that
> an attachment will be both on file and in the DB. Does the delete code handle
> that possibility?

  That's answered by reading the patch. :-)
Sorry for really late comment,,

(In reply to comment #0)
> So, the only good reason to store attachments in the database is so that you
> can search through their data, really. However, for binary attachments, you
> don't really want to be searching through their data. So, by default, Bugzilla
> should always store binary attachments outside of the database.

Just FYI, one advantage to store every attachments in the database for 'small' bugzilla sites might be easy backup. Just with backup the database (via mysqldump or pg_dump), user can get full backup-ed data.

If possible, I'd rather have an option for this behavior.
(In reply to comment #15)
> Just FYI, one advantage to store every attachments in the database for 'small'
> bugzilla sites might be easy backup. Just with backup the database (via
> mysqldump or pg_dump), user can get full backup-ed data.

  Ah, well, installations still need to back up localconfig and the data/ directory already in order to be truly backed up, so this doesn't change that.

> If possible, I'd rather have an option for this behavior.

  I'll consider that in a follow-up bug--it seems like there could be various good reasons to have such an option. But for the sake of keeping this review simple (or at least, no more complex than it is) I think we should add the option later.
Just wondering: isn't this going to slow down Bugzilla as an anti-virus will check the attachment content every time (I think) Bugzilla tries to read it?
(In reply to comment #17)
> Just wondering: isn't this going to slow down Bugzilla as an anti-virus will
> check the attachment content every time (I think) Bugzilla tries to read it?

You're talking about Windows, I assume? Could relnote that people should add an exemption for their attachment directory...
(In reply to comment #17)
> Just wondering: isn't this going to slow down Bugzilla as an anti-virus will
> check the attachment content every time (I think) Bugzilla tries to read it?

  That does sound like a reasonable concern, although Bugzilla already has problems with anti-virus solutions. I think that exempting the attachments directory would be a good idea, and it's something that we should both document and relnote.

  Also, not all AV systems check every file opened, as I understand it, particularly not files that couldn't possibly contain a virus, as most attachments would be.

  At the least, Linux installations will be totally unaffected, which is most of our userbase.
(In reply to comment #19)
> problems with anti-virus solutions. I think that exempting the attachments
> directory would be a good idea

Note that scanning the attachments on upload is still a good idea. This way the anti-virus would catch malware and infected files for us.
(In reply to comment #16)
> > Just FYI, one advantage to store every attachments in the database for 'small'
> > bugzilla sites might be easy backup. Just with backup the database (via
> > mysqldump or pg_dump), user can get full backup-ed data.
> 
>   Ah, well, installations still need to back up localconfig and the data/
> directory already in order to be truly backed up, so this doesn't change that.

Some configuration file like localconfig and data/params will be needed to be backed up, but they will not be required every time but the time on change (by administrators).
I think other parts in the data/ directory are not in need to be backed up.

> > If possible, I'd rather have an option for this behavior.
> 
>   I'll consider that in a follow-up bug--it seems like there could be various
> good reasons to have such an option. But for the sake of keeping this review
> simple (or at least, no more complex than it is) I think we should add the
> option later.

Totally agree with this. (So, just FYI.)
# But I think I will forgot my comments soon :-p
(In reply to comment #21)
> I think other parts in the data/ directory are not in need to be backed up.

  Well, data/mining/ needs to be backed up, if you're using Old Charts.
  Hey hey. This isn't a higher-priority than 4.0-related work, but if I could get a review on it at some point that would be awesome.
Attached patch v5 (obsolete) — Splinter Review
Hey hey. Here's an un-bitrotted version.

Could I get a review on this? It's going to make a major database performance difference for large installations, and it's also going to make their backup processes way easier (in addition to this patch having a fair bit of nice cleanup).
Attachment #497375 - Attachment is obsolete: true
Attachment #514111 - Flags: review?(LpSolit)
Attachment #497375 - Flags: review?(LpSolit)
I think there should be a parameter which lets us store all attachments in the DB, unconditionally, and this should be the default (i.e. admins must explicitly accept to have some attachments stored out of the DB, as it's the case currently). I have some concerns about anti-virus, as I said in comment 17, and about dataloss as some admins tend to only backup the DB and data/params (or even the DB only).
(In reply to comment #25)
> I think there should be a parameter which lets us store all attachments in the
> DB, unconditionally, and this should be the default (i.e. admins must
> explicitly accept to have some attachments stored out of the DB, as it's the
> case currently).

  I understand what you are saying, but non-text attachments should be stored outside of the database by default. This is the behavior that leads to the best database performance, and very few installations are in a situation where they are more limited in data/ space than in DB space. There is no really *good* reason to be storing binary data in the database, there is only the reason "we do it that way now" and "admins have the system configured a certain way because we do it that way now."

  I do agree, however, that there should be a parameter. However, that will add significant complexity to the solution, and thus I am going to do it in the patch after this one.

> I have some concerns about anti-virus, as I said in comment 17,

  If the user has anti-virus of that nature, then it would be already triggering every time the web server ran a CGI, opened a script file, or sent CSS. It would be triggered every time we read data/params, display a graph, work with graphviz, compile a template, or even load a compiled template. So this doesn't seem like a concern that we should be having as part of this bug.

> and about dataloss as some admins tend to only backup the DB and
> data/params (or even the DB only).

  Yes, we should definitely notify administrators in the release notes about this very clearly, just like we did for the Apache configuration changes. There is already dataloss from collectstats.pl if you don't backup the full data/ directory, and we have never advised anybody in any written documentation to do otherwise than backup or move the full data/ directory.
(In reply to comment #26)
>   I do agree, however, that there should be a parameter. However, that will add
> significant complexity to the solution, and thus I am going to do it in the
> patch after this one.

Why do you agree that there should be a parameter based on what you just said? Also, I much prefer to see it included in this patch, not in a separate one.
(In reply to comment #27)
> Why do you agree that there should be a parameter based on what you just said?
> Also, I much prefer to see it included in this patch, not in a separate one.

  I agree that there should be a parameter because users obviously want it. There will be situations in which the configuration of the server doesn't allow for a larger data/params, and there will be situations in which people do not want to store *any* attachments in the database, because they have greater database size constraints than they have disk space constraints, or they don't care about searching attachment data at all. These aren't the common cases, but they are cases that we need to support, even if optionally.
  Once we add the parameter, we could possibly make it like the utf8 parameter, where it retains the old behavior for existing installations, and defaults to the new behavior for new installations. I think that a lot of installations would miss out on the performance benefits then, though. It would operate well with the Principle of Least Surprise for a checksetup upgrade, though. (I'm not sure how important that is for a checksetup upgrade, though.)
(In reply to comment #29)
>   Once we add the parameter, we could possibly make it like the utf8 parameter,
> where it retains the old behavior for existing installations, and defaults to
> the new behavior for new installations.

Yeah, that's what I had in mind too. Once the admin turns on the parameter, then he knows what's going on.
(In reply to comment #30)
> Yeah, that's what I had in mind too. Once the admin turns on the parameter,
> then he knows what's going on.

  Okay, we can do that as a 4.2 blocker, but I really don't want to do it before checking in this first piece of it, as I think some parts will get much more complex when the parameter is added, and this patch is already pretty complex.
Comment on attachment 514111 [details] [diff] [review]
v5

Since this will significantly benefit Mozilla, dkl is also welcome to review it.
Attachment #514111 - Flags: review?(dkl)
(In reply to comment #30)
> (In reply to comment #29)
> >   Once we add the parameter, we could possibly make it like the utf8 parameter,
> > where it retains the old behavior for existing installations, and defaults to
> > the new behavior for new installations.
> 
> Yeah, that's what I had in mind too. Once the admin turns on the parameter,
> then he knows what's going on.

Having this as a parameter (I vote for off by default as well) will help for installations that have multiple web heads. The environment would be need to be configured somehow to keep the attachment directories synchronized across each server or use some sort of shared filesystem such as GFS. That is one advantage to having all attachments in the database since even if you have multiple slaves the data is automatically replicated.

Dave
(In reply to comment #33)
> Having this as a parameter (I vote for off by default as well) will help for
> installations that have multiple web heads.

  If you have multiple web heads, you already have a way to share the data/ directory between multiple web heads.
Yeah, we already have /data/ shared via NFS across all the webheads right now (at least within the same colo).  The other colo is synced with it via rsync periodically (But the way we're set up now, it's only there for failover purposes, so it doesn't need to be realtime).
  Ah, interesting, okay. You guys would benefit quite a bit from this patch, I think, though. But I do see why we should have a parameter, and I'll add it after this bug and before 4.2.
Comment on attachment 514111 [details] [diff] [review]
v5

I didn't look at the patch very carefully yet, but from what I can see at https://rt.cpan.org/Public/Bug/Display.html?id=35992, Imager doesn't support 64-bit platforms.
(In reply to comment #37)
> I didn't look at the patch very carefully yet, but from what I can see at
> https://rt.cpan.org/Public/Bug/Display.html?id=35992, Imager doesn't support
> 64-bit platforms.

  It is building fine on all 64-bit platforms on ActiveState:

  http://ppm4.activestate.com/idx/I.html
Comment on attachment 514111 [details] [diff] [review]
v5

>=== modified file 'Bugzilla/Attachment.pm'

>+ sub _data_fh {

Please keep private subroutines grouped together. Move it after "sub datasize".


>+        if (my $fh = $self->_data_fh) {
>+            $self->{datasize} = -s $fh;
>         }

You should close $fh.


>-    return $self->{datasize};
>+    return $self->{datasize} || 0;

This change is not needed, because the DB already returns 0 if there is no attachment, and we won't enter the IF block above if the file doesn't exist. Also, at this point, $self->{datasize} *must* be defined. Please revert this change.


>+sub _make_local_dir {
>+    my ($self) = @_;
>+    my $filename = $self->_get_local_filename;

_make_local_dir() is called from _data_fh() only, which already called _get_local_filename(). So simply call _make_local_dir($filename) from _data_fh(). Also, I don't think this should be a method of Bugzilla::Attachment. Both _make_local_dir() and _make_file_read_only() should be moved into Bugzilla::Install::Filesystem, taking the filename as argument. I could imagine other places where these two subroutines would be useful.


>+sub _make_file_read_only {

As said above, it should also go into Bugzilla::Install::Filesystem, or be skipped entirely as _make_file_read_only() is exactly the same as fix_file_permissions($filename, CGI_READ), which is really not more complex to write. Between both solutions, I prefer to see it die entirely.


>+        my $to_fh = $attachment->_data_fh('write');
>+        my $filename = $attachment->_get_local_filename;
>+        copy($data, $to_fh) or die "$filename: $!";
>+        $attachment->_make_file_read_only();

I think it's safer to first close $data and $to_fh before altering file access permissions.


>+sub _should_store_in_db {

>+    # Doing -T on a filehandle can move the data pointer, so we have to
>+    # reset it.
>+    $data_fh->seek(0, SEEK_SET);

Is it just a guess or is it written somewhere that the data pointer behaves that way?


>+sub _insert_attachment_data {
>+    my ($self, $data) = @_;
>+    my $dbh = Bugzilla->dbh;
>+    if (ref $data) {

How could $data not be a IO::File object at this point?? _check_data() enforces this.


>+        $data->untaint;

I know trick_taint() was called from here before, but this is the job of _check_data() to untaint data once it's validated, and so this line should be moved there.


>+sub _move_data_to_file_if_not_text {

>+    if (-e $filename) {
>+        die "$filename already exists";
>+    }

There is no reason to die when going from DB to file, but skip silently when going from file to DB. In all cases, die should be ThrowCodeError() as this is not supposed to happen.



>=== modified file 'Bugzilla/Install/DB.pm'

>+sub _reconfigure_attachment_storage {

>+    if (scalar(@$attachment_info) == 0) {
>+        _attachment_reconfiguration_done();
>+        return;
>+    }

This code is useless as you won't enter the foreach loop if it has 0 item.


>+    _attachment_reconfiguration_done();

With the removal of the useless code above, you could skip yet another subroutine which does nothing more than:

      open(my $fh, '>', $filename) or die "$filename: $!";
      close $fh;

This honestly helps making the code easier to read instead of having to jump from one subroutine to another one all the time (especially for such minor things).


>+sub _move_attachment_if_required {
>+    my ($id, $ctype, $in_db) = @_;

$in_db is neither passed to _move_attachment_if_required() nor used.


>+    # All other stuff might need to be moved out into a file, if it's
>+    # currently in the DB.
>+    my $is_in_db = $dbh->selectrow_array(
>+        "SELECT 1 FROM attach_data WHERE id = ?", undef, $id);
>+    return if !$is_in_db;

Make it clearer in the comment that we do this check to catch deleted attachments. I was going to say that this check was useless before thinking about them.


Otherwise looks good, but I haven't test your patch yet.
Attachment #514111 - Flags: review?(dkl)
Attachment #514111 - Flags: review?(LpSolit)
Attachment #514111 - Flags: review-
(In reply to comment #39)
> You should close $fh.

  Actually not necessary--it gets closed automatically when it goes out of scope. (And if the "if" block doesn't trigger, there's never an fh to close anyhow.)

> >-    return $self->{datasize};
> >+    return $self->{datasize} || 0;
> 
> This change is not needed, because the DB already returns 0 if there is no
> attachment, and we won't enter the IF block above if the file doesn't exist.
> Also, at this point, $self->{datasize} *must* be defined. Please revert this
> change.

  No, actually, for deleted attachments, it can be undef here now.

> >+sub _make_local_dir {
> >+    my ($self) = @_;
> >+    my $filename = $self->_get_local_filename;
> 
> _make_local_dir() is called from _data_fh() only, which already called
> _get_local_filename(). So simply call _make_local_dir($filename) from
> _data_fh().

  Oh, that's true, I could do that. I don't think it would actually give us any noticeable efficiency, though. I liked being able to just call the subroutine with no argument, I think that's simpler.

> Also, I don't think this should be a method of
> Bugzilla::Attachment. Both _make_local_dir() and _make_file_read_only()
> should be moved into Bugzilla::Install::Filesystem, taking the filename as
> argument. I could imagine other places where these two subroutines would be
> useful.

  Yeah, agreed that they could be useful there. How about this: as soon as we want to use them somewhere else, we'll move them into Install::Filesystem. For now it's convenient to be able to call $self->id inside of _get_local_filename.

> As said above, it should also go into Bugzilla::Install::Filesystem, or be
> skipped entirely as _make_file_read_only() is exactly the same as
> fix_file_permissions($filename, CGI_READ), which is really not more complex
> to write. Between both solutions, I prefer to see it die entirely.

  Ah, the primary reason that I wanted to make it into a subroutine was so that there was only one place where we chose the specific permissions that were going to be used (CGI_READ, in this case). You're right that I could put that code int there manually, but it seemed simpler to encapsulate it so that the choice would only be in one place.


> >+sub _should_store_in_db {
> 
> >+    # Doing -T on a filehandle can move the data pointer, so we have to
> >+    # reset it.
> >+    $data_fh->seek(0, SEEK_SET);
> 
> Is it just a guess or is it written somewhere that the data pointer behaves
> that way?

  I originally found it out directly by seeing it happen while I was coding. The docs aren't too clear about it, they just say the IO buffer will be examined:

  http://perldoc.perl.org/functions/-X.html

> >+        $data->untaint;
> 
> I know trick_taint() was called from here before, but this is the job of
> _check_data() to untaint data once it's validated, and so this line should
> be moved there.

  Merely the fact of opening a file on the filesystem causes it to be tainted data. So _data_fh is always tainted, even if we didn't go through _check_data. (So, for example, during _move_data_into_database this will be tainted.)

> >=== modified file 'Bugzilla/Install/DB.pm'
> 
> >+sub _reconfigure_attachment_storage {
> 
> >+    if (scalar(@$attachment_info) == 0) {
> >+        _attachment_reconfiguration_done();
> >+        return;
> >+    }
> 
> This code is useless as you won't enter the foreach loop if it has 0 item.

  Oh, it prevented us from printing out the attachment_storage_reconfig message when there was no work to do. However, I've change the code to just skip that message if there's no work to do, which you're right, totally makes things simpler! :-)


  Also, all your other comments were great and I'm implementing fixes for them right now. :-)
Attached patch v6 (obsolete) — Splinter Review
Okay, here we go.

By the way, do you want me to split the Imager part off into a separate patch?
Attachment #514111 - Attachment is obsolete: true
Attachment #532803 - Flags: review?(LpSolit)
Target Milestone: Bugzilla 4.2 → Bugzilla 5.0
Comment on attachment 532803 [details] [diff] [review]
v6

>=== modified file 'Bugzilla.pm'

>-# $::SIG{__DIE__} = i_am_cgi() ? \&CGI::Carp::confess : \&Carp::confess;
>+ $::SIG{__DIE__} = i_am_cgi() ? \&CGI::Carp::confess : \&Carp::confess;

This change is only useful for debugging. :)



>=== modified file 'Bugzilla/Attachment.pm'

>-    # If we have a filehandle, we need its content to store it in the DB.
>-    elsif (ref $data) {
>-        local $/;
>-        $data = <$data>;
>-    }

Bitrot due to the security bug 660502. To avoid reproducing this bug again, make sure you explicitly close filehandles everywhere.
Attachment #532803 - Flags: review?(LpSolit) → review-
(In reply to Frédéric Buclin from comment #42)
> Bitrot due to the security bug 660502. To avoid reproducing this bug again,
> make sure you explicitly close filehandles everywhere.

  Okay, I have done some research on this. The Perl docs say that file handles stored in *local variables* will be closed as soon as the variable goes out of scope. But of course, file handles in globs are not local variables, so they would not have been closed.

  So FWIW, according to the documentation, doing "close $fh" and not checking the result is the same as letting $fh just go out of scope.
(In reply to Max Kanat-Alexander from comment #43)
>   So FWIW, according to the documentation, doing "close $fh" and not
> checking the result is the same as letting $fh just go out of scope.

  I just checked though, and everywhere I open a filehandle in this patch, I do close it, so nothing has changed from the current state of security, in any case, for this now. (I have to close them for other reasons, like you can't unlink an open filehandle safely, and you want to close filehandles to make sure they've flushed, sometimes.)
Attached patch v7Splinter Review
Okay, fixed both of those issues! :-)
Attachment #532803 - Attachment is obsolete: true
Attachment #553310 - Flags: review?(LpSolit)
Blocks: 686169
Hey hey. If you have the chance, a review on this patch would be really cool. It refactors some stuff that would be nice to have refactored.
Comment on attachment 553310 [details] [diff] [review]
v7

>=== modified file 'Bugzilla/Attachment.pm'

> use List::Util qw(max);

max() is no longer used in this file. You can remove this line.


>+sub _move_data_into_database {

>+    if ($in_db) {
>+        ThrowCodeError('attachment_data_exists',
>+                       { id => $self->id, file => $filename });
>+    }

If the mimetype was not text/* but was in fact a text file, the attachment remained in the DB. So if you change the mimetype back to text/*, this method tries to insert the attachment into the DB again, triggering the following error:

Bugzilla tried to move attachment # into the database from a file called "./data/attachments/group.11/attachment.11", but there is already attachment data in the database for this attachment.

Traceback:

 at Bugzilla/Attachment.pm line 976
	Bugzilla::Attachment::_move_data_into_database(...) called at Bugzilla/Attachment.pm line 924
	Bugzilla::Attachment::update(...) called at /var/www/html/bugzilla-bzr/attachment.cgi line 725
	main::update(...) called at /var/www/html/bugzilla-bzr/attachment.cgi line 126

Note also the lack of attachment ID in the error message ("... attachment # into ..."); see below for the reason why. What's wrong with this error is that we should have returned earlier as the filehandle is supposed to be undefined, but this doesn't happen. And I checked, there is no file named "./data/attachments/group.11/attachment.11". Maybe ->_data_fh is working incorrectly?



>=== modified file 'Bugzilla/Install/DB.pm'

>+sub _move_attachment_if_required {

>+    $attachment->_move_data_to_file_if_not_text();

This is pretty annoying. You are going to write hundreds of Mb or even several Gb of data just to make sure the attachment is really a text file or not. During the migration process, I would only look at the mimetype and ignore if the file is really text or not.



>=== modified file 'Bugzilla/Install/Filesystem.pm'

>-         $attachdir         => { files => CGI_WRITE,
>+         $attachdir         => { files => CGI_READ,
>                                   dirs => DIR_CGI_WRITE },

Are we sure we will still be able to edit data/params with this change?



>=== modified file 'attachment.cgi'

>-         filename      => $cgi->param('attach_text') ? "file_$bugid.txt" : scalar $cgi->upload('data'),
>+         filename      => $cgi->param('attach_text') ? "file_$bugid.txt" : scalar $cgi->param('data'),

Why replacing ->upload by ->param?



>=== modified file 'template/en/default/global/code-error.html.tmpl'

>+  [% IF error == "attachment_data_exists" %]
>+    [% title = "Attachment Data Exists" %]
>+    [% terms.Bugzilla %] tried to move attachment 
>+    #[% attachment.id FILTER html %] into the database from a file called 

You use attachment.id here, but you passed id itself to this error message. This explains why the attachment ID didn't appear in the error message above.


>+  [% ELSIF error == "attachment_file_exists" %]
>+    [% title = "File Exists" %]
>+    [% terms.Bugzilla %] tried to move attachment 
>+    #[% attachment.id FILTER html %] from the database into a file called

Same here.


Everything else looks good, but I will still need to do more testing once the error above is fixed.
Attachment #553310 - Flags: review?(LpSolit) → review-
Comment on attachment 553310 [details] [diff] [review]
v7

>=== modified file 'Bugzilla/Attachment.pm'

>+    my $filename = $self->_get_local_filename;
>+    if (-e $filename) {
>+        unlink $filename or warn "Couldn't unlink $filename: $!";
>+    }

One more thing: as the deletion of the file is now done in remove_from_db(), please remove this same code from attachment.cgi, see bug 684225.
We are going to branch for Bugzilla 4.4 next week and this bug is either too invasive to be accepted for 4.4 at this point or shows no recent activity. The target milestone is reset and will be set again *only* when a patch is attached and approved.

I ask the assignee to reassign the bug to the default assignee if you don't plan to work on this bug in the near future, to make it clearer which bugs should be fixed by someone else.
Target Milestone: Bugzilla 4.4 → ---
Assignee: mkanat → attach-and-request
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: