Closed Bug 504104 Opened 15 years ago Closed 15 years ago

Editing attachments as comment doesn't render text correctly

Categories

(Bugzilla :: Attachments & Requests, defect)

3.2.4
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 477442

People

(Reporter: whimboo, Unassigned)

References

()

Details

Attachments

(1 file, 1 obsolete file)

With the current Bugzilla installation at bugzilla.mozilla.org it is not possible to edit attachment as comment without breaking umlauts. This behavior makes it hard for us to review translations like the one below in German. Just open the link and click on "Edit as comment". Watch out for words like "Veröffentlichen".

https://bugzilla.mozilla.org/attachment.cgi?id=373651&action=edit
CC'ing Pascal who could probably know some attachments from the web translation work.
I have some, thanks to shimono. I have a patch mostly ready. It just needs testing.
Assignee: attach-and-request → LpSolit
Status: NEW → ASSIGNED
Target Milestone: --- → Bugzilla 3.2
Version: unspecified → 3.2.4
Summary: Editing attachments as comment breaks umlauts → Editing attachments as comment doesn't render text correctly
Attached patch patch, v1 (obsolete) — Splinter Review
I cannot name the new function decode_utf8(), because it conflicts with Encode::decode_utf8(), making email_in.pl to not compile correctly.
Attachment #388520 - Flags: review?(mkanat)
To make things clear, the goal of my patch is not to fix bug 365926 (when the encoding is not UTF8). The goal of my patch is to render files using UTF8 correctly when editing them as comment.

Everything else is to avoid code duplication.
Attachment #388520 - Flags: review?(mkanat) → review-
Comment on attachment 388520 [details] [diff] [review]
patch, v1

You can't actually guarantee that an attached file is utf8, and in fact you might be attempting to convert invalid bytes into utf8 here, which would cause a Perl error or various other unknown behaviors.
Comment on attachment 388520 [details] [diff] [review]
patch, v1

Oh, also, all this "block of code into a function" refactoring stuff I'd only want on HEAD (I do like it, though--although I'm concerned slightly about passing around $attachment->data as a scalar).
(In reply to comment #6)
> I'm concerned slightly about
> passing around $attachment->data as a scalar).

What's wrong with that? $attachment->data returns a scalar.


(In reply to comment #5)
> You can't actually guarantee that an attached file is utf8, and in fact you
> might be attempting to convert invalid bytes into utf8 here, which would cause
> a Perl error or various other unknown behaviors.

We already use it for patches in Diff mode, so there is nothing new here. If it fails in "edit as comment", it will also fails when viewing them in Diff mode. If the encoding is not UTF8, no error is thrown.
(In reply to comment #7)
> (In reply to comment #6)
> > I'm concerned slightly about
> > passing around $attachment->data as a scalar).
> 
> What's wrong with that? $attachment->data returns a scalar.

  Imagine that you have a 50MB attachment.

  utf8::decode($attachment->{data}) changes the value in-place.

  bz_decode_utf8 decodes the value in-place, and then for some reason, returns the value, thus causing two copies of the attachment to exist, thus temporarily occupying 100MB of RAM.


> We already use it for patches in Diff mode, so there is nothing new here. If it
> fails in "edit as comment", it will also fails when viewing them in Diff mode.
> If the encoding is not UTF8, no error is thrown.

  Diffs are generally unlikely to even have non-ASCII characters in them, whereas text files (particularly text/html files) could be in any number of encodings, that we can't determine.
(In reply to comment #8)
>   Imagine that you have a 50MB attachment.

Which is way above the default limit of 1MB.


>   Diffs are generally unlikely to even have non-ASCII characters in them,
> whereas text files (particularly text/html files) could be in any number of
> encodings, that we can't determine.

I already answered this in comment 4.
(In reply to comment #8)
>   Imagine that you have a 50MB attachment.

Oh, and note that this code is in Attachment::PatchReader, so it's only used for patches in Diff mode, not for other attachments *at all*. How often do you attach a 50MB patch? :)
Attached patch patch, v2Splinter Review
Compared to my previous patch:

- I use Encode::decode() to decode strings. By default, bz_decode() (renamed from bz_decode_utf8()) assumes the encoding to be utf8. Encode::decode() replaces invalid sequences by substitution characters, so this is safe.

- I had to fix Bugzilla::CGI::param() which was calling utf8::decode() even when $cgi->param() was returning a ref, for instance when called from $cgi->upload. I checked in the original CGI.pm on CPAN, and they also use this exact same check, so we are safe here.

- bz_default() now accepts a $charset argument, so that email_in.pl can use it too. Also, this will let me fix bug 365926 more easily.

I tested, and it's working fine.
Attachment #388520 - Attachment is obsolete: true
Attachment #388604 - Flags: review?(mkanat)
Blocks: 365926
(In reply to comment #11)
> - bz_default() now accepts a $charset argument

err... I meant bz_decode(), of course.
Target Milestone: Bugzilla 3.2 → Bugzilla 3.4
Target Milestone: Bugzilla 3.4 → ---
Comment on attachment 388604 [details] [diff] [review]
patch, v2

Okay, what I'd really like for this is to split bz_decode into a separate bug, and for it to operate directly on the scalar that's input without creating a copy of it.

It would be really nice to have bz_decode, it's just hard for me to review when there's both an architectural change and a bug fix.

(Also, bz_decode won't go on the 3.4 branch in any case, so we'll need a separate patch anyway for that.)
Attachment #388604 - Flags: review?(mkanat) → review-
Oh, it's a dupe.
Assignee: LpSolit → attach-and-request
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
No longer blocks: 365926
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: