Closed
Bug 410902
Opened 17 years ago
Closed 17 years ago
Some characters are mangled in diff and interdiff modes when viewing attachment
Categories
(Bugzilla :: Attachments & Requests, defect)
Tracking
()
RESOLVED
FIXED
Bugzilla 3.2
People
(Reporter: LpSolit, Assigned: LpSolit)
References
Details
(Keywords: regression)
Attachments
(1 file, 5 obsolete files)
6.00 KB,
patch
|
mkanat
:
review+
himorin
:
review+
|
Details | Diff | Splinter Review |
Same issue as in bug 408446. Bugzilla::Attachment::PatchReader::(inter)diff also need to be fixed. Less critical than the original bug, but still needs to be fixed ASAP.
Flags: blocking3.1.3?
Comment 1•17 years ago
|
||
This isn't so critical that it *must* be fixed in a devel release (patches are usually mostly ASCII), but it should definitely be fixed before 3.2.
Flags: blocking3.2+
Flags: blocking3.1.3?
Flags: blocking3.1.3-
Comment 2•17 years ago
|
||
Some user of Bugzilla-jp says, this bug might be critical for documents and web-site based projects. # But, i think it might be ok with blocking3.2+ :)
Comment 3•17 years ago
|
||
it seems working with this..
Assignee: attach-and-request → shimono
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•17 years ago
|
||
I thought about fixing it this way too, but I think the problem is that you only want the diff of the patch to use RAW, but the footer of the page should still use UTF8, else some e.g. saved search names may be rendered incorrectly. Is what I'm saying crazy or correct?
Comment 5•17 years ago
|
||
Comment on attachment 297800 [details] [diff] [review] working(?) patch Right, what LpSolit said is correct. We should have a "utf8_disable" and "utf8_enable" function in Bugzilla::Util, probably, so that we don't have to keep adding these :raw and :utf8 statements manually. The functions can do the check on the utf8 param themselves, so we can also eliminate that from the callers.
Attachment #297800 -
Flags: review-
Comment 6•17 years ago
|
||
(In reply to comment #4) > I thought about fixing it this way too, but I think the problem is that you > only want the diff of the patch to use RAW, but the footer of the page should > still use UTF8, else some e.g. saved search names may be rendered incorrectly. > Is what I'm saying crazy or correct? With the patch v.1, i could get correct output from localized Japanese templates. # I've localized only two of them to test this.. Have someone tested with localized templates on current tip? Of cource, i should file another bug for this, if this is a REAL problem.. (In reply to comment #5) > (From update of attachment 297800 [details] [diff] [review]) > Right, what LpSolit said is correct. > > We should have a "utf8_disable" and "utf8_enable" function in Bugzilla::Util, > probably, so that we don't have to keep adding these :raw and :utf8 statements > manually. The functions can do the check on the utf8 param themselves, so we > can also eliminate that from the callers. I agree with this.. But, it seems that we should have another bug for localized template processing..
Comment 7•17 years ago
|
||
working(?) patch without 'binmode raw'. But, PatchReader/DiffPrinter/raw.pm causes many errors for 'Wide Charactor'. # We should set utf8 for PerlIO.
Attachment #297800 -
Attachment is obsolete: true
Assignee | ||
Comment 8•17 years ago
|
||
Some methods in PatchReader itself need to be overriden to support utf8.
Assignee | ||
Comment 9•17 years ago
|
||
This patch avoids to override PatchReader methods, and works fine with: - diff mode - raw diff mode - raw interdiff mode - view mode (already working without this patch) But it still fails with: - interdiff mode I'm pretty sure the problem comes from: open my $interdiff_fh, "$lc->{interdiffbin} $old_filename $new_filename|"; binmode $interdiff_fh; For some reason, $interdiff_fh remains unsensitive to FILTER utf8. I tried to pass different MODE flags to open(), without success. This last case still needs investigation. I would like to avoid overriding PatchReader methods, if possible.
Attachment #311909 -
Attachment is obsolete: true
Assignee | ||
Comment 10•17 years ago
|
||
OK, I finally managed to fix the problem in a nice way, which doesn't break non-patch attachments and which works with all modes.
Assignee: shimono → LpSolit
Attachment #298277 -
Attachment is obsolete: true
Attachment #311987 -
Attachment is obsolete: true
Attachment #312086 -
Flags: review?(mkanat)
Assignee | ||
Updated•17 years ago
|
Attachment #312086 -
Flags: review?(shimono)
Comment 11•17 years ago
|
||
Comment on attachment 312086 [details] [diff] [review] patch, v5 >+ $attachment->data; # Initialize ->{data}. >+ utf8::decode($attachment->{data}) if Bugzilla->params->{'utf8'}; That assumes that the attachment data is valid UTF8. utf8::decode only handles valid utf8. Any other encoding will probably cause an error to be thrown. >Index: Bugzilla/Template.pm >- $var =~ s/[\x{202a}-\x{202e}]//g; >+ # If there are invalid utf8 characters (for instance in >+ # attachments) leave the string alone. >+ eval { $var =~ s/[\x{202a}-\x{202e}]//g; }; Why put that in an eval? Can it actually die somehow??
Assignee | ||
Comment 12•17 years ago
|
||
(In reply to comment #11) > That assumes that the attachment data is valid UTF8. utf8::decode only > handles valid utf8. Any other encoding will probably cause an error to be > thrown. Errors never happened in my case. We can hardly make a better guess as the whole page is displayed as UTF-8. That's why I asked himorin for review, to know if this triggers errors with another encoding. > Why put that in an eval? Can it actually die somehow?? Yes, everytime the patches being compared in interdiff have another encoding that UTF-8. e.g. é and à in ISO-8859-1 triggers an error. By ignoring them, the output is fine (i.e. ISO-8859-1 characters are displayed as they used to be when the page encoding is UTF-8).
Comment 13•17 years ago
|
||
(In reply to comment #12) > > That assumes that the attachment data is valid UTF8. utf8::decode only > > handles valid utf8. Any other encoding will probably cause an error to be > > thrown. > > Errors never happened in my case. We can hardly make a better guess as the > whole page is displayed as UTF-8. That's why I asked himorin for review, to > know if this triggers errors with another encoding. I've tested with patch file written in shift_jis encoding, but bugzilla doesn't crash with displaying in the 'diff' mode. I could not read the text (in the 'diff' mode) with setting encoding from ff menu, but it might be acceptable. # should we promote writting files in utf-8 :p Should i test more about using non-utf8 encodings with utf8? (like on interdiff)
Comment 14•17 years ago
|
||
from apache error log Malformed UTF-8 character (4 bytes, need 3, after start byte 0xf0) in substitution (s///) at /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/Template/Context.pm line 359, <$interdiff_fh> line 12. This is thrown by a sjis to sjis interdiff page. But i could view the interdiff page.
Assignee | ||
Comment 15•17 years ago
|
||
(In reply to comment #14) > from apache error log Bah, I see these warnings as well. :(
Comment 16•17 years ago
|
||
(In reply to comment #12) > > Why put that in an eval? Can it actually die somehow?? > > Yes, everytime the patches being compared in interdiff have another encoding > that UTF-8. e.g. é and à in ISO-8859-1 triggers an error. By ignoring them, > the output is fine (i.e. ISO-8859-1 characters are displayed as they used to be > when the page encoding is UTF-8). Okay. Could you put that in a comment? (In reply to comment #13) > I could not read the text (in the 'diff' mode) with setting encoding from ff > menu, but it might be acceptable. Yes, that's to be expected--we don't know the encoding of patches, so we assume they're all in UTF-8 and if they're not they just won't display correctly (that is, in Diff mode).
Assignee | ||
Comment 17•17 years ago
|
||
(In reply to comment #14) > Malformed UTF-8 character (4 bytes, need 3, after start byte 0xf0) in > substitution (s///) at > /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/Template/Context.pm line > 359, <$interdiff_fh> line 12. This seems to happen with interdiff only, not with diff, which makes me think that $reader->iterate_fh may not like UTF-8. I will try to slurp the output of interdiff into a variable, decode it, and pass it to $reader->iterate_string instead, and see if this works.
Assignee | ||
Comment 18•17 years ago
|
||
OK, I managed to fix all errors thrown, including warnings in the apache error log. The problem was that get_unified_diff() was calling new PatchReader::DiffPrinter::raw($fh); without calling binmode ':utf8', $fh first, which was responsible for the errors being thrown. This way, I no longer need to hack FILTER html in Template.pm and the eval() can go away.
Attachment #312086 -
Attachment is obsolete: true
Attachment #313109 -
Flags: review?(shimono)
Attachment #313109 -
Flags: review?(mkanat)
Attachment #312086 -
Flags: review?(shimono)
Attachment #312086 -
Flags: review?(mkanat)
Comment 19•17 years ago
|
||
Comment on attachment 313109 [details] [diff] [review] patch, v6 for log. some errors when i queried 'interdiff' mode. [Thu Apr 03 11:37:42 2008] [error] [client 130.54.52.3] [Thu Apr 3 11:37:42 2008] attachment.cgi: Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.8.5/PatchReader/Raw.pm line 250., referer: http://bugzilla-trunk.test.mozilla.gr.jp/attachment.cgi?id=3671&action=diff [Thu Apr 03 11:38:06 2008] [error] [client 130.54.52.3] [Thu Apr 3 11:38:06 2008] attachment.cgi: Use of uninitialized value in string ne at Bugzilla/Attachment/PatchReader.pm line 212., referer: http://bugzilla-trunk.test.mozilla.gr.jp/attachment.cgi?id=3671&action=diff Version informations. * This is Bugzilla 3.1.3+ on perl 5.8.5 * Running on Linux 2.6.9-67.0.4.ELsmp #1 SMP Sun Feb 3 07:08:57 EST 2008 Cecking for PatchReader (v0.9.4) ok: found v0.9.5 diff and interdiff modes worked fine, too.
Attachment #313109 -
Flags: review?(shimono) → review+
Comment 20•17 years ago
|
||
Comment on attachment 313109 [details] [diff] [review] patch, v6 This all looks fine, code-wise.
Attachment #313109 -
Flags: review?(mkanat) → review+
Assignee | ||
Updated•17 years ago
|
Flags: approval+
Assignee | ||
Comment 21•17 years ago
|
||
Checking in attachment.cgi; /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v <-- attachment.cgi new revision: 1.144; previous revision: 1.143 done Checking in Bugzilla/Util.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Util.pm,v <-- Util.pm new revision: 1.69; previous revision: 1.68 done Checking in Bugzilla/Attachment/PatchReader.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Attachment/PatchReader.pm,v <-- PatchReader.pm new revision: 1.5; previous revision: 1.4 done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 22•16 years ago
|
||
we probably have a regression again: https://bugzilla.mozilla.org/attachment.cgi?id=361327&action=diff is broken while https://bugzilla.mozilla.org/attachment.cgi?id=361360&action=diff looks good. Could somebody confirm?
You need to log in
before you can comment on or make changes to this bug.
Description
•