Closed Bug 287682 Opened 19 years ago Closed 11 years ago

UTF-8 characters are incorrectly displayed in New Charts and graphical reports

Categories

(Bugzilla :: Reporting/Charting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 5.0

People

(Reporter: roman, Assigned: LpSolit)

Details

(Keywords: intl)

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b) Gecko/20040421 MultiZilla/1.6.3.1d
Build Identifier: 

At http://tinyurl.com/4jyql please look at Spider product - UTF-8 extended
characters are broken. 

Reproducible: Always

Steps to Reproduce:
1. Generate any graphical report that contains UTF-8 characters (in the image).
Actual Results:  
UTF-8 chars are broken.

Expected Results:  
UTF-8 chars should be displayed correctly.

According to http://www.boutell.com/gd/manual2.0.33.html - 'The string may
contain UTF-8 sequences like: "&#192;"', but I don't know if it's useful in
this case.
If you are going to switch to UTF-8 in 2.20 (bug 126266), it would be nice to
display all UTF-8 characters correctly.
Flags: blocking2.20?
Depends on: bz-charset
Bug 126266, which is a prerequisite for this, has already missed the boat for
2.20; pushing this back accordingly.

Confirming, I can reproduce this on landfill even with all the UTF8 stuff applied. 
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking2.20? → blocking2.20-
Target Milestone: --- → Bugzilla 2.22
Attached patch Patch (obsolete)
This patch goes the way suggested in comment 0. We need at least gdlib 2.0.26 for this, but I don't know how to enforce this. The patch doesn't work for me, btw -- either I have a too-low version of gdlib (how can I check?), or there is some other error left.
Assignee: gerv → wurblzap
Status: NEW → ASSIGNED
Attachment #205847 - Flags: review?
Note to self: the entity mangling should be conditional on the utf8 parameter. Fix for next patch or checkin.
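For illustration only, the entity mangling discussed here could look roughly like the sketch below. It is written in Python rather than Bugzilla's Perl, and `to_numeric_entities` and its `utf8_enabled` flag are hypothetical names standing in for the patch's logic and the utf8 parameter. It only shows the string transformation; whether gdlib actually honors such sequences depends on which rendering function is used.

```python
def to_numeric_entities(label, utf8_enabled):
    """Replace non-ASCII characters with decimal numeric entities.

    `utf8_enabled` is a stand-in for the utf8 parameter: when it is
    off, the label is returned untouched.
    """
    if not utf8_enabled:
        return label
    # xmlcharrefreplace turns each non-ASCII character into &#NNN;
    return label.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_entities("\u00c0 la carte", True))   # entity form
    print(to_numeric_entities("\u00c0 la carte", False))  # unchanged
```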
Comment on attachment 205847
Patch

This doesn't fix the problem with gd 2.0.28 (version can be shown with gdlib-config --all) and GD module v2.30. Looks like the values in the resulting data hash are not quoted.

>Index: report.cgi
>===================================================================
>+foreach (@col_names) {
>+    $_ = css_class_quote($_);
>+}

Using css_class_quote seems misleading as we are not quoting CSS classes. Or are we? Maybe filters should be added to the template(s) instead?

> $vars->{'col_names'} = \@col_names;
> $vars->{'row_names'} = \@row_names;
> $vars->{'tbl_names'} = \@tbl_names;

Shouldn't row_names and possibly others be UTF encoded too?
Attachment #205847 - Flags: review? → review-
I can't seem to make it work. By now, I suspect that the sentence about entity sequences refers to gdlib's gdImageStringFT function only.

Bailing.
Assignee: wurblzap → gerv
Status: ASSIGNED → NEW
http://tinyurl.com/4jyql shows a double conversion: some process is assuming that the data is in either ISO-8859-2 or ISO-8859-4 and converting it from that to UTF-8.
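To make the double conversion concrete, here is a small Python sketch (not Bugzilla code; the function names are made up for the example). It assumes the misreading happens on the raw UTF-8 bytes, which is one plausible reading of the symptom: each non-ASCII character turns into two unrelated characters.

```python
def double_encode(text, assumed_charset="iso-8859-2"):
    """Simulate a process that misreads UTF-8 bytes as a legacy charset.

    The bytes are correct UTF-8, but they get decoded as if they were
    `assumed_charset`, so every multi-byte character falls apart.
    """
    return text.encode("utf-8").decode(assumed_charset)


def undo_double_encode(garbled, assumed_charset="iso-8859-2"):
    """Reverse the damage, provided the assumed charset is the right one."""
    return garbled.encode(assumed_charset).decode("utf-8")


if __name__ == "__main__":
    original = "\u00e9"  # e with acute accent, two bytes in UTF-8
    garbled = double_encode(original)
    print(len(original), len(garbled))
    print(undo_double_encode(garbled) == original)
```

ASCII-only labels pass through unchanged, which would explain why only product names with extended characters look broken.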
So there is still no solution?
I think you should specify a font that contains glyphs for the data,
like

  graph.set_x_label_font(Param('graphfontname'), 9);
  graph.set_x_axis_font(Param('graphfontname'), 9);
  graph.set_title_font(Param('graphfontname'), 9);
  graph.set_legend_font(Param('graphfontname'), 9);

in template/*/default/reports/report-*.png.tmpl.
I think you can use UTF-8 data without converting it to &#xx;.

# But I heard that i18n is not a current target of Bugzilla?
Confirmed for 3.0+ per bug 364505, kindly add 'intl' keyword.

IMHO the correct solution is the one offered by himorin. But there are different ideas about the font value:

1. Multiple font specification is supported by set_xxx_font. If we manage to come up with a reasonable cross-platform default (just like <font face="Verdana,Arial"> in HTML and CSS) -- we can put it into the default templates. See http://search.cpan.org/dist/GDGraph/Graph.pm#FONTS

2. If the correct fonts are not installed at all on some platforms and/or locales, and there are good public domain fonts around -- we can document that right after the GD installation instructions, and still hardcode the reference into the default templates.

3. We can use a parameter (also a backport from Bugzilla-ja).

4. If we distinguish between server administrators and Bugzilla instance maintainers (Bug 364505 comment #0):
> Why localconfig and not data/params:

> - Font file paths are OS dependent
> - Local files may be not accessible by maintainer at all
Keywords: intl
The 2.22 branch is restricted to security bugs -> 3.2 (unless you can attach a non-invasive patch for 3.0 before it becomes restricted to security bugs too, in which case I will retarget the bug to 3.0).
Target Milestone: Bugzilla 2.22 → Bugzilla 3.2
Assignee: gerv → charting
Depends on: 427961
I found a way to address the problem in GD::Graph

* define FONT_PATH in apache config: (for example, on my Ubuntu machine)

| FONT_PATH /usr/share/fonts/truetype/msttcorefonts

* in all the "template/.../default/reports/report-*.png.tmpl" define the
fonts:

|  graph.set_title_font(['verdana', 'arial'], 8);
|  graph.set_x_label_font(['verdana', 'arial'], 8);
|  graph.set_y_label_font(['verdana', 'arial'], 8);
|  graph.set_x_axis_font(['verdana', 'arial'], 8);
|  graph.set_y_axis_font(['verdana', 'arial'], 8);
|  graph.set_y_values_font(['verdana', 'arial'], 8);
|  graph.set_legend_font(['verdana', 'arial'], 8);

GD::Text will then use the TrueType font if available and will work with
UTF8.

P.S. I don't really like the idea of defining the FONT_PATH in the
apache config. Would it be a good idea to add an option in the parameters?
(In reply to comment #16)
> I found a way to address the problem in GD::Graph
> 
> * define FONT_PATH in apache config: (for example, on my Ubuntu machine)

You can specify fonts with the full path in templates.

I think we have another bug with testing patches. dupme? Or did we divide the problem?
(In reply to comment #17)
> > * define FONT_PATH in apache config: (for example, on my Ubuntu machine)
> You can specify fonts with the full path in templates.

Again, separating font names from paths is good and sometimes necessary -- when server administrators and Bugzilla maintainers are not the same people (Bug 364505 comment #0):
| Why localconfig and not data/params:

| - Font file paths are OS dependent
| - Local files may be not accessible by maintainer at all

Generic font names would work for many instances, keeping templates distribution-agnostic.

Not sure whether httpd.conf is more convenient than any other place.
> (In reply to comment #17)
> I think we have another bug with testing patches. dupme? or did we divided the
> problem?

Bug 427961 you mean?  See also bug 287684.
Bugzilla 3.2 is restricted to security bugs only. Mass-retargeting to 3.6.
Target Milestone: Bugzilla 3.2 → Bugzilla 3.6
(In reply to comment #21)
> *** Bug 564629 has been marked as a duplicate of this bug. ***

Wow! This bug has been waiting 5 years for a resolution!
Flags: blocking4.0?
Not a blocker. This bug exists for a long time.
Flags: blocking4.0? → blocking4.0-
Bugzilla 3.6 is now restricted to security fixes only, and this bug got no traction for several months. We will retarget this bug once it has a patch ready for checkin.
Target Milestone: Bugzilla 3.6 → ---
No longer depends on: 427961
Unifont is the most complete and free font I know. We should point to it by default.
http://unifoundry.com/unifont.html just released unifont-6.3.20131006.ttf, which can be installed on all machines. It has 100% coverage of the Unicode 6.3 Basic Multilingual Plane. That's all we need to fix this bug. The file is pretty big (14 MB), so it cannot be included in the Bugzilla tarball. But it's not unreasonable to ask admins to install this file on their system. Then we can let Bugzilla look for it (/usr/share/fonts/TTF/ on Linux, C:\Windows\fonts on Windows).
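The lookup idea above could be sketched as follows. This is Python for illustration, not Bugzilla code: `find_font_file` and `DEFAULT_FONT_DIRS` are hypothetical names, and the only directories assumed are the two mentioned in the comment.

```python
import os

# Directories named in the comment above; anything else would be a guess.
DEFAULT_FONT_DIRS = ["/usr/share/fonts/TTF", r"C:\Windows\fonts"]


def find_font_file(filename, search_dirs=DEFAULT_FONT_DIRS):
    """Return the first existing path to `filename`, or None if not found."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_font_file("unifont-6.3.20131006.ttf")
    print(path if path else "font not installed in the default locations")
```

Returning None rather than raising lets the caller fall back to the built-in GD fonts when the file is missing.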
Attached patch patch, v1 (obsolete)
I finally added a parameter as suggested in bug 427961. This will let admins specify another path to the font (e.g. the bugzilla/ directory if installed locally) or another font if they really want to, such as the proprietary Arial Unicode font included in Microsoft Office.
Assignee: charting → LpSolit
Attachment #205847 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #816836 - Flags: review?(dkl)
Attached patch patch, v1.1
Oops, forgot to reword a sentence in the documentation.
Attachment #816836 - Attachment is obsolete: true
Attachment #816836 - Flags: review?(dkl)
Attachment #816841 - Flags: review?(dkl)
Comment on attachment 816841
patch, v1.1

Marc: maybe you are interested in reviewing this patch, as German is affected by this problem?
Attachment #816841 - Flags: review?(bugzilla.1.wurblzap)
Yup, ok.
I tried the patch, and it didn't help at all, with the parameter set to a downloaded Unifont file. What might I be doing wrong?

Non-ASCII characters are being displayed as two seemingly unrelated characters, before and now. Are you definite that this is not a character set encoding issue?
(In reply to Marc Schumann [:Wurblzap] from comment #32)
> I tried the patch, and it didn't help at all, with the parameter set to a
> downloaded Unifont file. What might I be doing wrong?

What is font_file set to? Which OS?


> Non-ASCII characters are being displayed as two seemingly unrelated
> characters, before and now. Are you definite that this is not a character
> set encoding issue?

Without the patch, non-ASCII characters are unreadable. With the patch applied and the parameter above set to point to the .ttf file, all characters are displayed correctly (even Cyrillic, Chinese, and accented characters).
In practice, no single font can be truly universal. For example, for the CJK unified ideographs, it can contain either the most appropriate shapes for Chinese or the most appropriate ones for Japanese, but not both.

It would be better to be able to either specify for each Unicode range which font is preferred, or to be able to list several fonts to use in preferential order.
The second option might well be the simplest: Chinese, Japanese and Korean users could then first list their favorite national font, and then use unifont as a backup for characters that are missing from it.
(In reply to Jean-Marc Desperrier from comment #34)
> In practice, no single font can be truly universal.

As I said, unifont has full support for the BMP in Unicode 6.3. This should be sufficient for most cases as MySQL utf8 encoding is unable to support characters outside BMP anyway.
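As a rough illustration of the BMP limit (Python, with hypothetical helper names): a character is inside the BMP when its code point is at most U+FFFF, and those are exactly the characters that fit into three bytes of UTF-8.

```python
def outside_bmp(text):
    """Return the characters of `text` whose code points lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_utf8(text):
    """True if every character needs at most three bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    label = "\u0416\u4e2d\u00e9"           # Cyrillic, Chinese, accented Latin
    print(outside_bmp(label))              # nothing outside the BMP
    print(fits_three_byte_utf8(label))
    print(outside_bmp("a\U0001F600"))      # an emoji lies outside the BMP
```

A check like this could flag labels that a BMP-only font cannot render, though the patch itself does not do that.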


> It would be better to be able to either specify for each unicode range which
> font is preferred, or to be able to list several font to use in preferential
> order.

You cannot do that. It's not possible to use several fonts with libgd. So if you have characters from a wide range of Unicode points, you have to select only one font anyway. It doesn't make sense to ask admins to specify fonts per Unicode range.
Comment on attachment 816841
patch, v1.1

Ok, I know why I saw what I saw during review: I checked the Old Charts (reports.cgi). These are still broken. But the patch fixes the issue for me in New Charts (chart.cgi), so r=Wurblzap provided you change the bug title to mention that you fix New Charts only. Is there a corresponding bug for Old Charts? If not, can you please file one?
Attachment #816841 - Flags: review?(bugzilla.1.wurblzap) → review+
(In reply to Marc Schumann [:Wurblzap] from comment #36)
> Is there a corresponding bug for Old Charts? If not, can you please file one?

I didn't find one for Old Charts. Per http://search.cpan.org/~chartgrp/Chart/Chart.pod#TO_DO, the Chart module doesn't support TrueType fonts, only GD fonts, which do not support UTF-8 characters. So I doubt we can do anything about Old Charts if we still use Chart. And per bug 232113, I doubt a rewrite of the Old Charts code is going to happen. If you want a bug for it, feel free to file it. :)

Thanks for the review!
Flags: blocking4.0-
Flags: blocking2.20-
Flags: approval?
Keywords: relnote
Summary: UTF-8 chars incorrectly displayed on the graphic reports → UTF-8 characters are incorrectly displayed in New Charts and graphical reports
Target Milestone: --- → Bugzilla 5.0
Attachment #816841 - Flags: review?(dkl)
Flags: approval? → approval+
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Config/Common.pm
modified Bugzilla/Config/DependencyGraph.pm
modified docs/en/xml/administration.xml
modified template/en/default/admin/params/dependencygraph.html.tmpl
modified template/en/default/reports/chart.png.tmpl
modified template/en/default/reports/report-bar.png.tmpl
modified template/en/default/reports/report-line.png.tmpl
modified template/en/default/reports/report-pie.png.tmpl
Committed revision 8806.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Added to relnotes for 5.0rc1.
Keywords: relnote