PostgreSQL incorrectly detects that some summaries have control characters when they don't

RESOLVED FIXED in Bugzilla 2.20

Status

RESOLVED FIXED
Opened: 13 years ago
Resolved: 13 years ago

People

Reporter: LpSolit; Assignee: LpSolit

Tracking

Keywords: regression
Version: 2.20
Target Milestone: Bugzilla 2.20
Bug Flags:
approval +
blocking2.22 +
approval2.20 +
blocking2.20.1 +
blocking2.18.5 -
blocking2.16.11 -

Attachments

1 attachment: patch, v1 (attachment 207931)

Description (Assignee)

13 years ago
Since bug 101380 was applied, checksetup.pl on qa222pg keeps reporting 25 bugs that it is unable to clean:

bugs_qa222pg=> SELECT bug_id, short_desc FROM bugs WHERE short_desc ~* '[[:cntrl:]]' order by bug_id;
 bug_id |                                                          short_desc                         
--------+------------------------------------------------------------------------------------------------------------------------------
    919 | ÑÑÑÑ
    920 | ÑÑÑÑ2
    921 | ÏγβλγÏκβλμÏκÏγβλμÏβλγβαδÏλη/λθε/'ÏÏιβ'λγÏδαζÏδαζαδÏÏβλγιÏγλÏθεÏ'/
    937 | eynav try ÏιÏιεο Ïμ ÏιÏα
    942 | TEST ÏιÏιεο
   1113 | ì°ìµ
   1306 | íì¤í¸ìëë¤.
   1311 | my test íê¸ìë ¥íì¤í¸
   1384 | ÖÎ?â
   1446 | Test only 中æå­ä¹è¡
   1585 | Blokada okna przy wprowadzeniu zerowej iloï¾ç°i
   1715 | ç°è±¡/è¾»
   1736 | оÑÐµÐ½Ñ Ð¼Ð¾ÑнÑй баг
   1913 | æµè¯ä¸ä¸å¯ä»¥å
   1914 | ãã¹ãã ã
   1918 | ææ½ 
   1997 | ÐÐÐ
   2016 | ç»é¢æ¾ç¤º
   2052 | 中æ
   2162 | â¿Character repertoire and handlingâUnicode?â
   2206 | test zgÅoszenia...
   2248 | Chinese 中æ.
   2401 | ××× ×ª××× ××¢×ר×ת?
   2446 | ¨t²Ã§ï¥Ã¤¤¤å(ÃcÃé)Ãã¥Ã
   2512 | ÑеÑÑ Ð»Ð¾ÐºÐ°Ð»Ð¸Ð·Ð°Ñии
(25 rows)

So either PostgreSQL shouldn't detect them, or clean_text() should fix these bug summaries.

According to mkanat, qa222 is not affected because of a bug on landfill that occurred when the databases were duplicated.

The best workaround is probably to make checksetup.pl report a bug only "if clean_text($foo) ne $foo". That way, even if PostgreSQL matches these bugs on every run, checksetup.pl won't display them, which avoids confusion when upgrading.
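A minimal sketch of that workaround, written in Python rather than checksetup.pl's Perl purely for illustration. It assumes clean_text() replaces runs of ASCII control characters (0x00 to 0x1F and 0x7F) with a single space and trims the result; the exact Perl implementation may differ. The function name bugs_needing_cleanup and the sample rows are hypothetical. The database query is still allowed to over-report; the application-side comparison decides what actually gets printed.

```python
import re

# Assumed approximation of Bugzilla's clean_text(): collapse runs of ASCII
# control characters (0x00-0x1F and 0x7F) into one space, then trim.
_ASCII_CNTRL = re.compile(r"[\x00-\x1f\x7f]+")


def clean_text(text: str) -> str:
    return _ASCII_CNTRL.sub(" ", text).strip()


def bugs_needing_cleanup(candidates):
    """Keep only rows whose summary clean_text() would actually change.

    `candidates` stands in for the (bug_id, short_desc) rows returned by
    the database's [[:cntrl:]] query (hypothetical helper). Rows the
    database flags but clean_text() leaves unchanged are dropped, so
    nothing is reported for them.
    """
    return [(bug_id, desc) for bug_id, desc in candidates
            if clean_text(desc) != desc]


if __name__ == "__main__":
    rows = [
        (1, "中文"),                 # hypothetical: flagged by Pg, nothing to clean
        (2, "broken\x07summary"),    # hypothetical: contains a real BEL character
    ]
    for bug_id, desc in bugs_needing_cleanup(rows):
        print(bug_id, repr(clean_text(desc)))
```

With these sample rows only bug 2 is reported, because clean_text() leaves the non-ASCII summary unchanged.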

We should probably fix this problem before our next releases, or we could get a large number of duplicate bug reports.

Updated (Assignee)

13 years ago
Flags: blocking2.22?
Flags: blocking2.20.1?
Flags: blocking2.18.5?
Flags: blocking2.16.11?
Comment 1 (Assignee)

13 years ago
Created attachment 207931 [details] [diff] [review]
patch, v1
Attachment #207931 - Flags: review?(mkanat)

Comment 2

13 years ago
Why would this block 2.18 and 2.16? :-) They don't have any Pg support. For the other branches, though, it's definitely a blocker. Landfill is currently sending me an email every hour because of this, and checksetup really shouldn't produce output when it isn't changing anything.
Flags: blocking2.22?
Flags: blocking2.22+
Flags: blocking2.20.1?
Flags: blocking2.20.1+
Flags: blocking2.18.5?
Flags: blocking2.18.5-
Flags: blocking2.16.11?
Flags: blocking2.16.11-

Comment 3

13 years ago
Comment on attachment 207931 [details] [diff] [review]
patch, v1

Fixes the problem; tested on landfill.
Attachment #207931 - Flags: review?(mkanat) → review+
Updated (Assignee)

13 years ago
Assignee: installation → LpSolit
Flags: approval?
Flags: approval2.20?
Target Milestone: --- → Bugzilla 2.20
Comment 4 (Assignee)

13 years ago
For the record:

mkanat: bugzilla-tip on MySQL has correctly-encoded characters.
mkanat: bugzilla-tip on Pg has correctly-encoded characters.
mkanat: In the database, that is.
mkanat: MySQL databases are in Latin1. However, it seems to "know" somehow that some of your text is UTF-8.
mkanat: Pg databases are in ASCII. Pg does not "know" that some strings are really UTF-8.
mkanat: But perl does.
mkanat: DBD::Pg correctly translates those strings into UTF-8.
mkanat: So your SQL turns up a bunch of bugs on Pg, but when perl sees them, it doesn't see anything to clean.
mkanat: You're right in a way, that MySQL *should* be seeing the same problem, but it isn't.
Status: NEW → ASSIGNED
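To illustrate the mismatch described in this log, here is a small Python sketch. It assumes that the ASCII-encoded Pg database effectively evaluates the regex one byte at a time, which is modeled here by decoding the stored UTF-8 bytes as Latin-1 (one character per byte). The sample string is hypothetical. In that byte view, some UTF-8 continuation bytes land in the C1 control range 0x80 to 0x9F, while the correctly decoded string that Perl sees contains no control characters at all.

```python
import unicodedata


def control_chars(text):
    """Return the code points in `text` whose Unicode category is Cc (control)."""
    return [hex(ord(ch)) for ch in text if unicodedata.category(ch) == "Cc"]


summary = "тест"                   # hypothetical summary (Cyrillic for "test")
raw = summary.encode("utf-8")      # the bytes actually stored in the database
byte_view = raw.decode("latin-1")  # one character per byte, as a byte-oriented check sees it

print(raw)                       # b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
print(control_chars(byte_view))  # ['0x82', '0x81', '0x82']
print(control_chars(summary))    # []
```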

Updated

13 years ago
Summary: PostgreSQL seems to incorrectly detect control characters... or clean_text() incorrectly fixes them → PostgreSQL incorrectly detects that some summaries have control characters when they don't
Target Milestone: Bugzilla 2.20 → ---

Updated

13 years ago
Target Milestone: --- → Bugzilla 2.20
Version: 2.21 → 2.20

Comment 5

13 years ago
I think the real answer here is that Pg has a different definition of [:cntrl:] than MySQL does. MySQL's [:cntrl:] only counts control characters without the high bit set. Pg's [:cntrl:] evidently also counts control characters that have the high bit set. This is perfectly legal: the POSIX regexp spec is remarkably ambiguous here and leaves it as "locale-dependent".
Flags: approval?
Flags: approval2.20?
Flags: approval2.20+
Flags: approval+
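The two readings of [:cntrl:] described in this comment can be modeled as follows. This is an illustrative Python sketch, not either database's actual implementation (the real behavior is locale-dependent); it applies each definition byte by byte to UTF-8-encoded sample strings, and the helper names are hypothetical. Because UTF-8 continuation bytes fall in 0x80 to 0xBF, non-ASCII text often contains a byte that the 8-bit definition treats as a control character.

```python
def is_cntrl_7bit(code: int) -> bool:
    # Control characters without the high bit set: 0x00-0x1F and 0x7F.
    return code < 0x20 or code == 0x7F


def is_cntrl_8bit(code: int) -> bool:
    # Same set plus the high-bit (C1) controls 0x80-0x9F.
    return is_cntrl_7bit(code) or 0x80 <= code <= 0x9F


def matches_cntrl(data: bytes, is_cntrl) -> bool:
    # Emulate a byte-oriented '[[:cntrl:]]' match: does any byte qualify?
    return any(is_cntrl(b) for b in data)


for text in ["тест", "中文", "plain ASCII"]:
    data = text.encode("utf-8")
    print(f"{text!r}: 7-bit={matches_cntrl(data, is_cntrl_7bit)}, "
          f"8-bit={matches_cntrl(data, is_cntrl_8bit)}")
```

Under these assumptions both non-ASCII samples match only the 8-bit definition, and the ASCII sample matches neither.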

Comment 6

13 years ago
I did the checkin, because I didn't want to get another set of emails, all night. :-)

The patch didn't apply cleanly on 2.20, but I fixed that on checkin. (I used PatchReader's ability to change the number of context lines; with it set to "1", the patch applied fine with fuzz, and I tested it and it worked.)

Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v  <--  checksetup.pl
new revision: 1.469; previous revision: 1.468
done

2.20:

Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v  <--  checksetup.pl
new revision: 1.412.2.22; previous revision: 1.412.2.21
done
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED