Closed
Bug 304149
Opened 19 years ago
Closed 18 years ago
Until we can switch to UTF-8 fully, make new data coming into bugzilla.mozilla.org be UTF-8
Categories
(bugzilla.mozilla.org :: General, defect, P1)
VERIFIED
WONTFIX
People
(Reporter: Wurblzap, Assigned: justdave)
References
Details
Attachments
(2 files)
6.27 KB, patch
7.00 KB, patch
This is inspired by bug 135762. The current plan in bug 280633 is to convert data to UTF-8 more or less one-by-one. I think this task might be made easier if we stopped the heap from growing and made sure we knew the character set of new data coming in.
Reporter
Comment 1•19 years ago
This is a patch against HEAD showing how I think this might be facilitated. From the time it is active, new longdescs are consistently ISO-8859-1 and can later be converted to UTF-8 as a whole.
Reporter
Updated•19 years ago
Summary: Make new data coming into bugzilla.mozilla.org be ISO-8859-1 → Until we can switch to UTF-8, make new data coming into bugzilla.mozilla.org be ISO-8859-1
Assignee
Comment 2•19 years ago
ummm... we're already forcing new data to be UTF-8. Why backpedal on it? Changing existing bugs is the only time there's ambiguity right now.
Assignee
Comment 3•19 years ago
oh, I see what you're heading for, now that I've looked at the patch. :) Can we do that with UTF-8 instead of ISO-8859-1? We're already sending a content-type header with UTF-8 in it on enter_bug, create_account, and so forth (pages where you create new bugs, accounts, and so forth). But we're not forcing the character set on the forms themselves at all.
Reporter
Comment 4•19 years ago
Enforcing UTF-8 would essentially be the same thing, but I think ISO-8859-1 might be the better choice because it wouldn't clash as hard with current "legacy" data. That assumes the legacy data is ISO-8859-1 by a wide margin. Provided my assumption is close enough, bugs containing mixed legacy/ISO-enforced data would display better. The full switch to UTF-8 wouldn't be much harder if we went for ISO-8859-1 in the meantime. We'd need an additional step to update all longdescs that came in after a certain date, that's all.
Assignee
Comment 5•19 years ago
(In reply to comment #4)
> Enforcing UTF-8 would essentially be the same thing, but I think ISO-8859-1
> might be the better choice because it wouldn't clash as hard with current
> "legacy" data. That's me assuming it to be ISO-8859-1 mostly by a wide margin.
> Provided my assumption is close enough, bugs containing mixed
> legacy/ISO-enforced data would display better.
Yeah, unfortunately, that doesn't seem to be a close assumption. From random samplings and the problems I've already run into, the majority of the data that isn't plain ASCII seems to be Windows-1252.
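To illustrate why the Windows-1252 finding matters, here is a minimal Python sketch (not taken from the attached patches). The two encodings agree on most bytes but differ in the range 0x80-0x9F, where Windows-1252 has printable characters such as curly quotes and ISO-8859-1 has control codes. The sample bytes and the `looks_like_cp1252` helper are assumptions made up for illustration.

```python
# Bytes as a Windows client might have submitted them: the word "fixed"
# wrapped in curly quotes (0x93 and 0x94 in Windows-1252).
raw = b"\x93fixed\x94"

as_cp1252 = raw.decode("cp1252")    # curly quotes U+201C / U+201D
as_latin1 = raw.decode("latin-1")   # C1 control characters U+0093 / U+0094


def looks_like_cp1252(data: bytes) -> bool:
    """Rough heuristic (hypothetical helper): bytes 0x80-0x9F are control
    codes in ISO-8859-1, so real-world text containing them is more
    plausibly Windows-1252."""
    return any(0x80 <= b <= 0x9F for b in data)


print(repr(as_cp1252))
print(repr(as_latin1))
print(looks_like_cp1252(raw))
```

A heuristic like this cannot tell either encoding apart from UTF-8 or from the Japanese encodings mentioned later in this bug; it only shows why assuming ISO-8859-1 would mangle Windows-1252 punctuation.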
Reporter
Comment 6•19 years ago
I gather we hurt no matter which way we turn, so we can make the forms send UTF-8 right away. (Note that this patch includes account/prefs/prefs.html.tmpl, which I missed in the previous patch.)
Assume we apply the UTF-8 patch. We'd end up with new bugs begging for charset=UTF-8 headers, old bugs covering their eyes and wanting no such headers, and active bugs (containing older and newer comments) partly borked up. What's the plan here?
We could send charset=UTF-8 headers for bugs created after today, and cut over to sending such headers for all bugs at some point in the future, when we feel it will hurt less.
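The cutoff idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff date, the `UTF8_CUTOFF` constant, and the `content_type_for_bug` function are hypothetical and do not correspond to anything in Bugzilla's code.

```python
from datetime import date

# Hypothetical cutoff: the day the forms start forcing UTF-8.
UTF8_CUTOFF = date(2005, 9, 1)


def content_type_for_bug(created: date, cutoff: date = UTF8_CUTOFF) -> str:
    """Return the Content-Type header value to send for a bug page.

    Bugs created on or after the cutoff only contain UTF-8 data, so the
    charset can be declared. Older bugs keep the bare header and leave
    the choice to the browser's auto-detection.
    """
    if created >= cutoff:
        return "text/html; charset=UTF-8"
    return "text/html"


print(content_type_for_bug(date(2005, 10, 22)))
print(content_type_for_bug(date(2003, 1, 15)))
```

Note that this does not help the third group named in the comment: a bug created before the cutoff that receives new UTF-8 comments still ends up with mixed encodings on one page.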
Assignee
Comment 7•19 years ago
what I want to do is get some sort of utility I can run that will translate a specific single comment or bug field from whatever character set it's in to UTF-8. It should probably do this via a browser so we can make use of the browser's autodetect to do the translation. i.e. have a text field to enter a character set into on an otherwise-blank page that shows only the contents of that field, and have javascript check the charset and fill it into the text field. Then you can use the View menu in the browser to change the character set until it looks right if the auto-detect got it wrong, then submit it and have Perl's Encode module use the character set from the text field as the source character set translating it to UTF-8. Once we have a utility like that, we can just force UTF-8 via the headers on everything except that translation form, and then go use that form on things people report that don't look right.
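The server-side half of such a utility is small. The comment proposes Perl's Encode module; the sketch below shows the same step in Python purely for illustration, and the function name `comment_to_utf8` is an assumption. It takes the stored bytes of one comment plus the character set chosen in the form and returns UTF-8 bytes.

```python
import codecs


def comment_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode one stored comment from source_charset to UTF-8.

    Decoding is strict on purpose: if the chosen character set cannot
    represent the stored bytes, an exception is raised instead of
    silently writing damaged text back to the database.
    """
    codecs.lookup(source_charset)       # LookupError for unknown names
    text = raw.decode(source_charset)   # UnicodeDecodeError on mismatch
    return text.encode("utf-8")


# Example: a comment stored as Windows-1252.
print(comment_to_utf8(b"caf\xe9", "windows-1252"))
```

Strict decoding only catches byte sequences that are invalid in the chosen character set. A wrong but still valid choice (for example ISO-8859-1 for Windows-1252 data) decodes without error, which is why the comment has a human confirm the result in the browser first.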
Reporter
Comment 8•19 years ago
That sounds very sensible to me. And it's covered by bug 280633, too, which is good. Until we have such a tool, how about we make life a little easier for ourselves in the future and provide for a means along the lines of one of the attached patches, so that we don't need to use this comment-by-comment tool on more comments than we already have on the heap? :)
Assignee
Comment 9•19 years ago
sounds good to me. what version of Bugzilla is this patch against? b.m.o is likely getting upgraded within the next week. I'll be upgrading it to the tip of the 2.20 branch, so that would be a good place for the patch to be built against. :)
Assignee
Updated•19 years ago
Blocks: bmo-upgrade-051022
Reporter
Comment 10•19 years ago
The patch is against HEAD, but it applies to the 2.20 branch as well. Unless common browsers' charset auto-detection works really well, it seems to me we'll want to start sending charset=UTF-8 HTTP headers for show_bug.cgi not long after this, at least for new bugs.
Reporter
Comment 11•19 years ago
See also bug 304944...
Reporter
Comment 12•19 years ago
Morphing to ask for UTF-8 instead of ISO-8859-1 as per comment 3 ff.
Assignee: justdave → wurblzap
Summary: Until we can switch to UTF-8, make new data coming into bugzilla.mozilla.org be ISO-8859-1 → Until we can switch to UTF-8 fully, make new data coming into bugzilla.mozilla.org be UTF-8
Comment 13•19 years ago
Hmmm, perhaps modify the 'Make forms force UTF-8' patch to also add acceptcharset="UTF-8", to be compatible with old Microsoft browsers? Microsoft explanation: http://msdn.microsoft.com/workshop/author/dhtml/reference/properties/acceptcharset.asp
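For reference, the HTML attribute on a form element is spelled `accept-charset` (with a hyphen); `acceptCharset` is the name of the corresponding DOM property. A quick way to see which forms in a rendered page lack the attribute is sketched below in Python. The class and function names are hypothetical, and the sketch assumes plain rendered HTML rather than raw templates.

```python
from html.parser import HTMLParser


class FormCharsetChecker(HTMLParser):
    """Collect the action of every <form> without accept-charset="UTF-8"."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        charset = (attr_map.get("accept-charset") or "").strip().lower()
        if charset != "utf-8":
            self.missing.append(attr_map.get("action") or "")


def forms_missing_utf8(html: str) -> list:
    checker = FormCharsetChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


page = (
    '<form action="post_bug.cgi" accept-charset="UTF-8"></form>'
    '<form action="createaccount.cgi"></form>'
)
print(forms_missing_utf8(page))
```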
Assignee
Updated•19 years ago
Assignee: wurblzap → justdave
Priority: -- → P1
Reporter
Comment 14•18 years ago
The non-UTF-8 characters in https://bugzilla.mozilla.org/show_bug.cgi?id=323905#c17 show why I think we should do this soon. Dave?
Assignee
Updated•18 years ago
Assignee: justdave → justdave
Assignee
Updated•18 years ago
No longer blocks: bmo-upgrade-051022
Assignee
Comment 15•18 years ago
*** Bug 345678 has been marked as a duplicate of this bug. ***
Comment 16•18 years ago
> We're already sending a content-type
> header with UTF-8 in it on enter_bug, create_account, and so forth (pages where
> you create new bugs, accounts, and so forth).
Is it true? 'enter_bug' at b.m.o still just sends 'Content-Type: text/html'. Now that localization bugs are tracked at b.m.o, I think it's very urgent to emit 'text/html; charset=UTF-8' for enter_bug and create_account.
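The difference between the two header values quoted in this comment is whether a `charset` parameter is present at all. A small Python sketch using the standard library's header parser makes the check explicit; the helper name `declared_charset` is an assumption for illustration.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None.

    The result is lowercased by the standard library, so "UTF-8"
    comes back as "utf-8".
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(declared_charset("text/html"))
print(declared_charset("text/html; charset=UTF-8"))
```

With no declared charset, the browser falls back to auto-detection or a user default, and whatever it picks for display is generally also what it uses to encode the submitted form data.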
Comment 17•18 years ago
I was asked by dynamis, pikemac, and chofmann (among others) to comment on this bug about our experience with Bugzilla-ja. Bugzilla-ja is one implementation of an i18n-ed version of Bugzilla. It currently runs on bugzilla.mozilla.gr.jp (known as bugzilla-jp), which handles bugs about Mozilla products in Japanese.
# However, we have not tested the Bugzilla-ja code with other double-byte languages, sorry.
Our requirements were as follows:
1. Treat all longdescs as UTF-8 instead of EUC-JP (which was used in 2.16-ja).
2. Handle MIME when sending bugmail.
3. Allow the character encoding to be changed account by account (e.g. ISO-2022-JP / JIS).
4. Display UTF-8 properly in buglists and similar pages.
The first was easy: we wrote a program to transcode the database contents. Setting aside saved searches, we could do it with mysqldump and nkf. In saved searches (and some other fields), however, there are some %xx-encoded words.
# This might come from the 2.16-ja code...
For the second and third, Bugzilla-ja modifies Bugzilla/BugMail.pm to handle MIME in bugmail, and for the third we added a new column to the profiles table in the database to store the encoding setting for each account.
For the fourth, we first thought of using Text::I18NWrap (or WrapI18N), but I could not get that module to work well on our system, and because of another serious problem with buglists we changed our approach. Bugzilla-ja adds new code called cutStringUTF8 to manage the width of UTF-8 strings, and we use it for buglists and bugmail. For the comment input area on show_bug, we set wrap=hard so that text is wrapped before the comment data goes into the database. This should be treated as a bug, but we could not easily get text wrapping to work in the Perl/template code.
Our Bugzilla-ja patch against Bugzilla 2.20 or 2.20.1 is distributed at ftp://ftp.mozilla.gr.jp as a diff.
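The %xx-encoded values in saved searches are the one part a byte-level transcoder such as nkf cannot fix, because the EUC-JP bytes are hidden behind percent-escapes. The following Python sketch shows one way such a query string could be re-encoded. It is not the Bugzilla-ja code; the function name and the sample field names are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode


def recode_query_string(query: str, source_charset: str = "euc_jp") -> str:
    """Re-encode the percent-escaped values of a query string as UTF-8.

    parse_qsl decodes each %xx run using source_charset (strictly, so
    bad input raises instead of being silently damaged), and urlencode
    writes the pairs back out with UTF-8 percent-escapes.
    """
    pairs = parse_qsl(query, keep_blank_values=True,
                      encoding=source_charset, errors="strict")
    return urlencode(pairs)


# A saved search as it might have been stored by an EUC-JP installation.
legacy = urlencode([("short_desc", "バグ"), ("bug_status", "NEW")],
                   encoding="euc_jp")
print(legacy)
print(recode_query_string(legacy))
```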
Comment 18•18 years ago
why does each user have an encoding setting!? what's that used for?
Comment 19•18 years ago
Japanese has many historical encodings such as Shift-JIS, EUC-JP, and ISO-2022-JP, plus modified versions of these such as cp932 or euc-jp-ms (or so). Currently, the standard encoding for e-mail in Japanese is not UTF-8 but ISO-2022-JP (known as JIS encoding). Some web-based mail services, such as Yahoo, do not recognize UTF-8, and many bugzilla-jp users use them. So we need a feature to switch the e-mail encoding account by account.
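A per-account mail encoding can be sketched with Python's standard `email` package. This is not the Bugzilla-ja implementation (which modifies Bugzilla/BugMail.pm in Perl); the preference table, the addresses, and the `build_bugmail` function are hypothetical. The subject is kept ASCII-only here because encoding non-ASCII headers per account needs separate handling.

```python
from email.message import EmailMessage

# Hypothetical stand-in for a per-account encoding column in the
# profiles table.
MAIL_CHARSET_BY_USER = {"user@example.jp": "iso-2022-jp"}


def build_bugmail(recipient: str, subject: str, body: str,
                  default_charset: str = "utf-8") -> EmailMessage:
    """Build a bugmail whose body uses the recipient's preferred charset."""
    charset = MAIL_CHARSET_BY_USER.get(recipient, default_charset)
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = subject
    # Raises UnicodeEncodeError if the body cannot be represented
    # in the chosen charset.
    msg.set_content(body, charset=charset)
    return msg


jis_mail = build_bugmail("user@example.jp", "[Bug 1] test", "こんにちは")
utf8_mail = build_bugmail("other@example.org", "[Bug 1] test", "こんにちは")
print(jis_mail["Content-Type"])
print(utf8_mail["Content-Type"])
```

ISO-2022-JP uses only 7-bit bytes (it switches character sets with escape sequences), which is part of why it became the conventional choice for Japanese mail.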
Comment 20•18 years ago
Comment on attachment 192256
Make forms force UTF-8
All I can find about acceptcharset= was a Struts bug where they messed up their content.
Attachment #192256 -
Flags: review?(justdave)
Comment 21•18 years ago
Looks like we'll be migrating to UTF-8 after the next upgrade anyway... Gerv
Assignee
Comment 22•18 years ago
Yeah, we'll be almost completely UTF8 after the next upgrade, which will be happening within the next couple weeks. This isn't worth the effort now.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX
Assignee
Updated•18 years ago
Attachment #192256 -
Flags: review?(justdave)
Updated•13 years ago
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org