Closed Bug 280633 (bz-recode) Opened 20 years ago Closed 18 years ago

Tools to migrate existing legacy encoded database to UTF-8 (Unicode)

Categories

(Bugzilla :: Installation & Upgrading, enhancement, P2)


Tracking


RESOLVED FIXED
Bugzilla 3.0

People

(Reporter: bmo, Assigned: mkanat)

References

Details

(Whiteboard: [Implementation: Comment 27])

Attachments

(3 files, 8 obsolete files)

This is a spin-off bug from bug 126266 (Use UTF-8 (Unicode) charset encoding for pages and email). Tools and/or documentation are required to give users of Bugzilla databases containing legacy data encodings a way to move to a UTF-8 encoded database. The tools will enable them to automatically transcode known encodings into UTF-8, or to statistically detect the encoding and, where confidence is high, transcode to UTF-8. This will enable existing Bugzilla installations to upgrade to the use of UTF-8 encoding throughout the entire database.
I have a work-in-progress CGI that does manual re-encoding of individual comments. It's not exactly a bulk conversion tool, but I was figuring on throwing the "not valid UTF-8" detection stuff into show_bug and adding a button for privileged users next to such comments to allow them to recode them. It uses the web browser's charset detection to decide which charset to recode from (because Mozilla's charset detection usually seems to work better than what's included with Perl, and it also lets the user override it if it's wrong and they can figure it out on their own). It then uses the Perl Encode module to do the recoding once the user submits which character set to use for the source.
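(The CGI itself isn't attached here; a minimal sketch of the recode step it describes, using the core Encode module, might look like this. The charset value and byte string are illustrative, not taken from the actual code:)

    use Encode;

    # The user has identified the source charset (e.g. via the browser's
    # detector); recode the raw comment bytes to UTF-8 in place.
    my $source_charset = 'Shift_JIS';           # illustrative user choice
    my $comment_bytes  = "\x83e\x83X\x83g";     # Shift_JIS bytes for a short string
    Encode::from_to($comment_bytes, $source_charset, 'utf-8');
    # $comment_bytes now holds the same text as UTF-8 octets.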
Would adding a field for charset to the longdescs table be appropriate? ... and then it could be converted to UTF-8 (or another selected charset) on the fly as the page is viewed. This would avoid having to touch the underlying data, unless a specific comment must be converted to UTF-8 in the DB too (e.g. for searchability), in which case bug 540 would seem to provide an appropriate mechanism that a recoding CGI could tack onto.
Copied from bug 126266 comment #195 (there is some overlap with comment #1 here):

1) Send emails to those with their names stored in ISO-8859-1 (I've never seen anyone use non-ASCII characters in encodings other than ISO-8859-1 for their names at bugzilla.mozilla.org) asking them to update their account info in UTF-8.

2) Begin to emit 'charset=UTF-8' for bugs filed after a certain day (say, 2005-03-01). Do the same for current bugs with ASCII characters alone in their comments and title.

3) For existing bugs, add a very prominent warning right above the 'additional comments' text area that 'View | Character Encoding' should be set to UTF-8 if one's comment includes non-ASCII characters. This is to ensure existing bugs are not further 'polluted' with non-UTF-8 comments. We may go a step further and add a server-side check for the 'UTF-8-ness' of a new comment. If it's not UTF-8, send the user back with the following message: "Your comment contains non-ASCII characters, but it's not in UTF-8. Please go back, set your browser to use 'UTF-8', and enter your comment again.... The actual content of the comment: blahblah....."

4) If we really want to migrate all existing bugs to UTF-8, add a button to each comment to indicate the current character encoding. If necessary, this button can be made available only to a select group of people knowledgeable enough to identify encodings reliably (and perhaps the authors of the comments).

5) search/query may need some more tinkering...

As for charset detection, I wouldn't trust it much for a span of text as short as a Bugzilla comment. If it's not automatic and merely given as a hint to privileged users (as described in comment #1), that would be all right.
Sorry for putting it on the wrong bug. Bug 126266 comment 198:

# A while back someone proposed on IRC that we turned on UTF-8 for every bug ID
# that was greater than a certain number. Although that isn't a perfect solution
# (still mixed character encoding in the database) it would at least make all new
# bugs and all comments on new bugs forward compatible.
(In reply to bug 126266 comment #202)
> (In reply to bug 126266 comment #201)
> > Anne: that breaks things when you have content from multiple bugs on the same
> > page, such as buglists or longlist.cgi output.

This is not new, either. It was given as a reason for not doing that in 2002(?). Result? We've kept accumulating bugs with mixed encodings for 2+ more years.

> So does the current solution of allowing any encoding.

Absolutely.

> Sending pages that have content from multiple bugs as UTF-8 and sending all bugs
> with ID > current bug number as UTF-8 seems like a reasonable start to me.

Can we start sometime soon? At minimum, a prominent warning (in red or whatever is the clearest way) just after 'Additional Comments' that reads "Set View | Encoding to UTF-8 before adding a comment with non-ASCII characters" should be added now.

> We'll stop accumulating content of unknown encoding in new bugs, and there will
> still be a way to view the content on the older bugs (by viewing the bug as its
> own page) if there's some content that isn't UTF-8.

For list pages with multiple encodings, one can always override the encoding emitted by HTTP/meta manually. Needless to say, only bugs in that encoding will be shown correctly at a time, while all other bugs in the list will be mangled. However, that's not any worse than what we have now. Currently we don't emit a MIME charset, so the user's default character encoding is used (or, in a browser with an autodetector like Mozilla, whatever the autodetector detects). With UTF-8 emitted, UTF-8 will be used by default.
Does anyone have a hunch about the number of bugs where the following would be wrong?

1) If a given comment is a valid UTF-8 byte sequence, assume it is UTF-8.
2) Else, assume it is Windows-1252-encoded and convert to UTF-8.
3) Label the output as UTF-8.

(FWIW, when I assessed whether that approach would work for www.mozilla.org, it appeared to be OK for pretty much everything except i18n test cases and evang letter translations.)
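(A minimal sketch of this heuristic in Perl, using only the core Encode module; the function name is illustrative:)

    use Encode;

    sub bytes_to_utf8 {
        my ($bytes) = @_;
        # Step 1: valid UTF-8 already? FB_CROAK makes decode die on bad
        # input; decode a copy, since strict checking may modify its argument.
        my $copy = $bytes;
        eval { Encode::decode('utf-8', $copy, Encode::FB_CROAK) };
        return $bytes unless $@;
        # Step 2: fall back to Windows-1252; bytes with no cp1252 mapping
        # become U+FFFD under the default check mode.
        # Step 3: the caller then labels the result as UTF-8.
        return Encode::encode('utf-8', Encode::decode('cp1252', $bytes));
    }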
(In reply to comment #6)
> Does anyone have a hunch about the number of bugs where the following would be
> wrong:

I can't give you the number, but there are quite a lot of them.

> 1) If a given comment is a valid UTF-8 byte sequence, assume it is UTF-8.
> 2) Else, assume it is Windows-1252-encoded and convert to UTF-8.

Step 2 is dangerous and we should never take it. We don't have to, either. What justdave and I sketched in comment #1 and comment #4 (step 4) is a more sensible approach. We can live with mixed encodings in the output of lists and in existing bugs, but we cannot afford to lose/damage data.

> (FWIW, when I assessed, whether that approach would work for www.mozilla.org,

www.mozilla.org and bugzilla.mozilla.org are totally different beasts. In the former, most, if not all, documents had been in ISO-8859-1 (actually US-ASCII). If somebody wanted to add content beyond ISO-8859-1, they used UTF-8. So it's not surprising that it worked well for www.mozilla.org. bugzilla.mozilla.org has a number of bugs with comments in KOI8-R, Windows-1251, Shift_JIS, EUC-KR, GB2312, Big5, etc.
For b.m.o, do we have any statistics on how many old non-UTF-8 comments we have? Or how many bugs are affected? Various people have asserted "many", "thousands", "hundreds", etc. What's the actual number?

I favour the idea of a run-once script to convert to UTF-8 (using a per-comment guessed encoding based on an operator-set default), and providing an administrator mode to specify an encoding to re-convert any given comment (i.e. back from UTF-8 to the guessed encoding, then forward to the newly-specified encoding). How about a tool which allows the administrator to specify a small number of encodings and then request "show me what this comment would have looked like if I had converted it in each of these encodings"? Pick one and you're done. We would have to store the guessed encoding for each comment, probably in a separate table.

Yes, I know that b.m.o (for instance) has a bajillion non-ASCII comments (although I don't know, and it looks like nobody knows, how big the "bajillion" actually is). I don't know how many of those can't be correctly guessed using an informed encoding guesser. Does anyone? Of course, we can't automatically tell that an encoding guesser has got it wrong: they need eyeballing.

My suspicion is that b.m.o has a few million comments, of which a few thousand are non-ISO-8859-15, and maybe a hundred of those are hard to guess. But that's really just a wild guess.
Nick: if you can write some SQL query which will tell us, I'm happy to run it. I could also dump the comments to a file and run a script over that if necessary. Gerv
Just a quick thought that occurred to me... (and, looking back, it appears to have occurred to me in a different form in comment 2.) A column could be added to the longdescs table recording the browser's charset (with NULL for unknown, and for all existing data) with almost immediate effect, i.e. no conversions, no UI changes, etc., just recording the data. This way, we would at least know that all future data has the capability to be reliably converted to UTF-8 when the rest of the code catches up. It could also act as a hint for existing data on the same bug, or by the same user.
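(This idea amounts to a one-line schema change; a hypothetical sketch via DBI, where the column name and connection details are illustrative, not from any actual patch:)

    use DBI;

    my $dbh = DBI->connect('dbi:mysql:bugs', 'bugs', 'secret',
                           { RaiseError => 1 });
    # NULL means "charset unknown", which covers all pre-existing rows.
    $dbh->do("ALTER TABLE longdescs
              ADD COLUMN charset VARCHAR(64) DEFAULT NULL");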
Attached file script to count non-utf8 data (obsolete) —
OK, here are the statistics from b.m.o as of 2005-03-24 23:00 PST:

                            total rows  non-ascii  non-utf8
attachments.description:        178536        524       199
attachments.filename:           178536         56        51
attachments.mimetype:           178536          6         0
bugs.alias:                     287484          3         3
bugs.bug_file_loc:              287484         86        57
bugs.short_desc:                287484        563       455
bugs.status_whiteboard:         287484          3         2
longdescs.thetext:             2454941      40674     13631
namedqueries.name:               11757         24        23
namedqueries.query:              11757          0         0
quips.quip:                       1805         13         6
series.name:                       788          0         0
series.query:                      788          0         0
series_categories.name:            143          0         0
whine_events.subject:                9          0         0
whine_events.body:                   9          0         0
whine_queries.query_name:            7          0         0
whine_queries.title:                 7          0         0

These numbers were generated with the attached script (though I prettied it up a little bit for this bug comment).
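(The attached script isn't reproduced here, but the counting approach can be sketched roughly as follows; the table list, connection details, and output format are illustrative:)

    use strict;
    use DBI;
    use Encode;

    my $dbh = DBI->connect('dbi:mysql:bugs', 'bugs', 'secret',
                           { RaiseError => 1 });
    # For each (table, column), count total rows, rows containing
    # non-ASCII bytes, and rows that fail strict UTF-8 decoding.
    my @columns = (['longdescs', 'thetext'], ['bugs', 'short_desc']);
    foreach my $pair (@columns) {
        my ($table, $col) = @$pair;
        my ($total, $non_ascii, $non_utf8) = (0, 0, 0);
        my $sth = $dbh->prepare("SELECT $col FROM $table");
        $sth->execute();
        while (my ($value) = $sth->fetchrow_array) {
            $total++;
            next unless defined $value && $value =~ /[^\x00-\x7F]/;
            $non_ascii++;
            my $copy = $value;   # FB_CROAK may modify its argument
            eval { Encode::decode('utf-8', $copy, Encode::FB_CROAK) };
            $non_utf8++ if $@;
        }
        printf "%-26s %10d %10d %9d\n",
               "$table.$col:", $total, $non_ascii, $non_utf8;
    }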
of course, I swiped the header off of another file and forgot to fix the contributor line. oops :)
Ack, I missed one:

profiles.realname: total: 192221, non-ascii: 3688, non-utf8: 3618
Here's a fixed-up copy of the script. It includes the column I forgot, fixes the license header, and the output actually looks like what I posted to the bug now.
Attachment #178544 - Attachment is obsolete: true
Dave, nice work :-)

We need to remember, of course, that just because something decodes as UTF-8 doesn't necessarily mean that it is.

It would be good to have figures for open bugs only - that gives us a better handle on the scale of the problem in practice. I would do it myself, but the script is owned by root and not writable by other users.

Looking at the data, the big problems are the values which no-one can fix up afterwards - attachment descriptions and comments. No surprise there.

Gerv
Here's some data for open bugs, given that I think we care a lot less if resolved bugs get a bit mangled.

                          total rows  non-ascii  non-utf8
attachments.description:       31153        195        75
attachments.filename:          31153         27        23
attachments.mimetype:          31153          4         0
bugs.alias:                    53971          1         1
bugs.bug_file_loc:             53971         20        14
bugs.short_desc:               53971         99        68
bugs.status_whiteboard:        53971          0         0
longdescs.thetext:            415466       8580      3021

This does seem to make the problem a lot less scary. The comments and the attachment descriptions are the key ones, as people can't fix them up manually if we mess them up. 75 attachment descriptions isn't really very many; I bet if we reduced it to non-obsolete attachments it would be even fewer.

What's the next step? Evaluate the performance of some charset-guessing Perl modules on the relevant comments?

Wild idea: could we detect the Accept-Charset of each user, store it, and use it as a first guess for the comments they've added?

Gerv
What about attachments.thedata? In an installation I help to admin there are still 6560 non-utf8 attachments.
(In reply to comment #17)
> What about attachments.thedata? In an installation I help to admin there are
> still 6560 non-utf8 attachments.

Attachments can be binary; they're arbitrary data, so it doesn't really matter what charset they are in. Whether it matters or not probably depends a lot on the MIME type.
(In reply to comment #17)
> What about attachments.thedata? In an installation I help to admin there are
> still 6560 non-utf8 attachments.

We MUST leave them alone, other than fixing the MIME type if necessary ('text/*' => 'text/*; charset=XYZ').

(In reply to comment #16)
> Here's some data for open bugs, given that I think we care a lot less if
> resolved bugs get a bit mangled.
>
> total rows non-ascii non-utf8
> attachments.description: 31153 195 75
>
> mess them up. 75 attachment descriptions isn't really very many. I bet if we
> reduced it to non-obsolete attachments it would be even fewer.
>
> What's the next step? Evaluate the performance of some charset-guessing Perl
> modules on the relevant comments?

If it's only 75 (or 195), why bother with evaluating Perl charset-guessing modules? Just doing it manually would be easier, even if that means writing to those who attached them to ask them to identify the charset. My point is that I wouldn't use any charset-guessing (other than UTF-8 vs non-UTF-8, which is not fool-proof either, as you wrote) for this kind of task. It can be pretty much automated if a form mail is sent to those who attached them asking them to identify the charset. Hmm, with this possibility, the number of non-ASCII attachment descriptions matters less (probably, the higher it is, the lower the response ratio will be, because some people are not around any more).

> Wild idea: could we detect the Accept-Charset of each user, store it, and use it
> as a first guess for the comments they've added?

That may or may not work. 'Accept-Language' + the most widely used charset for the language could help, too. Again, with only 75, I don't see much point in tinkering with ideas like that.
jshin: the ideas I mentioned about charset guessing and accept-lang were really ideas for the 3000+ comments, not the attachment descriptions. I agree that 75 attachment descriptions could be fixed manually - or even, perhaps, just ignored. Gerv
Sorry for the misunderstanding. As for the 3000+ comments, I would try automating the conversion the way I described in my previous comment (send out form mails whose replies can be automatically validated and processed with the help of Perl's Encode module) instead of relying (solely) on the not-so-reliable Perl encoding-guessing module (OK, it should be quantified, but my hunch is that it's not so good).
(In reply to comment #21)
> Sorry for the misunderstanding. As for the 3000+ comments, I would try
> automating the conversion the way I described in my previous comment (send
> out form mails whose replies can be automatically validated and processed

Obviously, we don't have to do things the way we would have done before the web came along. Instead of sending out form mails, a simple web interface can be set up and a link to it mailed, so that those who wrote comments can identify the encoding of their comments. The responses can be processed either in batch or on-line ('in situ'). I don't expect everyone to respond for every comment, but this will significantly reduce the need to rely on manual fixing or encoding guessing.
(In reply to comment #22)
> link to it can be mailed

... or initially, just included in the Bugzilla page header/footer when any logged-in user has comments that need converting. It doesn't seem to me to be an important enough issue to mass-mail everyone about.

In most cases a user's comments will be in the same charset, so it might be useful to offer bulk conversion (select a charset for one comment, and confirm that all the others display correctly).

Hmmmm.... wonder what it currently makes of these:

"«ö¦¹¶i¤J°T®§½s¿è..."
"¸Þ½ÃÁö¸¦ ÆíÁýÇÏ·Á¸é ÀÌ°÷À» Ŭ¸¯ÇϽÿÀ..."
"ƒGƒfƒBƒbƒgƒ�ƒbƒZ�[ƒW‚𓾂é‚É‚Í�A‚±‚±‚ðƒNƒŠƒbƒN‚µ‚Ä‚­‚¾‚³‚¢..."
(In reply to comment #23)
> (In reply to comment #22)
> > link to it can be mailed
>
> ... or initially, just included in the bugzilla page header/footer when any
> logged in user has comments that need converting.
>
> It doesn't seem to me to be an important enough issue to mass mail everyone
> about.

What I have in mind is to send a *single* email to each user alerting them to comments that need to be converted, with a link to a single *dynamic* web page that lists links to all the comments (not the actual content) made by him or her that have not yet been converted to UTF-8. One can then identify the character encoding of one's comments at one's leisure.

> In most cases a user's comments will be in the same charset, so might be useful
> to offer bulk conversion (select charset for one comment, and confirm that all
> the others display correctly).

That's a nice idea, but it should offer an option to do it one by one, because that's not always the case.
(In reply to comment #24)
> What I have in mind is to send a *single* email to each user to alert about

[how a user might read such a mail if this is done naïvely]

Dear user, you made one or two comments in bugs a couple of years back, and these bugs have still not been fixed... and although you never used Bugzilla again, nor even asked to be kept informed about the status of the bug, you had the audacity to enter one or two non-ASCII characters in a couple of comments. So now we want you to dig out your login details and come back to the site just to save us having to figure out what the characters you meant to enter were, even though this should be obvious to anyone that cares. Not sure if anyone else will even read your comment again after you've done this, but we do like to keep our database in order, and obviously this is far more important than fixing the 2-year-old bug.

Even the banner I suggested would be quite in-your-face, but at least in this case the user is already logged in, so we know they can deal with it in a couple of clicks... Many comments will be barely worth 10 seconds of a user's time to fix, as it will be obvious to anyone reading what the missing character is. Take bug 187403 comment 10 as a random example.

> That's a nice idea, but should offer an option to do that one by one because
> it's not always the case.

Ummm... let's try "(select charset for one comment, and confirm [by checking a box next to each one] that all the others display correctly)".
(In reply to comment #25)
> (In reply to comment #24)
> > What I have in mind is to send a *single* email to each user to alert about
>
> [how a user might read such a mail if this is done naïvely]
>
> Dear user, you made one or two comments in bugs a couple of years back, and
> these bugs have still not been fixed... and although you never used
> bugzilla again, nor even asked to be kept informed about the status of

Depending on how it's worded, that's certainly possible. There's a trade-off between the two approaches. Your approach would have a higher response rate among those who are asked to identify the character encoding, but it leaves out those who don't log on on a regular basis yet may be interested enough to do the chore. The two approaches can be combined: try your approach for a certain period (say, a month) and then mail the rest of the people.
Idea!!! Just thought of a way this can be done so that ALL pages are displayed in UTF-8 without having to change a single byte of existing comments... if a comment doesn't decode as valid UTF-8, then display it something like this:

----- Additional Comment #99 From user@example.com 2001-01-01 01:01 -----
[ No charset specified. Ambiguous characters shown as #. _View_Raw_Comment_ ]
I'm getting a sense of d#j# vu about this.

(or perhaps using some other character such as ' ', U+2588, '?', or the 'empty box' character that Windows uses when a font is missing a character, rather than '#', with the placeholder character stylised (e.g. linkified, bold, etc.) to make it look different from the same character actually appearing in the comment)

The original comment author, and any sufficiently empowered user, should probably also get a link to a page allowing them to select the correct character set and convert to UTF-8, next to the View Raw Comment link. For "View Raw Comment", it would serve a page containing JUST that comment and no charset header, and use the browser's charset detection as at present.
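(A hypothetical sketch of the rendering side of this idea; the helper name and placeholder choice are illustrative:)

    use Encode;

    sub display_text {
        my ($bytes) = @_;
        my $copy = $bytes;   # FB_CROAK may modify its argument
        my $text = eval { Encode::decode('utf-8', $copy, Encode::FB_CROAK) };
        return $text unless $@;      # valid UTF-8: show as-is
        # Not valid UTF-8: mask every non-ASCII byte for display; the
        # template would then add the "[ No charset specified... ]" banner
        # and the View Raw Comment link around the result.
        (my $masked = $bytes) =~ s/[^\x00-\x7F]/#/g;
        return $masked;
    }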
slee: you're a genius! :-) That's exactly what we should do. We make a best-efforts rendering in the body of the bug (perhaps using charset guessing, perhaps not) but allow the raw comment to be served alone if necessary. Gerv
Comment on attachment 178546 [details]
script to count non-utf8 data

Shouldn't this also be checking:

['bugs_activity','added'],
['bugs_activity','removed'],

... and possibly also admin-defined data such as product/keyword/flag descriptions?
More specifically, we should add a show_comment.cgi script which takes a bug number and comment number, and just displays the raw text of the comment, with no processing or hyperlinking, and no charset header. A link to this could be embedded in all the comments which we couldn't convert. Gerv
For people who are lucky enough to have a database in a single encoding, here is a hack to convert an unsuspecting iso-8859-1 encoded 2.19.3 Bugzilla database to UTF-8.
(In reply to comment #28)
> slee: you're a genius! :-) That's exactly what we should do. We make a
> best-efforts rendering in the body of the bug (perhaps using charset guessing,
> perhaps not) but allow the raw comment to be served alone if necessary.

I'm with Gerv on that. I want it!!! :) :)
Attachment #185456 - Attachment description: Script to convert single-encoding databases to UTF-8 → Script to convert single-encoding 2.19.3 databases to UTF-8
*** Bug 304944 has been marked as a duplicate of this bug. ***
Nice script. Any reason not to add a safety check like:

    eval { Encode::decode("utf-8", $new, 1); };
    if ($@) {
        Encode::from_to($new, "iso-8859-1", "utf-8");
    }

so that it will not reconvert anything that is already converted?
A few comments....

1) VERY NEAT TOOL.
2) It would be best to use Bugzilla::DB to open the database and get the handle. That will make it work regardless of port/socket options.

Can anyone think of any cases where this conversion could lose something? I can't, unless it were run on a database that already has UTF-8 data in it.
By the way... it looks like even Bugzillas that specify ISO-8859-1 wind up with special "Windows" characters. Probably this means that we need to convert from "cp1252" rather than "iso8859-1"; it is "supposed" to be a superset. http://czyborra.com/charsets/codepages.html#CP1252
Whiteboard: [Implementation: Comment 27]
Target Milestone: --- → Bugzilla 2.24
*** Bug 311398 has been marked as a duplicate of this bug. ***
With newer versions of MySQL (4.1 and up, for sure), it seems that this is not necessary. MySQL already knows that all the tables are Latin1. If we tell MySQL which tables/fields we want to change to UTF-8 and which we want to change to binary, then it will handle sorting, regexp matches, and length() properly, and it will convert the existing data automatically. In fact, there is even a warning in the documentation that the only way to suppress the automatic conversion is to change a table to BINARY first and then to the new encoding. I have not experimented with this yet (I used a hacked version of the tool here for my site), but it looks like the way to go.
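(For reference, the MySQL-level dance looks roughly like this; a sketch of the documented behaviour, not a tested migration, with table, column, and connection details illustrative:)

    use DBI;

    my $dbh = DBI->connect('dbi:mysql:bugs', 'bugs', 'secret',
                           { RaiseError => 1 });

    # Direct conversion: MySQL transcodes the stored latin1 bytes to utf8.
    # $dbh->do("ALTER TABLE longdescs
    #           MODIFY thetext MEDIUMTEXT CHARACTER SET utf8");

    # To relabel a column WITHOUT transcoding (the suppression trick the
    # documentation warns about): go through a binary type first, which
    # discards the charset, then declare the new one.
    $dbh->do("ALTER TABLE longdescs MODIFY thetext MEDIUMBLOB");
    $dbh->do("ALTER TABLE longdescs
              MODIFY thetext MEDIUMTEXT CHARACTER SET utf8");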
Blocks: 322445
Blocks: 335151
Assigning this to me on my work account to make sure it stays on my plate prior to the b.m.o upgrade. It'll be at least a few weeks before I get to it though, I can promise you that. If someone else wants to grab it between now and then, please do. The sooner this is done the sooner we can upgrade b.m.o :)
Assignee: nobody → justdave
Priority: -- → P2
Attached patch Work In Progress (obsolete) — Splinter Review
Okay, things are looking good in this department. :-) It turns out there's an Encode::Detect module that hooks into the Gecko Universal Charset Detector. :-) The code I've posted works extremely well. It just doesn't actually update the database yet. But you can see in the GUI that the conversions work really well.
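(A minimal sketch of how Encode::Detect can be used, per the module's documented interface; the byte string and the fallback policy are illustrative:)

    use Encode;
    use Encode::Detect::Detector;

    my $bytes   = "\x8e\xc5\xd3 \xcc\xc0\xce";   # some legacy octets
    my $charset = Encode::Detect::Detector::detect($bytes);
    my $text    = defined $charset
        ? Encode::decode($charset, $bytes)       # detector found something
        : Encode::decode('utf-8',  $bytes);      # fallback is up to the caller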
Assignee: justdave → mkanat
Status: NEW → ASSIGNED
Attached file v1: contrib/recode.pl (obsolete) —
Okay, here's a script that does the conversion. Its guessing is usually pretty good. If you run it, it will explain how it works.
Attachment #185456 - Attachment is obsolete: true
Attachment #239676 - Attachment is obsolete: true
Attachment #239884 - Flags: review?
Attachment #239884 - Flags: review? → review?(justdave)
*** Bug 135762 has been marked as a duplicate of this bug. ***
Alias: bz-utf8-migrate
Attached file v2 (obsolete) —
Okay, now it has a --dry-run argument. I'd be really interested in seeing the results of --dry-run from a very large, old Bugzilla database like bmo. FWIW, it works extremely well on landfill's bugzilla-tip.
Attachment #239884 - Attachment is obsolete: true
Attachment #240080 - Flags: review?
Attachment #239884 - Flags: review?(justdave)
Attachment #239884 - Attachment is patch: true
Attachment #239884 - Attachment is patch: false
Attachment #240080 - Attachment is patch: false
Attachment #240080 - Flags: review? → review?(justdave)
Attached file v3 (obsolete) —
Okay, here's version 3. I've improved the failure detection slightly (it now only says it failed when it really failed) and I've added the ability to override the encodings for certain values.
Attachment #240080 - Attachment is obsolete: true
Attachment #240080 - Flags: review?(justdave)
Attached file v3.1 (obsolete) —
I fixed the output format slightly.
Attachment #240093 - Attachment is obsolete: true
Attachment #240094 - Flags: review?(justdave)
Attached file v3.2 (obsolete) —
I realized why my "eval use" wasn't working--because eval { use } never works! :-) So I fixed that here.
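(The underlying Perl rule, for reference; the module name here is just an example:)

    # `use` happens at compile time, so eval BLOCK cannot trap a missing module:
    # eval { use Encode::Detect; };   # still dies while compiling, not at run time

    # eval STRING compiles at run time, so the failure becomes trappable:
    eval "use Encode::Detect; 1"
        or warn "Encode::Detect is not installed: $@";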
Attachment #240094 - Attachment is obsolete: true
Attachment #240251 - Flags: review?(justdave)
Attachment #240094 - Flags: review?(justdave)
Alias: bz-utf8-migrate → bz-recode
Attached file v4
I improved the script again. Now, if we fail a guess, but the data is valid UTF-8, we never attempt to convert it (we don't even use the fallback encoding). This produced much better results on landfill. I also fixed up the POD.
Attachment #240251 - Attachment is obsolete: true
Attachment #241153 - Flags: review?(justdave)
Attachment #240251 - Flags: review?(justdave)
Things I noticed in the POD:

contrib/recode.pl [--guess [--show-failures]] [--charset=iso-8859-2] [--overrides=file_name]
^^^ Should you demonstrate all options here, including --dry-run?

"Don't modify the database, just print out what the conversions will be." <-- s/,/;/

"character set into the UTF-8." <-- s/into the/into/
Blocks: 304550
Comment on attachment 241153 [details] v4 I can't wait any longer for a review on this script--the freeze is too soon.
Attachment #241153 - Flags: review?(justdave) → review?(LpSolit)
Attachment #241153 - Flags: review?(LpSolit) → review?(justdave)
Comment on attachment 241153 [details]
v4

OK, this looks really good... I've run it three or four times on the production database, and for what it's supposed to do, it works well, and I don't see any problems sticking out at me looking at the source. There's probably still room for improvement, but it's good enough to include in a release, and we can continue bugfixing it as we encounter issues.
Attachment #241153 - Flags: review?(justdave) → review+
Flags: approval+
(In reply to comment #47)
> Created an attachment (id=241153) [edit]
> v4

If Encode-Detect is not installed, the script fails to print out the correct error message. Instead it aborts with:

#contrib/recode.pl --dry-run
Bareword "ROOT_USER" not allowed while "strict subs" in use at contrib/recode.pl line 141.
Execution of contrib/recode.pl aborted due to compilation errors.
Is the script expected to work with a 2.22.1 database? When I run it in --dry-run mode on a 2.22.1 DB, it fails with:

Converting attachments.description...
Converting attachments.mimetype...
Converting attachments.filename...
[lots of further output]
Converting logincookies.ipaddr...
Converting longdescs.thetext...
Use of uninitialized value in concatenation (.) or string at contrib/recode.pl line 209.
Use of uninitialized value in split at contrib/recode.pl line 213.
DBD::mysql::st execute failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM longdescs WHERE thetext IS NOT NULL ' at line 1 [for Statement "SELECT thetext, FROM longdescs WHERE thetext IS NOT NULL AND thetext != ''"] at contrib/recode.pl line 218
(In reply to comment #52)
> Is the script expected to work with a 2.22.1 database?

No. Nor is it expected to work with 2.22.1 code.
RCS file: /cvsroot/mozilla/webtools/bugzilla/contrib/recode.pl,v
done
Checking in contrib/recode.pl;
/cvsroot/mozilla/webtools/bugzilla/contrib/recode.pl,v <-- recode.pl
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Okay, so the script wasn't working when the UTF-8 parameter was turned on. I've fixed this with this patch. The script also wasn't running if you set shutdownhtml, which didn't make sense, so I exempted it from shutdownhtml.

Checking in Bugzilla.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla.pm,v <-- Bugzilla.pm
new revision: 1.53; previous revision: 1.52
done
Checking in contrib/recode.pl;
/cvsroot/mozilla/webtools/bugzilla/contrib/recode.pl,v <-- recode.pl
new revision: 1.2; previous revision: 1.1
done
Hi! I just tried recode.pl and got the following errors:

#./recode.pl --dry-run
Can't find param named user_verify_class at ../Bugzilla/Config.pm line 171.
BEGIN failed--compilation aborted at ../Bugzilla/Auth.pm line 43.
Compilation failed in require at ../Bugzilla.pm line 28.
BEGIN failed--compilation aborted at ../Bugzilla.pm line 28.
Compilation failed in require at ./recode.pl line 26.
BEGIN failed--compilation aborted at ./recode.pl line 26.

What I did was to copy recode.pl version 4 into the contrib directory and run it. Is there something else I have to do? I checked the whole bug and did not find any instructions.

Regards
Werner
(In reply to comment #56)
> What I did was to copy recode.pl version 4 into the contrib directory and run
> it.

Well, that won't work! :-) You have to actually upgrade all of Bugzilla--you can't just take this script. This is a support question--for any more details, please ask on the support list, described here: http://www.bugzilla.org/support/
Which version is it supposed to run with? Is it just 2.23.4, or should it run with the new release 2.22.2? I'll ask all further questions there.

Regards
Werner
Blocks: 229010