Closed Bug 281362 Opened 20 years ago Closed 16 years ago

lxr results should use utf-8 encoding per default

Categories

(Webtools Graveyard :: MXR, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: axel, Assigned: Biesinger)

Attachments

(3 files)

As most of our l10n files are encoded in utf-8, we should use that format to send
the data.

(The only exceptions to utf-8 are searchengines, AFAICT, which are mac latin.)
QA Contact: mitchell → cmp
Is the lack of this information in the server header and/or the file header
causing problems?  I've checked myself and my browser infers the page content
encoding as UTF-8.  What am I missing?
My Firefox displayed everything in ISO-8859-2. I don't have the LiveHTTPHeaders
extension, but I'd bet the site should send the encoding in the header.
Using LiveHTTPHeaders I can see the server does not define this page's character
set.
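
For anyone without a header-inspection extension, the same check can be done on a raw Content-Type header value. This is only a minimal sketch in Python using the standard library; the function name is made up for illustration and is not part of LXR:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None if absent.

    Hypothetical helper for illustration; not part of LXR or MXR.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


# A header with no charset leaves the choice to the browser.
print(charset_from_content_type("text/html"))
# A header that declares the encoding explicitly.
print(charset_from_content_type("text/html; charset=UTF-8"))
```

When the result is None and the page source has no charset declaration either, the browser has to guess, which would explain why two people see different encodings for the same page.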
Just FYI, that patch was from:

 
http://lxr.mozilla.org/l10n/source/cs-CZ/browser/chrome/browser/pageInfo.properties

The page source doesn't define a character set, either.  I think in my case
Firefox chooses UTF-8 and in yours it chooses ISO-8859-2.
Attached file same from me
same log from me
Component: Miscellaneous → LXR
Product: mozilla.org → Webtools
QA Contact: cmp → timeless
Mass reassign of open bugs for chase@mozilla.org to build@mozilla-org.bugs.
Assignee: chase → build
*** Bug 338119 has been marked as a duplicate of this bug. ***
the bug duped to this one wasn't l10n specific, adjusting summary
Summary: l10n lxr results should use utf-8 encoding per default → lxr results should use utf-8 encoding per default
QA Contact: timeless → lxr
Reassigning all LXR bugs assigned to build@mozilla-org.bugs to the default LXR owner (sorry Bear!)

We aren't actively working on these (if that's wrong, please reassign to yourself/a real person). 
Assignee: build → bear
QA Contact: lxr → timeless
A related (but separate) issue is that most localized files are assumed to be 'binary' (e.g. http://lxr.mozilla.org/l10n-mozilla1.8/source/ko/dom/chrome/layout/css.properties). Probably as a result, bonsai's 'blame' doesn't work either.
Has this issue been filed as a bug? If not, I'll file it.
QA Contact: timeless → lxr
Attached patch patch
Assignee: bear → cbiesinger
Status: NEW → ASSIGNED
Attachment #287436 - Flags: review?(bear)
Attachment #287436 - Flags: review?(bear) → review+
Checking in webtools/lxr/lib/LXR/Common.pm;
/cvsroot/mozilla/webtools/lxr/lib/LXR/Common.pm,v  <--  Common.pm
new revision: 1.32; previous revision: 1.31
done

I don't really know what the process is for getting that onto the various LXR installations we have...
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment on attachment 287436 [details] [diff] [review]
patch

jshin: binary is entirely unrelated to bonsai. 

http://bonsai-l10n.mozilla.org/cvsblame.cgi?file=/l10n/ko/dom/chrome/layout/css.properties&rev=1.2.2.7

the problem w/ mxr and lxr for the l10n roots is that their bonsai integration links are bad.

however. I question any patch that simply tags files.

I could have done this years ago if i thought it was correct.
simply put, it's wrong.

what do you do w/ source or search when files are in random encodings?

sure some files are UTF8, but some aren't. worse, some are mixed. search results can and easily will contain mixed results.

OpenOffice was among the interesting ones iirc.

the right thing theoretically would be to parse each line and guess its encoding, and then recode it to utf8 :) of course, i have no idea what one does if a "line" isn't really a line.
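
A rough sketch of that per-line idea, in Python rather than LXR's Perl, might look like the following. The heuristic (try strict UTF-8 first, otherwise fall back to a single legacy encoding) and the function names are assumptions for illustration, not anything lxr or mxr does; it also does nothing for the case where a "line" isn't really a line.

```python
def recode_line_to_utf8(raw, fallback="iso-8859-1"):
    """Return one line of bytes re-encoded as UTF-8.

    Heuristic (an assumption, not LXR behaviour): a line that decodes
    as strict UTF-8 is taken to be UTF-8; anything else is treated as
    the fallback encoding.
    """
    try:
        text = raw.decode("utf-8")   # strict: raises on invalid sequences
    except UnicodeDecodeError:
        text = raw.decode(fallback)  # iso-8859-1 accepts any byte value
    return text.encode("utf-8")


def recode_to_utf8(data, fallback="iso-8859-1"):
    """Recode a bytes blob line by line, keeping the line endings."""
    lines = data.splitlines(keepends=True)
    return b"".join(recode_line_to_utf8(line, fallback) for line in lines)


# One UTF-8 line followed by one ISO-8859-1 line, as in a mixed file.
mixed = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("iso-8859-1")
print(recode_to_utf8(mixed).decode("utf-8"))
```

A single fallback cannot tell ISO-8859-2 from mac latin or any other 8-bit encoding, so this only moves the guessing from the browser to the server, and it adds a decode pass over every line served.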

However, mxr search results are already slower than lxr. and there's a risk of timing out.

i chose the approach of relying on browser autodetect.

now, i suppose we could add a per tree mimetype configuration. i wouldn't strictly speaking oppose that.
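
As a sketch of what such a per tree configuration could look like (Python for illustration only; the table, its entries and the function name are hypothetical, and lxr's real configuration format is not shown here):

```python
# Hypothetical per-tree table: trees listed here get an explicit charset,
# every other tree keeps relying on browser autodetect.
TREE_CHARSETS = {
    "l10n": "UTF-8",
    "l10n-mozilla1.8": "UTF-8",
}


def content_type_for_tree(tree, mimetype="text/html"):
    """Build the Content-Type header value to send for a given tree."""
    charset = TREE_CHARSETS.get(tree)
    if charset is None:
        # No entry: send no charset and leave detection to the browser.
        return mimetype
    return f"{mimetype}; charset={charset}"


print(content_type_for_tree("l10n"))
print(content_type_for_tree("some-other-tree"))
```

This keeps the current behaviour as the default and only tags trees whose files are known to be in one encoding.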
Attachment #287436 - Flags: review-
OK then, I backed this out for now. I don't know if I'm going to do anything else with this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Both mxr and cvsblame WFM, sending "Content-Type: text/html; charset=UTF-8":

http://mxr.mozilla.org/l10n/source/cs/browser/chrome/browser/pageInfo.properties
http://bonsai-l10n.mozilla.org/cvsblame.cgi?file=l10n/cs/browser/chrome/browser/pageInfo.properties
Status: REOPENED → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → WORKSFORME
Product: Webtools → Webtools Graveyard