Closed Bug 106843 Opened 23 years ago Closed 22 years ago

add special character back for hwaara's name to folderPane.css

Categories

(Core :: CSS Parsing and Computation, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.1beta

People

(Reporter: sspitzer, Assigned: bzbarsky)

References

Details

(Keywords: helpwanted, intl)

Attachments

(2 files, 2 obsolete files)

the special character causes us to fail to convert folderPane.css to UTF-8

boris writes:

"we now default to UTF8 for all the chrome sheets, since
 that's the charset of XML documents by default

In ISO-8859-1 that char is apparently not a problem....

1) take that char out
2) put an @charset rule in that sheet
3) fix the code that gets the charset"

before you add @charset, I'd want to know what it costs us.

if that means a performance hit, let's come up with something else.

http://bugzilla.mozilla.org/show_bug.cgi?id=106674

sorry about that, hwaara
OK.  "fix the code that gets the charset" means one of:

a) "don't ask the document for its charset". I believe this would make the intl
   folks very unhappy.
b) "don't ask the document for its charset for chrome:// urls".  This seems
   silly and is a bigger perf hit than adding @charset, imo.

adding @charset _would_ mean a bit of a perf hit, but very very little. Right
now we check for @charset and if it's found we get the charset.  So we would
need to do the getting.  On the other hand, we would not have to ask the
document for a charset, which avoids a string copy.  Once I fix bug 80106 the
getting will be no slower than reading past an extra line or two of comments in
the file and may well be faster than asking the document and copying that string.

So I think my vote is for just adding an |@charset "ISO-8859-1";| rule to the
sheet... Pierre?  Thoughts?
Status: NEW → ASSIGNED
OS: Windows 2000 → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla0.9.6
in the mean time, can you add an assertion so that we'll know where to look if 
this happens again?
Yeah, no matter what solution we pick the patch I attach will include that
assertion.  Unless you want the assertion ASAP, in which case please let me
know.
It is a bit harsh to reject a stylesheet just because a character conversion 
failed.  Is there any way we can ignore the error and go on with the conversion?
This is very similar to bug 91933 that I filed a while ago, that makes
.properties and DTD files fail if I add my name there.  Boris, do you think you
could look into that next (if it's related) ?
Can't we just throw away unrecognized chars if we are in a comment ? I agree with
Pierre that it sounds weird to refuse a sheet because of a comment we don't deal
with anyway.

Håkan: be glad you live in an iso-8859-1 country ; another charset and you just
could not add your name to a file w/o transliteration or utf-8 encoding.
OK.  I don't know _that_ much about charsets.  ccing some intl folks for
insight.

The problem with UTF8 in particular is that chars are encoded using a variable
number of bytes.  So picking up after an error may well have alignment issues.

Looking at the nsIUnicodeConverter interface, it looks like on error the
converter _does_ tell us how much data was actually consumed so we can "restart"
it at some point further along the string.  The wisdom of doing so I again leave
to the intl folks.

Daniel, with the current system we have no way of knowing we are in a comment.
We convert the string to UCS2 before we ever give it to the CSS parser.

Furthermore, because of the variable-width thing a single char can affect the
way the rest of the document is decoded.  I don't know how robust the various
encodings are at permitting a unique interpretation of the remainder of the
document given an error like this.  The HTML parser does some handling of this
sort of error at:

http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsScanner.cpp#343

so apparently it's not unreasonable.  If the intl folks have no issues with me
doing something like that, I could do it.

Adding dependency since depending on what we decide here the patch in bug 80106
may need to be reworked.

Let's get a decision here ASAP, please.  I'd really like to fix bug 80106.  :)

On a side note, why was the assertion on conversion failure in
nsUTF8ToUnicode::Convert commented out?  That would have made diagnosing this
problem much simpler... :)
Blocks: 80106
Keywords: intl
The i18n team should define how errors are processed.  A solution could be to 
pass a boolean ('aIgnoreErrors') to Convert() and let the i18nn code replace the 
bad characters with U+FFFD or something else. This code shouldn't be in the HTML 
or CSS parsers, I think.

If 'aIgnoreErrors' is true, Convert() would still return an error code but it 
would complete the character conversion.  Returning the error code would leave 
the caller the choice to use a buffer that is known to contain conversion errors, 
or attempt a conversion with a different character set.
No longer blocks: 80106
Pushing out, but could we please come to a decision on this one?  Especially if
this'll involve changing the converter interface, we should do it as soon as we can.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
I filed a bug 107790 for nsIUnicodeDecoder enhancement request.
Depends on: 107790
>before you add @charset, I'd want to know what it costs us.
>if that means a performance hit, let's come up with something else.
I don't think so.
*** Bug 111219 has been marked as a duplicate of this bug. ***
Pushing out to match bug this depends on.

I suggest that we do something here (add an assertion, add a workaround like the
HTML parser, something) in the near future, however.  I'll be unable to work on
Mozilla starting a week from now for about a month, so if someone wants to do
this in the meantime....

Pierre, would you be willing to add an assertion before the 0.9.7 freeze?  If
so, I'll attach a patch.
Keywords: helpwanted
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Agreed: an assertion is a good idea when a stylesheet is rejected because of a 
bad character.  We had a couple of dups and they are not easy to find for those 
who are not already aware of the problem.
Attached patch Add an assertion (obsolete) — Splinter Review
Can't we use SheetLoadData.mURL instead of getting the url from the channel?
Attached patch Assertion patch using mURL (obsolete) — Splinter Review
So we can.  :)
Attachment #61083 - Attachment is obsolete: true
Comment on attachment 61087 [details] [diff] [review]
Assertion patch using mURL

r=pierre
Attachment #61087 - Flags: review+
Comment on attachment 61087 [details] [diff] [review]
Assertion patch using mURL

It is unclear to me if we know that the reason for the failure could only be a
charset problem. Assmuming that there are other possible causes of failure, I
think it might be a bad idea to simply assert like that. From reading the
comments, however, it seems that this is just a temporary thing to keep track
of a specific situation, in which case we should have a nice big comment in the
DEBUG block please. 

Also, you have two errorMessage nsCAutoStrings declared, the first one is not
neccesary and probably will cause some compiler to choke.

Address these comments and sr=attinasi
Attachment #61087 - Flags: superreview+
Comment on attachment 61087 [details] [diff] [review]
Assertion patch using mURL

removed extra decl, added nice long comment and checked in.
Attachment #61087 - Attachment is obsolete: true
Boris can you remove my name from those files, too please? (If it can't be my
real first name, Håkan (as opposed to Hakan), then please just remove it.)
Simon is making progress on this over in bug 128896
Depends on: 128896
setting target milestone... :(
Target Milestone: mozilla0.9.9 → mozilla1.0
Why can't we put a native UTF8 a-ring in the file?
Why not, indeed?
if you do this, then you get to write patches for bugzilla and bonsai and lxr 
to teach them how to correctly detect endcodings :-)
Mozilla's code should not be constrained by the limitations of the supporting tools.
Attachment #73579 - Flags: approval+
Comment on attachment 73579 [details] [diff] [review]
Håkan's name in UTF-8

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Comment on attachment 73579 [details] [diff] [review]
Håkan's name in UTF-8

oops, approved wrong bug. fixing.
Attachment #73579 - Flags: approval+
I'm going out of town in two days and not getting back till after 1.0 freeze. 
Setting realistic milestones for all 1.0 bugs that depend on other bugs or on
possible interface work.

Any help on getting these fixed would be much appreciated...
Target Milestone: mozilla1.0 → mozilla1.1alpha
Now that bug 128896 is fixed, there are no less than 3 ways we could fix this:
1) Change Hakan to HÃ¥kan (attachment 73579 [details] [diff] [review])
2) Change Hakan to Håkan, and add @charset ISO-8859-1
3) Change Hakan to Håkan without adding @charset

If it was up to me I would choose option #2.
I say we just revert the patch from bug 106674, unless there's a good reason to
add the @charset to the files (thus preventing any use of UTF-8 in them).
cc'ing myself
OK.  Could we please just pick something?  hwaara, what would your preference 
be?
Target Milestone: mozilla1.1alpha → mozilla1.1beta
If we can just change the "a" to a "å" without changing anything else now that
bug 128896, that'd be fine by me.
OK.  Could we do _something_ here.  This bug is hovering at the top of my
buglist for no reason.....

Reviews?  On either of the patches available?
Comment on attachment 89992 [details] [diff] [review]
Revert the patch from bug 106674

assuming the original bug with these special characters in css files is fixed,
sr=sspitzer
Attachment #89992 - Flags: superreview+
Yeah, I tested that.  ;)

Fixed.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
verified as per comments 
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: