Last Comment Bug 650653 - test_ident_escaping.html and test_parse_ident.html fail if Firefox is built on DBCS locales
: test_ident_escaping.html and test_parse_ident.html fail if Firefox is built o...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla6
Assigned To: Masatoshi Kimura [:emk]
:
: Jet Villegas (:jet)
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-04-17 11:13 PDT by Masatoshi Kimura [:emk]
Modified: 2011-04-18 14:08 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (2.02 KB, patch)
2011-04-17 11:16 PDT, Masatoshi Kimura [:emk]
dbaron: review+
Details | Diff | Splinter Review
patch (3.03 KB, patch)
2011-04-17 11:30 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Splinter Review
patch for check in (2.71 KB, patch)
2011-04-17 11:35 PDT, Masatoshi Kimura [:emk]
no flags Details | Diff | Splinter Review

Description Masatoshi Kimura [:emk] 2011-04-17 11:13:30 PDT
Microsoft C++ compiler doesn't support UTF-8 without BOM.
So nsCssScanner.cpp is treated as Shift_JIS source on Japanese locale.
Shift_JIS converter will eat second byte regardless the second byte is valid or not.
So lines 117, 119, 121 of nsCssScanner.cpp will be "commented out" because line terminators are eaten by last charcters in the previous lines.
https://hg.mozilla.org/mozilla-central/annotate/af907a37dcf0/layout/style/nsCSSScanner.cpp#l116
For example, 'ß' is encoded 'C3 9F' in UTF-8. '9F' is a valid first byte of Shift_JIS. so it eats next byte (0A). Therefore the next line is treated as if it is a part of the previous comment line.
This causes test_ident_escaping.html and test_parse_ident.html test failure.

I suppose this will also occur on Other DBCS locales.
Comment 1 Masatoshi Kimura [:emk] 2011-04-17 11:16:38 PDT
Created attachment 526606 [details] [diff] [review]
patch

Add dummy characters to prevent line terminators from being eaten.
Comment 2 David Baron :dbaron: ⌚️UTC-10 2011-04-17 11:29:57 PDT
Comment on attachment 526606 [details] [diff] [review]
patch

I'd prefer if you remove the non-ASCII literals entirely, perhaps by replacing them with numbers like this:

// `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
   0,  XSI,XSI,XSI,XSI,XSI,XSI,SI, SI, SI, SI, SI, SI, SI, SI, SI,
// p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, 0,  0,  0,  0,  0,
// U+008*
   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
// U+009*
   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
// U+00A*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,
// U+00B*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,
// U+00C*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,
// U+00D*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,
// U+00E*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,
// U+00F*
   SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI, SI,

But r=dbaron either way.  Thanks for figuring this one out.  (Thankfully, I don't think this should affect our official builds -- only builds that developers build on Japanese (etc.) systems.)
Comment 3 Masatoshi Kimura [:emk] 2011-04-17 11:30:57 PDT
Created attachment 526610 [details] [diff] [review]
patch

Added comments and static asserts
Comment 4 Masatoshi Kimura [:emk] 2011-04-17 11:33:05 PDT
Sorry, mid-air collision occured. I'll fix a patch in accordance with review comment.
Comment 5 David Baron :dbaron: ⌚️UTC-10 2011-04-17 11:34:39 PDT
The PR_STATIC_ASSERT (and the removal of the explicit length) is a good idea -- keep that.
Comment 6 Masatoshi Kimura [:emk] 2011-04-17 11:35:57 PDT
Created attachment 526613 [details] [diff] [review]
patch for check in
Comment 7 Masatoshi Kimura [:emk] 2011-04-17 11:37:59 PDT
(In reply to comment #2)
> (Thankfully, I
> don't think this should affect our official builds -- only builds that
> developers build on Japanese (etc.) systems.)
Right. It's the reason this bug was not caught by official automated test.
Comment 8 Mounir Lamouri (:mounir) 2011-04-17 14:17:28 PDT
Pushed in cedar:
http://hg.mozilla.org/projects/cedar/rev/aada30c4e0d3
Comment 9 David Baron :dbaron: ⌚️UTC-10 2011-04-17 15:30:08 PDT
(In reply to comment #7)
> (In reply to comment #2)
> > (Thankfully, I
> > don't think this should affect our official builds -- only builds that
> > developers build on Japanese (etc.) systems.)
> Right. It's the reason this bug was not caught by official automated test.

Well, locale-specific bugs that depend on the locale the user is *running* the build on (such as bug 646882 or bug 416581 comment 6) also aren't caught by our tests.
Comment 10 Mounir Lamouri (:mounir) 2011-04-18 08:19:57 PDT
Pushed to mozilla-central:
http://hg.mozilla.org/mozilla-central/rev/aada30c4e0d3
Comment 11 Zack Weinberg (:zwol) 2011-04-18 09:00:34 PDT
It doesn't really matter here, but it would be nice to not have to worry about putting non-ASCII characters in comments.  Is there a way we could force MSVC++ to treat all our C++ source files as UTF-8 regardless of the OS locale in use?  (From inside the build system, presumably.)
Comment 12 Masatoshi Kimura [:emk] 2011-04-18 14:08:29 PDT
The only way is inserting a BOM which will make non-MS compilers unhappy :(
http://connect.microsoft.com/VisualStudio/feedback/details/341454/

Note You need to log in before you can comment on or make changes to this bug.