Closed
Bug 635933
Opened 15 years ago
Closed 15 years ago
Validator ignores files with UCS2-Little endian encoding
Categories
(addons.mozilla.org Graveyard :: Developer Pages, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
Q2 2011
People
(Reporter: basta, Assigned: basta)
It appears that files with UCS-2 little-endian encoding are not passed through the JS tests.
Updated•15 years ago
Severity: normal → minor
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
Comment 1•15 years ago (Assignee)
Through my research, it seems like little-endian UCS-2 is invalid (it should always be big-endian). Can anyone confirm this?
If it's not invalid, we should be focusing on ways to detect and decode it. If it is invalid, then a filter needs to be written that converts the raw bytes to ASCII or UTF-8.
Comment 2•15 years ago (Assignee)
I've got a preliminary fix which I'll be testing tonight and tomorrow. However, I should note that a file from the original package (content/overlay.js) is using an encoding which cannot be decoded by SpiderMonkey, which means that the validator will fail with a compilation error for that file.
In the other files (namely content/smftn.js), there are characters which cannot be properly encoded for error output, so you get a lot of question marks, but the actual validation is taking place. Should it be encoded to something like UTF-8, everything would be just peachy. No issue there.
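The question marks described above are what Python produces when text is encoded to a narrower charset with replacement enabled. The sketch below illustrates that behaviour only; `safe_for_output` is a hypothetical helper name and is not taken from the validator's code.

```python
def safe_for_output(text, encoding):
    """Round-trip text through `encoding`, replacing unencodable characters.

    Hypothetical helper for illustration: characters the target encoding
    cannot represent come back as '?'.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# ASCII cannot represent U+00E4, so it is replaced with a question mark.
print(safe_for_output("function täst() {}", "ascii"))  # function t?st() {}

# UTF-8 can represent every code point, so the text survives unchanged.
print(safe_for_output("function täst() {}", "utf-8"))  # function täst() {}
```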
Comment 3•15 years ago (Assignee)
Waiting on the resolution of bug 648102. This will make the tests pass.
Depends on: 648102
Comment 4•15 years ago
Obsolete response to comment 1: both big- and little-endian UTF-16 are supported; this is probably true for its obsolete predecessor, UCS-2, as well. We determine big- or little-endian based on the BOM (byte order mark) at the beginning of the content.
Strings are stored in the JS engine roughly as uint16[], in the platform's native byte order.
Supplying UTF-8 input, as noted in comment 2, is far safer; it is endian-proof and won't trigger BOM bugs. Also, the JS engine knows how to convert from UTF-8 to UTF-16 native-endian encoding, which is what JS developers expect to find in their strings.
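A minimal sketch of that approach on the validator side, assuming the file contents are available as raw bytes: pick the codec from a leading BOM, decode, and re-encode as UTF-8 before handing the source to the JS engine. The names `decode_with_bom` and `to_utf8` are hypothetical and are not taken from the validator; UTF-32 and BOM-less UTF-16 are deliberately not handled.

```python
import codecs

# BOM prefixes and the codec each one implies. UCS-2 content decodes
# correctly with the UTF-16 codecs, since it is limited to the BMP.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_with_bom(raw, default="utf-8"):
    """Decode raw bytes, choosing the codec from a leading BOM if present."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM so it does not end up in the decoded text.
            return raw[len(bom):].decode(encoding)
    # No BOM: fall back to the assumed default encoding.
    return raw.decode(default)


def to_utf8(raw):
    """Normalise BOM-marked input to BOM-less UTF-8 bytes."""
    return decode_with_bom(raw).encode("utf-8")


source = "function täst() {}"
little_endian = codecs.BOM_UTF16_LE + source.encode("utf-16-le")
print(to_utf8(little_endian) == source.encode("utf-8"))  # True
```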
Matt, can you point me to a failing test?
Comment 5•15 years ago (Assignee)
Hey Wes
There's only one failing test at the moment. You can find it on the "encoding" branch of the validator:
https://github.com/mattbasta/amo-validator/branches/encoding
The test is found here:
https://github.com/mattbasta/amo-validator/blob/encoding/tests/test_controlchars.py#L33
When the following string is passed to SpiderMonkey via read():
function täst() {}
...SpiderMonkey throws this error:
missing ( before formal parameters
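One plausible reading of this error, which the bug itself does not confirm, is that the UTF-8 bytes are being interpreted one byte per character. The sketch below shows what the string looks like under that assumption: "ä" becomes two characters, and the second is a currency sign, which cannot appear in an identifier, so a parser would stop after "tÃ" and expect an opening parenthesis.

```python
import unicodedata

source = "function täst() {}"

# UTF-8 encodes "ä" (U+00E4) as the two bytes 0xC3 0xA4.
raw = source.encode("utf-8")

# Assumption for illustration: the bytes are read as Latin-1,
# i.e. one character per byte.
misread = raw.decode("latin-1")
print(misread)  # function tÃ¤st() {}

# Inspect the characters where the function name starts.
for ch in misread[9:12]:
    # Prints the general category: Ll and Lu are letters, Sc is a
    # currency symbol and is not valid inside an identifier.
    print(repr(ch), unicodedata.category(ch))
```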
Updated•15 years ago
Target Milestone: --- → Q2 2011
Comment 6•15 years ago (Assignee)
Merged into mozilla/amo-validator:
https://github.com/mozilla/amo-validator/commit/6e2e1fd6b4b98a968b66acae7529cebcc9fe19c6
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: addons.mozilla.org → addons.mozilla.org Graveyard