Validator ignores files with UCS-2 little-endian encoding

RESOLVED FIXED in Q2 2011

Status

addons.mozilla.org Graveyard
Developer Pages
P3
minor
RESOLVED FIXED
7 years ago
2 years ago

People

(Reporter: basta, Assigned: basta)

Tracking

unspecified
Q2 2011
x86
All

Details

(Assignee)

Description

7 years ago
It appears that files encoded as little-endian UCS-2 are not passed through the JS tests.
Severity: normal → minor
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
(Assignee)

Comment 1

7 years ago
Through my research, it seems like little-endian UCS-2 is invalid (it should always be big-endian). Can anyone confirm this?

If it's not invalid, we should be focusing on ways to detect and decode it. If it is invalid, then a filter needs to be written that converts the raw bytes to ASCII or UTF-8.
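
A minimal sketch of the "detect and decode" option, assuming the file starts with a byte order mark. The names decode_script and to_utf8 are hypothetical and are not taken from the validator. UCS-2 text decodes correctly with Python's UTF-16 codecs because UCS-2 covers only the subset of UTF-16 without surrogate pairs.

```python
import codecs

# Known byte order marks and the codec each one implies.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_script(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw file bytes to text, using the BOM to pick the codec.

    Hypothetical helper: files without a BOM are decoded with `fallback`,
    and undecodable bytes are replaced rather than raising.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM so it does not end up in the script text.
            return raw[len(bom):].decode(encoding, errors="replace")
    return raw.decode(fallback, errors="replace")


def to_utf8(raw: bytes) -> bytes:
    """Normalize any supported input encoding to BOM-less UTF-8."""
    return decode_script(raw).encode("utf-8")
```

With something like this, a little-endian file such as codecs.BOM_UTF16_LE + "function täst() {}".encode("utf-16-le") would be handed to the JS tests as plain UTF-8. Files without a BOM would still need a separate heuristic.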
(Assignee)

Comment 2

7 years ago
I've got a preliminary fix which I'll be testing tonight and tomorrow. However, I should note that a file from the original package (content/overlay.js) is using an encoding which cannot be decoded by SpiderMonkey, which means that the validator will fail with a compilation error for that file.

In the other files (namely content/smftn.js), there are characters which cannot be properly encoded for error output, so you get a lot of question marks, but the actual validation is taking place. If the output were encoded as something like UTF-8, everything would be just peachy. No issue there.
(Assignee)

Comment 3

7 years ago
Waiting on the resolution of bug 648102. Once that is resolved, the tests should pass.
Depends on: 648102

Comment 4

7 years ago
Obsolete response to Comment 1: both big- and little-endian UTF-16 are supported; this is probably true for its obsolete predecessor, UCS-2, as well. We determine big- or little-endian based on the BOM (byte order mark) at the beginning of the content.

Strings are stored in the JS engine roughly as uint16[], in the platform's native byte order.

Supplying UTF-8 input, as noted in comment 2, is far safer: it is endian-proof and won't trigger BOM bugs. Also, the JS engine knows how to convert from UTF-8 to the native-endian UTF-16 encoding, which is what JS developers expect to find in their strings.
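
To illustrate the point about byte order, here is a small Python sketch (not validator or SpiderMonkey code; code_units is a hypothetical helper). It shows that the same text has two different UTF-16 byte sequences but only one UTF-8 sequence, and lists the 16-bit code units a JS string would roughly hold.

```python
import codecs
import struct

text = "täst"

# The same four characters serialize differently depending on byte order...
utf16_le = text.encode("utf-16-le")  # b't\x00\xe4\x00s\x00t\x00'
utf16_be = text.encode("utf-16-be")  # b'\x00t\x00\xe4\x00s\x00t'
# ...while UTF-8 has a single byte sequence and needs no BOM.
utf8 = text.encode("utf-8")          # b't\xc3\xa4st'

# Python's generic "utf-16" codec reads the BOM to choose the byte order.
from_le = (codecs.BOM_UTF16_LE + utf16_le).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + utf16_be).decode("utf-16")


def code_units(s: str) -> list:
    """Return the UTF-16 code units of s as integers (roughly a uint16[])."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))
```

Both from_le and from_be come back as the original text, and code_units(text) gives one 16-bit value per character here. A character outside the Basic Multilingual Plane would occupy two code units (a surrogate pair), which is the case UCS-2 cannot represent.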

Matt, can you point me to a failing test?
(Assignee)

Comment 5

7 years ago
Hey Wes

There's only one failing test at the moment. You can find it on the "encoding" branch of the validator:

https://github.com/mattbasta/amo-validator/branches/encoding

The test is found here:

https://github.com/mattbasta/amo-validator/blob/encoding/tests/test_controlchars.py#L33

When the following string is passed to SpiderMonkey via read():

function täst() {}

...SpiderMonkey throws this error:

missing ( before formal parameters
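
For illustration only, one possible workaround is to make the source pure ASCII before handing it to the shell, since JavaScript accepts \uXXXX escapes in identifiers and string literals. This is a hedged sketch under that assumption, not the fix that was merged for this bug, and escape_non_ascii is a hypothetical name.

```python
def escape_non_ascii(source: str) -> str:
    """Rewrite non-ASCII characters as JavaScript \\uXXXX escapes.

    Hypothetical helper. Characters above U+FFFF are written as a
    surrogate pair, which is valid in string literals but not in
    identifiers.
    """
    out = []
    for ch in source:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)
```

Applied to the failing string, this yields function t\u00e4st() {}, which declares the same function name in ASCII-only source. The trade-off is that column offsets in any reported errors would no longer match the original file, and escapes inside comments or regular expression literals are not treated specially.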
Target Milestone: --- → Q2 2011
(Assignee)

Updated

7 years ago
Blocks: 644750
(Assignee)

Updated

7 years ago
Blocks: 648596
Depends on: 650728
(Assignee)

Comment 6

7 years ago
Merged into mozilla/amo-validator:

https://github.com/mozilla/amo-validator/commit/6e2e1fd6b4b98a968b66acae7529cebcc9fe19c6
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard