Closed Bug 635933 Opened 13 years ago Closed 13 years ago

Validator ignores files with UCS2-Little endian encoding

Categories

(addons.mozilla.org Graveyard :: Developer Pages, defect, P3)

x86
All

Tracking

(Not tracked)

RESOLVED FIXED
Q2 2011

People

(Reporter: basta, Assigned: basta)

References

Details

It appears that files with UCS2-Little endian encoding are not passed through the JS tests.
Severity: normal → minor
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
From my research, it seems that little-endian UCS-2 is invalid (it should always be big-endian). Can anyone confirm this?

If it is valid, we should focus on ways to detect and decode it. If it is invalid, then a filter needs to be written that converts the raw bytes to ASCII or UTF-8.
I've got a preliminary fix which I'll be testing tonight and tomorrow. However, I should note that a file from the original package (content/overlay.js) is using an encoding which cannot be decoded by SpiderMonkey, which means that the validator will fail with a compilation error for that file.

In the other files (namely content/smftn.js), there are characters which cannot be properly encoded for error output, so you get a lot of question marks, but the actual validation is taking place. Should it be encoded to something like UTF-8, everything would be just peachy. No issue there.
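
The question marks described above are what a lossy encode produces when the output encoding cannot represent a character. A minimal sketch in Python 3 follows; the validator's actual output path is not shown in this bug, so `render_for_output` is a hypothetical name used only for illustration.

```python
def render_for_output(text: str, encoding: str = "ascii") -> bytes:
    """Encode text for a log or terminal (hypothetical helper).

    Characters the target encoding cannot represent are replaced
    with '?' instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, errors="replace")


source = "function täst() {}"

# ASCII cannot represent 'ä', so it is replaced by a question mark.
lossy = render_for_output(source)

# UTF-8 can represent every character, so nothing is lost.
full = render_for_output(source, "utf-8")
```

With an ASCII target the non-ASCII character is lost, while a UTF-8 target round-trips the text unchanged.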
Waiting on the resolution of bug 648102. This will make the tests pass.
Depends on: 648102
Obsolete response to comment 1: both big- and little-endian UTF-16 are supported; this is probably true for its obsolete predecessor, UCS-2, as well. We determine big- or little-endian based on the BOM (byte order mark) at the beginning of the content.
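
BOM-based detection of this kind can be sketched in Python 3 as below. This is not the JS engine's or the validator's code; `sniff_bom` and `decode_with_bom` are hypothetical names, and the UTF-8 fallback for BOM-less input is an assumption.

```python
import codecs

# Known BOMs for this sketch. UTF-32 is deliberately left out; note that
# the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(raw: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is present."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(raw: bytes, default: str = "utf-8") -> str:
    """Strip a leading BOM if present and decode with the encoding it implies."""
    encoding, skip = sniff_bom(raw)
    return raw[skip:].decode(encoding or default)
```

For example, a file that starts with the bytes `FF FE` is decoded as little-endian UTF-16, and one that starts with `FE FF` as big-endian.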

Strings are stored in the JS engine roughly as uint16[], in the platform's native byte order.

Supplying UTF-8 input, as noted in comment 2, is far safer: it is endian-proof and won't trigger BOM bugs. Also, the JS engine knows how to convert UTF-8 to native-endian UTF-16, which is what JS developers expect to find in their strings.
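
One way to supply UTF-8 is to transcode the file bytes before they reach the JS engine. The Python 3 sketch below assumes the input is UTF-16 with a BOM; `to_utf8` is a hypothetical helper, not a function from the validator.

```python
import codecs


def to_utf8(raw: bytes, source_encoding: str = "utf-16") -> bytes:
    """Transcode raw file bytes to BOM-free UTF-8 (hypothetical helper).

    Python's "utf-16" codec reads a leading BOM to choose the byte order
    and strips it from the decoded text, so big- and little-endian input
    both produce the same UTF-8 output.
    """
    return raw.decode(source_encoding).encode("utf-8")


text = "function täst() {}"
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# Both byte orders collapse to the same UTF-8 byte sequence.
utf8_from_little = to_utf8(little)
utf8_from_big = to_utf8(big)
```

The output carries no BOM, so the consumer does not need to know which byte order the original file used.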

Matt, can you point me to a failing test?
Hey Wes

There's only one failing test at the moment. You can find it on the "encoding" branch of the validator:

https://github.com/mattbasta/amo-validator/branches/encoding

The test is found here:

https://github.com/mattbasta/amo-validator/blob/encoding/tests/test_controlchars.py#L33

When the following string is passed to SpiderMonkey via read():

function täst() {}

...SpiderMonkey throws this error:

missing ( before formal parameters
Target Milestone: --- → Q2 2011
Blocks: 644750
Blocks: 648596
Depends on: 650728
Merged into mozilla/amo-validator:

https://github.com/mozilla/amo-validator/commit/6e2e1fd6b4b98a968b66acae7529cebcc9fe19c6
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard