Closed Bug 635933 Opened 15 years ago Closed 15 years ago

Validator ignores files with UCS2-Little endian encoding

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q2 2011

People

(Reporter: basta, Assigned: basta)


Matt Basta [:basta]

Assignee

Description

•

15 years ago

It appears that files with UCS2-Little endian encoding are not passed through the JS tests.

Jorge Villalobos [:jorgev] (he/him)

Updated

•

15 years ago

Severity: normal → minor

Status: UNCONFIRMED → NEW

Ever confirmed: true

Priority: -- → P3

Matt Basta [:basta]

Assignee

Comment 1

•

15 years ago

Through my research, it seems like little-endian UCS-2 is invalid (it should always be big-endian). Can anyone confirm this? If it's not invalid, we should be focusing on ways to detect and decode it. If it is invalid, then a filter needs to be written that converts the raw bytes to ASCII or UTF-8.

Matt Basta [:basta]

Assignee

Comment 2

•

15 years ago

I've got a preliminary fix which I'll be testing tonight and tomorrow. However, I should note that a file from the original package (content/overlay.js) is using an encoding which cannot be decoded by SpiderMonkey, which means that the validator will fail with a compilation error for that file. In the other files (namely content/smftn.js), there are characters which cannot be properly encoded for error output, so you get a lot of question marks, but the actual validation is taking place. Should it be encoded to something like UTF-8, everything would be just peachy. No issue there.

Matt Basta [:basta]

Assignee

Comment 3

•

15 years ago

Waiting on the resolution of bug 648102. This will make the tests pass.

Depends on: 648102

Wesley W. Garland

Comment 4

•

15 years ago

Obsolete response to comment 1: both big- and little-endian UTF-16 are supported; this is probably true for its obsolete predecessor, UCS-2, as well. We determine big- or little-endian based on the BOM (byte order mark) at the beginning of the content. Strings are stored in the JS engine roughly as uint16[], in the platform's native byte order.

Supplying UTF-8 input, as noted in comment 2, is far safer; it is endian-proof and won't trigger BOM bugs. Also, the JS engine knows how to convert from UTF-8 to UTF-16 native-endian encoding, which is what JS developers expect to find in their strings.

Matt, can you point me to a failing test?
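The BOM-based detection and the conversion to UTF-8 discussed in this comment can be sketched in Python as follows. This is a minimal illustration only: `decode_with_bom` and `to_utf8` are hypothetical helper names, not code taken from the validator or from SpiderMonkey, and UTF-32 BOMs are deliberately left out to keep the sketch short.

```python
import codecs

# Known BOM prefixes mapped to Python codec names.
# Assumption: only UTF-8 and UTF-16 are considered; UTF-32 is ignored.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_with_bom(raw, default="utf-8"):
    """Hypothetical helper: pick the codec from the BOM, if one is present."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # Strip the BOM so it does not end up inside the decoded text.
            return raw[len(bom):].decode(encoding)
    # No BOM found: fall back to the assumed default encoding.
    return raw.decode(default)


def to_utf8(raw):
    """Hypothetical helper: normalize any BOM-marked input to BOM-less UTF-8."""
    return decode_with_bom(raw).encode("utf-8")
```

With something like this in front of the JS tests, a little-endian file would reach the engine as plain UTF-8 regardless of the byte order it was saved in.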

Matt Basta [:basta]

Assignee

Comment 5

•

15 years ago

Hey Wes,

There's only one failing test at the moment. You can find it on the "encoding" branch of the validator:

https://github.com/mattbasta/amo-validator/branches/encoding

The test is found here:

https://github.com/mattbasta/amo-validator/blob/encoding/tests/test_controlchars.py#L33

When the following string is passed to Spidermonkey via read():

function täst() {}

...Spidermonkey throws this error:

missing ( before formal parameters
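To make the failing input concrete, the sketch below shows how the test string from this comment looks as raw bytes under a few encodings. It is an illustration only: `byte_forms` is a hypothetical helper, and the sketch does not establish which of these byte sequences the validator actually handed to Spidermonkey.

```python
def byte_forms(source, encodings=("utf-8", "latin-1", "utf-16-le")):
    """Hypothetical helper: encode the same text under several encodings."""
    return {encoding: source.encode(encoding) for encoding in encodings}


if __name__ == "__main__":
    forms = byte_forms("function täst() {}")
    for encoding, encoded in forms.items():
        # The non-ASCII "ä" is the only character whose bytes differ
        # between the single-byte and multi-byte encodings.
        print(encoding, len(encoded), encoded)
```

The single character "ä" takes one byte in Latin-1 and two bytes in both UTF-8 and UTF-16, so the bytes are only meaningful to a reader that assumes the same encoding the writer used.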

Wil Clouser [:clouserw]

Updated

•

15 years ago

Target Milestone: --- → Q2 2011

Matt Basta [:basta]

Assignee

Updated

•

15 years ago

Blocks: 644750

Matt Basta [:basta]

Assignee

Updated

•

15 years ago

Blocks: 648596

Wil Clouser [:clouserw]

Updated

•

15 years ago

Depends on: 650728

Matt Basta [:basta]

Assignee

Comment 6

•

15 years ago

Merged into mozilla/amo-validator: https://github.com/mozilla/amo-validator/commit/6e2e1fd6b4b98a968b66acae7529cebcc9fe19c6

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: addons.mozilla.org → addons.mozilla.org Graveyard


Categories

(addons.mozilla.org Graveyard :: Developer Pages, defect, P3)


Security

(public)
