Closed Bug 747762 Opened 13 years ago Closed 8 years ago

Investigate Shift_JIS decoder changes of Encoding Standard

Status:

RESOLVED FIXED

Milestone:

mozilla56

People

(Reporter: emk, Assigned: hsivonen)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fixed by encoding_rs])

Attachments

(1 file)

Test case, 13 years ago, Masatoshi Kimura [:emk], 1.82 KB, text/html

Masatoshi Kimura [:emk]

Reporter

Description

13 years ago

Encoding Standard removed some "Gecko quirks" from the Shift_JIS decoder.
http://dvcs.w3.org/hg/encoding/rev/7c876db1159c

1. The fallback code point is no longer U+30FB.
2. 0xA0 and 0xFD to 0xFF no longer emit PUA code points.
3. EUDC code ranges are no longer supported.

Removing 1. and 2. may be fine. WebKit doesn't support them either. But I'm not sure it is acceptable to remove EUDC code ranges. EUDC code ranges are extensively used by Web pages for Japanese mobile phones.

Masatoshi Kimura [:emk]

Reporter

Comment 1

13 years ago

Attached file: Test case

Only Opera did not support EUDC code ranges.

Anne (:annevk)

Comment 2

13 years ago

Can we do something better than mapping to PUA (e.g. Unicode emoji)? Do Android and iOS have special fonts for the EUDC range?

Masatoshi Kimura [:emk]

Reporter

Comment 3

13 years ago

> Can we do something better than mapping to PUA (e.g. Unicode emoji)?

It's impossible because the PUA-to-Unicode mapping differs between carriers :(

> Do Android and iOS have special fonts for the EUDC range?

Dunno. Web pages for smartphones should use UTF-8 from the start. My main concern is about Japanese feature phones, which support only Shift_JIS.

Anne (:annevk)

Comment 4

13 years ago

Is Gecko shipping on those phones? What is the end user scenario here?

Masatoshi Kimura [:emk]

Reporter

Comment 5

13 years ago

There is a Firefox add-on for developing Japanese mobile-phone sites on a PC: http://firemobilesimulator.org/
Removing EUDC mappings will affect the add-on.

Masatoshi Kimura [:emk]

Reporter

Comment 6

13 years ago

BTW, KDDI, a Japanese mobile-phone carrier, provides Opera Mobile on their feature phones (where it is called "PC site viewer"). It seems to support KDDI emoji.

Anne (:annevk)

Comment 7

13 years ago

Yeah for select Japanese products it seems Opera has some mappings to PUA, but it's not exhaustive and limited to those products. It seems kind of weird to keep this given that the content can only be consumed on those specific phones.

Masatoshi Kimura [:emk]

Reporter

Comment 8

13 years ago

(In reply to Anne van Kesteren from comment #2)
> Can we do something better than mapping to PUA (e.g. Unicode emoji)? Do
> Android and iOS have special fonts for the EUDC range?

It looks like Softbank iPhone supports Softbank-emoji in the EUDC range.

Masatoshi Kimura [:emk]

Reporter

Comment 9

13 years ago

(In reply to Anne van Kesteren from comment #7)
> It seems kind of
> weird to keep this given that the content can only be consumed on those
> specific phones.

Mobile-phone pages are also viewable from PC browsers. Some people even prefer the mobile version over a full version that is full of ads.

Masatoshi Kimura [:emk]

Reporter

Comment 10

13 years ago

Docomo mobile-phone pages can be served as application/xhtml+xml [1]. If those pages contain an emoji, normal browsers (including smartphone browsers) cannot display them at all, because the fallback code point is fatal in XML. But Opera will not be affected because Opera decided to violate the spec [2].

[1] http://www.nttdocomo.co.jp/service/developer/make/content/browser/xhtml/notice/basis/index.html
[2] http://my.opera.com/ODIN/blog/2011/09/28/no-more-xml-parsing-failed-errors

Masatoshi Kimura [:emk]

Reporter

Comment 11

13 years ago

I'm inclined to agree with Shawn. Legacy encodings are included to support legacy content that is unlikely to be updated. Any innocent-looking change to an encoding will break some of that content.

Anne (:annevk)

Comment 12

13 years ago

As can be seen from e.g. http://www.unicode.org/~scherer/emoji4unicode/snapshot/utc.html the KDDI mapping is different from the algorithm Gecko has employed (search for "U+EB", which is the start of a PUA range that the algorithm you have cannot generate). Each vendor has its own conversion table, including to PUA. Not supporting this at all seems better for the end user since we have no idea what the page meant.

Masatoshi Kimura [:emk]

Reporter

Comment 13

13 years ago

Some mobile phones do not even support NEC/IBM extensions. Softbank-emoji mappings overlap with IBM extensions [1]. I don't care about extensions that are incompatible with Microsoft Codepage 932. Microsoft used to publish their EUDC-to-PUA mappings [2]. Although they removed the document, they did not (and will never) change their implementation. It should be documented elsewhere.

[1] http://d.hatena.ne.jp/NAOI/20120423/1335164541
[2] http://web.archive.org/web/*/http://microsoft.com/typography/unicode/932.txt

Masatoshi Kimura [:emk]

Reporter

Comment 14

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #13)
> Microsoft used to publish their EUDC-to-PUA mappings [2]. Although they
> removed the document,

I found that they published the data file again.
https://www.microsoft.com/en-us/download/details.aspx?id=10921

They also published their algorithm.
http://msdn.microsoft.com/en-us/library/cc248976%28v=prot.10%29.aspx

Masatoshi Kimura [:emk]

Reporter

Comment 15

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #14)
> They also published their algorithm.
> http://msdn.microsoft.com/en-us/library/cc248976%28v=prot.10%29.aspx

This algorithm does not match what IE actually does after MS11-057. It always eats DBCS second bytes. So we need an algorithm to handle invalid sequences for security reasons.
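
To illustrate why eating the second byte matters, here is a minimal Python sketch that contrasts the two error policies. It is a toy, not Gecko's or IE's code: `toy_decode`, `eat_invalid_trail`, and the placeholder character are names made up for this example, and valid double-byte pairs are not looked up in any table. The "put an ASCII trail byte back" policy follows my reading of the Encoding Standard's Shift_JIS decoder.

```python
REPLACEMENT = "\uFFFD"
PLACEHOLDER = "\u3013"  # stands in for any table-mapped character (no index here)


def is_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def toy_decode(data: bytes, eat_invalid_trail: bool) -> str:
    """Toy Shift_JIS-like decoder that only models error handling.

    eat_invalid_trail=True  : an invalid second byte is always consumed.
    eat_invalid_trail=False : an invalid ASCII second byte is put back and
                              decoded on its own (assumed spec-style policy).
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b <= 0x80:                      # ASCII (and 0x80) pass through
            out.append(chr(b))
        elif 0xA1 <= b <= 0xDF:            # half-width katakana, single byte
            out.append(chr(0xFF61 - 0xA1 + b))
        elif is_lead(b):
            if i >= len(data):             # lead byte at end of input
                out.append(REPLACEMENT)
                break
            t = data[i]
            if is_trail(t):
                out.append(PLACEHOLDER)    # real decoders consult an index here
                i += 1
            else:
                out.append(REPLACEMENT)
                if eat_invalid_trail or t >= 0x80:
                    i += 1                 # second byte is swallowed
        else:                              # 0xA0, 0xFD to 0xFF
            out.append(REPLACEMENT)
    return "".join(out)


attack = b'\x81"><b>'
print(repr(toy_decode(attack, eat_invalid_trail=True)))
print(repr(toy_decode(attack, eat_invalid_trail=False)))
```

With `eat_invalid_trail=True` the quotation mark after the stray lead byte disappears, which is the kind of markup-swallowing behavior that makes invalid-sequence handling a security concern.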

Henri Sivonen (:hsivonen)

Assignee

Comment 16

8 years ago

(In reply to Masatoshi Kimura [:emk] from comment #0)
> Encoding Standard removed some "Gecko quirks" from the Shift_JIS decoder.
> http://dvcs.w3.org/hg/encoding/rev/7c876db1159c
> 1. The fallback code point is no longer U+30FB.
> 2. 0xA0 and 0xFD to 0xFF do no longer emit PUA code points
> 3. EUDC code ranges are no longer supported.
> Removing 1. and 2. may be fine. WebKit doesn't support them either. But I'm
> not sure it is acceptable to remove EUDC code ranges. EUDC code ranges are
> extensively used by Web pages for Japanese mobile phone.

Proceeding with removal of quirks #1 and #2. (Not supported by Blink or Presto, either.)

EUDC was restored in the Encoding Standard and is supported by encoding_rs.
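
For readers unfamiliar with the EUDC mapping, the following Python sketch shows the arithmetic as I understand it from the Encoding Standard's Shift_JIS decoder: lead bytes 0xF0 to 0xF9 yield pointers 8836 to 10715, which map linearly onto U+E000 to U+E757. The function name `eudc_to_pua` is made up for this example, and this is not the encoding_rs implementation.

```python
from typing import Optional


def eudc_to_pua(lead: int, trail: int) -> Optional[int]:
    """Map a Shift_JIS EUDC byte pair to a PUA code point, or None.

    Assumes the pointer formula of the Encoding Standard:
    pointer = (lead - lead_offset) * 188 + trail - trail_offset
    """
    if not 0xF0 <= lead <= 0xF9:
        return None                       # not an EUDC lead byte
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
        return None                       # not a valid trail byte
    trail_offset = 0x40 if trail < 0x7F else 0x41
    lead_offset = 0xC1                    # offset for lead bytes >= 0xA0
    pointer = (lead - lead_offset) * 188 + trail - trail_offset
    return 0xE000 - 8836 + pointer        # pointers 8836..10715 are EUDC


for pair in [(0xF0, 0x40), (0xF0, 0x7E), (0xF0, 0x80), (0xF9, 0xFC)]:
    cp = eudc_to_pua(*pair)
    print(f"{pair[0]:02X} {pair[1]:02X} -> U+{cp:04X}")
```

The highest code point this formula can produce is U+E757, so a carrier-specific PUA assignment starting at "U+EB" (comment 12) lies outside its range.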

Depends on: encoding_rs

Henri Sivonen (:hsivonen)

Assignee

Comment 17

8 years ago

Bug 1261841 removed quirks #1 and #2.

Assignee: nobody → hsivonen

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Whiteboard: [fixed by encoding_rs]

Target Milestone: --- → mozilla56
