Closed Bug 788338 Opened 12 years ago Closed 12 years ago

Guess encoding of non-UTF8 app manifest files

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

2012-11-01

People

(Reporter: kumar, Assigned: robhudson)

Details

Votes:

Attachments

(1 file)

Example of non-UTF8 app manifest, 12 years ago, Kumar McMillan [:kumar], 1.68 KB, text/plain

Kumar McMillan [:kumar]

Reporter

Description

•

12 years ago

When a non-UTF8 manifest file is submitted (see bug 780823) and the server does not specify a charset in the header (bug 754487), we need to guess the encoding. We can do this with the chardet module, which is already included in zamboni.

Kumar McMillan [:kumar]

Reporter

Comment 1

•

12 years ago

Attached file: Example of non-UTF8 app manifest

example of fixing with chardet:

In [2]: mf = open('/Users/kumar/tmp/w2mo.webapp').read()
In [3]: mf.decode('utf8')
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (2, 0))

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 41: unexpected code byte

In [4]: import chardet
In [5]: chardet.detect(mf)
Out[5]: {'confidence': 0.55246051025679954, 'encoding': 'ISO-8859-2'}
In [6]: mf.decode('ISO-8859-2')
Out[6]: u'{  \r\n  "version": "3.8",\r\n  "name": "W2MO\u017d",\r\n...'
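The session above is Python 2. A minimal Python 3 sketch of the same idea follows: try UTF-8 first and only fall back to a guess when that fails. The function name `decode_manifest`, the `detect` parameter, and the `min_confidence` threshold are assumptions made for illustration; this is not the code in zamboni. `chardet.detect` is the real third-party call used in the session, and it is imported lazily so a different detector can be passed in.

```python
def decode_manifest(raw, detect=None, min_confidence=0.5):
    """Decode manifest bytes, returning (text, encoding_used).

    Hypothetical helper: UTF-8 is tried first; on failure the encoding is
    guessed by `detect`, which must return a dict with 'encoding' and
    'confidence' keys (the shape chardet.detect returns).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    if detect is None:
        import chardet  # third-party; only needed on the fallback path
        detect = chardet.detect

    guess = detect(raw)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    # The threshold is an arbitrary assumption; a guess is still only a guess.
    if not encoding or confidence < min_confidence:
        raise ValueError("could not guess the manifest encoding")

    # This can still raise if the guessed codec rejects the bytes.
    return raw.decode(encoding), encoding
```

With the result from the session (ISO-8859-2 at roughly 0.55 confidence), the byte 0xae decodes to U+017D, which matches Out[6] above.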

Rob Hudson [:robhudson]

Assignee

Comment 2

•

12 years ago

We're not already doing this?
https://github.com/mozilla/zamboni/blob/master/apps/files/utils.py#L219

Kumar McMillan [:kumar]

Reporter

Comment 3

•

12 years ago

Oh, that's good! We just need to fix up all the places that aren't already doing that. escape_all() in bug 780823 might not be the only place.

Wil Clouser [:clouserw]

Comment 4

•

12 years ago

Rob: can you take a quick grep through the dev tools and see if there is anywhere else to do this?  Otherwise, please close.  Thanks.

Assignee: nobody → robhudson.mozbugs

Priority: -- → P4

Target Milestone: --- → 2012-11-01

Masatoshi Kimura [:emk]

Comment 5

•

12 years ago

Comment on attachment 658296 [details]
Example of non-UTF8 app manifest

This file has mixed encodings.

>  "name": "W2MO®",

This line is windows-1252.

>  "locales": {  
>    "de": {  
>      "description": "W2MO: Logistikoptimierung, 3D-Simulation, Personalplanung"    },
>    "en": {  
>      "description": "W2MO: Logistics 3D-Simulation, Optimization, Workforce Planning"
>    },  
>    "fr": {  
>      "description": "W2MO: Conception logistique, 3D-simulation, optimisation, planification du personnel"    },  
>    "nl": {  
>      "description": "W2MO: Logistiek planning. 3D Simulatie, optimalisatie, arbeidskrachten planning"    },  
>    "es": {  
>      "description": "W2MO: diseño logístico, simulación 3D, optimización, planificación de la plantilla"    },  
>    "pt-br": {  
>      "description": "W2MO: logística, simulação 3D e animação, otimização, planejamento de pessoal"    },  
>    "pt-pt": {  
>      "description": "W2MO: logística, simulação 3D e animação, otimização, planejamento de pessoal"    },  
>    "ru": {  
>      "description": "W2MO: Логистический дизайн, 3D-симуляция, оптимизация, кадровое планирование, расчет стоимости"    },  
>    "cn": {  
>      "description": "W2MO: Логістичний дизайн, 3D-симуляція, оптимізація, планування персоналу, калькуляція витрат"    }  
>  },  

But this block is UTF-8.
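A small Python 3 sketch of why a mixed file defeats any single codec. The byte string is rebuilt by hand from two fragments quoted above (it is not the attachment itself), and `try_decode` is a hypothetical helper.

```python
# One field saved as windows-1252, another as UTF-8, joined into one byte string.
name_part = '"name": "W2MO®", '.encode("cp1252")  # ® becomes the single byte 0xAE
desc_part = '"description": "diseño logístico"'.encode("utf-8")
mixed = name_part + desc_part


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(mixed, "utf-8")         # None: a lone 0xAE is invalid UTF-8
as_cp1252 = try_decode(mixed, "cp1252")      # name is right, description is mangled
as_latin2 = try_decode(mixed, "iso-8859-2")  # decodes, but both parts are wrong

print(repr(as_utf8))
print(as_cp1252)
print(as_latin2)
```

No choice gets both parts right: UTF-8 rejects the name, windows-1252 turns "diseño" into "diseÃ±o", and ISO-8859-2 (chardet's guess in comment 1) renders ® as Ž.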

Matt Basta [:basta]

Comment 6

•

12 years ago

Per the manifest specification:

> The document must be UTF-8 in order for the app to be submitted to Firefox Marketplace. It is recommended to omit the byte order mark (BOM). Other encodings can be specified with a charset parameter on the Content-Type header (i.e. Content-Type: application/x-web-app-manifest+json; charset=ISO-8859-4), though this will not be respected by the Marketplace.

https://developer.mozilla.org/en-US/docs/Apps/Manifest?redirectlocale=en-US&redirectslug=OpenWebApps%2FThe_Manifest#Serving_manifests

The current submission flow will raise a validation error that says "Your manifest file was not encoded as valid UTF-8." if a non-UTF-8 manifest is provided.

I don't see any good reason to support non-UTF-8 manifests other than to open a whole realm of bugs and implementation difficulties. We should do everything we can to ease the technical burdens placed on app consumers and other web app marketplaces. UTF-8 is not by any means an esoteric encoding and guessing the encoding with chardet does not mean our guess will be accurate (especially with mixed encodings*), so I don't think we should venture down that path unless it becomes a pain point.

* Danny tried to copy from ISO-8859-4 into GB2312. Now his documeniIÈsQ}#Õ…µÝ. Mixed Encodings: Not even once.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

Kumar McMillan [:kumar]

Reporter

Comment 7

•

12 years ago

If we show a validation warning that's probably good enough. I think we do need to continue stripping BOMs because many editors on Windows add those. Not all devs are advanced enough to convert encodings of their source code so my main concern here is that we're making life hard by not supporting their valid non-UTF8 manifests. I guess we can wait to see if we get bugs and / or complaints about that.

Kumar McMillan [:kumar]

Reporter

Comment 8

•

12 years ago

I meant as long as we show a validation error when not utf8.

Matt Basta [:basta]

Comment 9

•

12 years ago

We do show a validation error, and we do continue to strip the BOM.
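For reference, a minimal Python 3 sketch of the behavior described in this thread: strip a UTF-8 BOM, then reject anything that is not valid UTF-8. The names `load_manifest` and `ManifestEncodingError` are hypothetical, and this is not the actual validator code; only the error message is taken from comment 6.

```python
import codecs
import json


class ManifestEncodingError(ValueError):
    """Hypothetical error type used only in this sketch."""


def load_manifest(raw):
    """Parse manifest bytes as strict UTF-8 JSON, tolerating a leading BOM."""
    # Many Windows editors prepend a UTF-8 byte order mark; drop it.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # No guessing: surface the error to the developer instead.
        raise ManifestEncodingError(
            "Your manifest file was not encoded as valid UTF-8."
        ) from None
    return json.loads(text)
```

A manifest that starts with a BOM still loads, while one containing a stray windows-1252 byte such as 0xAE raises the error.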


Categories

(Marketplace Graveyard :: Developer Pages, defect, P4)
