Closed Bug 773871 Opened 12 years ago Closed 12 years ago

Check for charset in HTTP Content-Type response for manifest

Categories

(Marketplace Graveyard :: Validation, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED FIXED
2012-10-18

People

(Reporter: gps, Assigned: basta)

Details

In bug 761877 and bug 772191 we had to work around problems encountered if the HTTP response for the manifest JSON didn't include a charset field in the Content-Type. Basically, if the server doesn't specify a charset, the client may interpret that character stream using what local defaults it has instead of what the server knows to be the actual character set.

Firefox will now do the right thing because of the aforementioned bugs (assuming manifests are actually UTF-8). But, I think it would be nice for the manifest validation to include a check that ensures the Content-Type contains a charset field. This will ensure that any improper character set handling is the fault of a client not respecting the header and not the client not knowing what to do.

  Content-Type: application/x-web-app-manifest+json     -> BAD
  Content-Type: application/x-web-app-manifest+json; charset=UTF-8  -> GOOD
  Content-Type: application/x-web-app-manifest+json; foo=bar; charset=UTF-8 -> GOOD
It's hard enough for the average developer to set up a custom content-type header, why should we force them to declare a charset too? During validation, we warn if the manifest is not UTF8 compatible (which is the default on Marketplace). 

Is there a way to fix the client code so it falls back on UTF8 instead of the local encoding?

Also, for charsets to be useful to the Marketplace site we need to parse them: bug 754487
(In reply to Kumar McMillan [:kumar] from comment #1)
> It's hard enough for the average developer to set up a custom content-type
> header, why should we force them to declare a charset too?

Who said anything about force? I think a warning is sufficient. Those who can fix it will. Bonus points if the warning links to a guide on how to do this or something.

> Is there a way to fix the client code so it falls back on UTF8 instead of
> the local encoding?

Yes. That's what bug 761877 and bug 772191 changed. Desktop Firefox now treats missing charset as UTF-8.
ok, I see. Yeah, we could add a warning to the validator if an app is not serving their manifest with a charset declaration. cc'ing clouserw to get priority on this feature.
The warning should link to this page in the docs: https://developer.mozilla.org/en/Apps/Manifest#section_4
We can't add this to the validator because the validator doesn't download the manifest, it only reads in the resulting JSON. It would only be possible if the HTTP headers were passed in separately or the whole HTTP response was passed in, but at that point it's probably not worth going through all that work to have a single conditional on that input.
It doesn't necessarily have to be part of the validator itself, we could make it a form validation error (of which there are a couple, like, to warn when the manifest URL is a 404, etc).
P5 but would be friendly.  Let me know if you need wording
Assignee: nobody → xwraithanx
Priority: -- → P5
Target Milestone: --- → 2012-08-30
Since we don't require a content-type anymore, I don't think this is entirely possible to test for, no?
(In reply to Matt Basta [:basta] from comment #8)
> Since we don't require a content-type anymore, I don't think this is
> entirely possible to test for, no?

We don't? Why not?
Target Milestone: 2012-08-30 → 2012-09-06
Basta, is this done?
This has nothing to do with the HTTP stuff that I landed last week. All of the stuff I landed tests for things that are found in the manifest, meaning the manifest has long been downloaded at that point.
Bumping to next week.
Target Milestone: 2012-09-06 → 2012-10-04
Target Milestone: 2012-10-04 → ---
This is going to be implemented as follows for hosted apps:

- If we receive a manifest that is UTF-8 without a defined charset, we will not raise an error.
- If we receive a manifest that is UTF-8 with a charset that defines UTF-8, we will not raise an error.
- If we receive a manifest that is UTF-8 with a charset that is not UTF-8, we will raise an error that the charset does not match the document.
- If we receive a manifest that is not UTF-8, we will block the manifest's submission and require that it is re-encoded properly.

Since Firefox handles this properly and the web app spec explicitly specifies that Marketplace expects UTF-8 (and since it's a sin to encourage non-UTF-8 encodings), this shouldn't be an issue. If nothing else, this will prevent a whole host of headaches for future developers and our future selves.
Assignee: xwraithanx → mattbasta
All sorts of goodness in this commit:

https://github.com/mozilla/zamboni/commit/c60353bd804503cbbadf2ffaac768975d94c162b
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2012-10-18
Matt: would you add some steps to verify this bug for QA? thanks
It could be done pretty easily with a simple PHP script:

<?php
header('Content-Type: application/x-web-app-manifest+json; charset=foobar');
readfile("manifest.webapp");
?>

Just put that in a file (script.php) next to a manifest ("manifest.webapp"). Submit the URL for script.php. It should fail and say that the charset is invalid. Changing "foobar" to "utf-8" should not produce any error output when script.php is submitted.
It's probably also a good idea to check to make sure that hosted apps can still be submitted without the charset in the Content-Type, but I'll assume that we're testing that anyway (via selenium, human, or other means).
You need to log in before you can comment on or make changes to this bug.