781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly

Reporter

Description

•

13 years ago

It hard-codes the filename encoding UTF-8. Windows compressed folder creates zip files whose filenames are encoded in the ANSI code-page.

Andrea Marchesini [:baku]

Assignee

Comment 1

•

13 years ago

It seems that windows uses the "local" charset when creating a zip file. The charset is not stored in the zip file so I use nsContentUtils::GuessCharset() in order to 'guess' the charset for decoding the filename string. The patch is almost ready. Waiting for green on try server.

Assignee: nobody → amarchesini

Masatoshi Kimura [:emk]

Reporter

Comment 2

•

13 years ago

GuessCharset() would not work very well for short strings such as filenames. 1. Try converting in UTF-8. 2. If failed, fallback to the platform charset. would be better and more stable. This is the same strategy we are using for file: scheme URIs.

Andrea Marchesini [:baku]

Assignee

Comment 3

•

13 years ago

The fallback to the platform charset fails if the local charset is Unicode but the filename is not. NS_CopyNativeToUnicode(filename) returns NS_OK printing a warning and returning a non-valid Unicode string. ...but maybe this is the wrong method. Any hints?

Andrea Marchesini [:baku]

Assignee

Comment 4

•

13 years ago

Attached patch Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly (obsolete) — Details — Splinter Review

Please, review this patch.

Attachment #650587 - Flags: review?(VYV03354)

Masatoshi Kimura [:emk]

Reporter

Comment 5

•

13 years ago

Comment on attachment 650587 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly This patch didn't apply cleanly. Is there any prerequisties to apply this patch? applying nonUTF8.patch patching file dom/file/ArchiveEvent.cpp Hunk #1 FAILED at 91 1 out of 1 hunks FAILED -- saving rejects to file dom/file/ArchiveEvent.cpp.rej patching file dom/file/test/Makefile.in Hunk #1 FAILED at 30 1 out of 1 hunks FAILED -- saving rejects to file dom/file/test/Makefile.in.rej patch failed, unable to continue (try -v) patch failed, rejects left in working dir errors during apply, please fix and refresh nonUTF8.patch

Andrea Marchesini [:baku]

Assignee

Comment 6

•

13 years ago

yes, sorry. Apply this patch too: https://bugzilla.mozilla.org/show_bug.cgi?id=781153

Masatoshi Kimura [:emk]

Reporter

Comment 7

•

13 years ago

Comment on attachment 650587 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly This patch, I fear, misidentified some Japanese filenames. That being said, it will be much better than the current. I didn't find any obvious problems in the patch, but it should be reviewed by an official reviewer. > + } > + > + // Let's try to guess the charset: > + else { nit: don't put newlines between "}" and "else".

Attachment #650587 - Flags: review?(jst)

Attachment #650587 - Flags: review?(VYV03354)

Attachment #650587 - Flags: feedback+

Masatoshi Kimura [:emk]

Reporter

Comment 8

•

13 years ago

Attached file Failing example for the record — Details

Actual result: ŠG/ˆê–‡–Ú.txt 絵/最終.txt ŠG/ŽO–‡–Ú.txt ŠG/Žl–‡–Ú.txt ŠG/“ñ–‡–Ú.txt Expected result: 絵/一枚目.txt 絵/最終.txt 絵/三枚目.txt 絵/四枚目.txt 絵/二枚目.txt

Andrea Marchesini [:baku]

Assignee

Comment 9

•

13 years ago

It seems nonsense but I see the same result when I open the attachment on my win7. win-to-win fails too... in CC Henri Sivonen may have some better idea about this issue. Thanks.

Masatoshi Kimura [:emk]

Reporter

Comment 10

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #9) > It seems nonsense but I see the same result when I open the attachment on my > win7. win-to-win fails too... If your system code page (a.k.a. "Language for non-Unicode programs") is not Japanese, it's expected to "fail" to display the filename. It inherently depends on the system settings.

Masatoshi Kimura [:emk]

Reporter

Comment 11

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #3) > The fallback to the platform charset fails if the local charset is Unicode > but the filename is not. > > NS_CopyNativeToUnicode(filename) returns NS_OK printing a warning and > returning a non-valid Unicode string. > > ...but maybe this is the wrong method. Any hints? How about 1. Try UTF-8. 2. If the platform charset is not UTF-8, try it. 3. Use the charset from GuessCharset() as a last resort. 4. If GuessCharset() failed or returned "UTF-8", use ISO-8859-1 (or IBM Code Page 437, according to zip spec) to prevent dataloss (we already know the filename is not encoded in UTF-8).

Andrea Marchesini [:baku]

Assignee

Comment 12

•

13 years ago

Attached patch Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly (obsolete) — Details — Splinter Review

Ok! This is "exactly" what you suggested :) 1. UTF8 2. native charset 3. guess charset 4. ISO8859-1 It's green on try but I need your help to test it on a non-UTF8 OS. Thanks! b

Attachment #650587 - Attachment is obsolete: true

Attachment #650587 - Flags: review?(jst)

Attachment #651106 - Flags: review?(jst)

Attachment #651106 - Flags: feedback?(VYV03354)

Masatoshi Kimura [:emk]

Reporter

Updated

•

13 years ago

Attachment #651106 - Flags: feedback?(VYV03354) → feedback+

Henri Sivonen (:hsivonen) (on vacation)

Comment 13

•

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #0) > It hard-codes the filename encoding UTF-8. Windows compressed folder creates > zip files whose filenames are encoded in the ANSI code-page. Why do we need to support non-ASCII filenames in .zip files created with non-UTF-8 tools? What concrete problem is being solved here? Are these .zip files packaged for Firefox or random existing .zip files off the Web? If packaged for Firefox, I think it's reasonable to enforce an UTF-8-only policy. Authors with defective tools can always use ASCII-only file names if they can't generate UTF-8 otherwise. (In reply to Masatoshi Kimura [:emk] from comment #11) > How about > 1. Try UTF-8. > 2. If the platform charset is not UTF-8, try it. Making the handling of .zip contents depending on Firefox platform seems like a bad idea on the face of things. What scenario makes this a sensible course of action? > 3. Use the charset from GuessCharset() as a last resort. Does this mean using chardet? The behavior of chardet isn't well specified. While we need something like it for legacy HTML content out there, I think we should try hard to avoid adding new dependencies on code whose behavior is effectively defined as "whatever the ancient piece of Netscape code happens to do". Adding dependencies like that means that any changes to the mystery code involved (chardet) can break random things, because no one would know for sure which behaviors are ones that some files depend on.

Masatoshi Kimura [:emk]

Reporter

Comment 14

•

13 years ago

> Authors with defective tools can always use ASCII-only file names if they can't generate UTF-8 otherwise. The problem is Windows standard zip folder is "defective". It is very very common Zip files contain filenames in local encoding on Windows. And most Windows users do not even know the problem exists unless they exchange the zip file with non-Windows environments.

Masatoshi Kimura [:emk]

Reporter

Comment 15

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #13) > > 3. Use the charset from GuessCharset() as a last resort. > > Does this mean using chardet? The behavior of chardet isn't well specified. > While we need something like it for legacy HTML content out there, I think > we should try hard to avoid adding new dependencies on code whose behavior > is effectively defined as "whatever the ancient piece of Netscape code > happens to do". Adding dependencies like that means that any changes to the > mystery code involved (chardet) can break random things, because no one > would know for sure which behaviors are ones that some files depend on. That said, I will not oppose removing the GuessCharset() usage.

Henri Sivonen (:hsivonen) (on vacation)

Comment 16

•

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #14) > > Authors with defective tools can always use ASCII-only file names if they can't generate UTF-8 otherwise. > The problem is Windows standard zip folder is "defective". It is very very > common Zip files contain filenames in local encoding on Windows. What's the zip reading usage scenario being addressed here? Are we talking about reading zip files authored for Firefox (extensions, packaged Web apps) or are we talking about using Gecko to implement an extractor for random zip files for B2G? > And most > Windows users do not even know the problem exists unless they exchange the > zip file with non-Windows environments. Making the platform encoding (of the platform Gecko happens to be running on) part of the guessing doesn't seem to help with that!

Masatoshi Kimura [:emk]

Reporter

Comment 17

•

13 years ago

Other compromise is: 1. Try UTF-8. 2. If it failed, fallback to IBM437 per zip spec. If non-UTF-8 filenames are forcefully decoded in UTF-8, some characters will turn into replacement characters and some files may become inaccessible. IBM437 fallback will prevent this because it is reversible. Page authors can workaround for the broken zip utility (including Windows built-in) on their own responsibility if they want. Also this behavior is independent from the local environment. Thoughts?

Masatoshi Kimura [:emk]

Reporter

Comment 18

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #16) > What's the zip reading usage scenario being addressed here? Are we talking > about reading zip files authored for Firefox (extensions, packaged Web apps) > or are we talking about using Gecko to implement an extractor for random zip > files for B2G? One usage scenario is client-side preview of user uploaded files. Service providers can't force all users using UTF-8-aware zip utility. > Making the platform encoding (of the platform Gecko happens to be running > on) part of the guessing doesn't seem to help with that! It helps because it's sufficient users can view their own filenames on the above usage scenario.

Henri Sivonen (:hsivonen) (on vacation)

Comment 19

•

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #17) > Other compromise is: > 1. Try UTF-8. > 2. If it failed, fallback to IBM437 per zip spec. This seems a lot better, since it's deterministic, independent of the local environment and doesn't depend on non-standard probabilities baked into chardet. It might be prudent to make the fallback togglable by the caller (defaulting to the old behavior) so that the fallback doesn't get introduced into places where the zip decoding code has been so far been used successfully with an UTF-8-only policy. (In reply to Masatoshi Kimura [:emk] from comment #18) > One usage scenario is client-side preview of user uploaded files. Service > providers can't force all users using UTF-8-aware zip utility. Is there a concrete plan for adding such a feature to Firefox?

Masatoshi Kimura [:emk]

Reporter

Comment 20

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #19) > This seems a lot better, since it's deterministic, independent of the local > environment and doesn't depend on non-standard probabilities baked into > chardet. One problem is that IBM437 is not supported by the Encoding Standard. It may be better to hard-code other round-trippable encoding (such as windows-1252). > It might be prudent to make the fallback togglable by the caller (defaulting > to the old behavior) so that the fallback doesn't get introduced into places > where the zip decoding code has been so far been used successfully with an > UTF-8-only policy. General purpose flag bit 11 will be useful, too. The fallback should be enabled only if the bit is zero. BTW what should be done in case non-UTF8 bytes are found when the fallback is disabled? Currently, filenames are displayed as if they are empty. > (In reply to Masatoshi Kimura [:emk] from comment #18) > > One usage scenario is client-side preview of user uploaded files. Service > > providers can't force all users using UTF-8-aware zip utility. > > Is there a concrete plan for adding such a feature to Firefox? There's no such a plan, AFAIK.

Andrea Marchesini [:baku]

Assignee

Comment 21

•

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #20) > One problem is that IBM437 is not supported by the Encoding Standard. It may > be better to hard-code other round-trippable encoding (such as windows-1252). I'm not able to decide this. I prefer you decide which encoding we should use.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 22

•

13 years ago

> > One usage scenario is client-side preview of user uploaded files. Service > > providers can't force all users using UTF-8-aware zip utility. > > Is there a concrete plan for adding such a feature to Firefox? I think the intent is that websites can use the ArchiveReader API to create such a UI. In which case the website doesn't have control over the zip files used with the API.

Masatoshi Kimura [:emk]

Reporter

Comment 23

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #21) > > One problem is that IBM437 is not supported by the Encoding Standard. It may > > be better to hard-code other round-trippable encoding (such as windows-1252). > I'm not able to decide this. I prefer you decide which encoding we should > use. At least, GuessCharset() fallback should be removed unless the use-case is provided. I'd prefer leaving the platform charset fallback. (In reply to Jonas Sicking (:sicking) from comment #22) > I think the intent is that websites can use the ArchiveReader API to create > such a UI. In which case the website doesn't have control over the zip files > used with the API. Correct. While websites can write a workaround as long as the fallback is invertible and deterministic, it would be better that websites "just work" without writing such a workaround.

Johnny Stenback (:jst)

Comment 24

•

13 years ago

This is a tricky one... Given that the primary use case for this API is to enable web applications to use zip files in their apps I think that having non-deterministic file name handling is less than ideal, and likewise having platform dependent behavior is also likely to make lives harder for app developers. If our behavior is locale dependent then developers will likely fall into a hole where they'll be able to create zip files locally and have them work fine in their own testing in their own locale, but cause broken behavior on other locales. So that's one use case, where developers explicitly create zip files for use in their own apps, but another use case is of course exploring zip files that the app developer knows nothing about in which case the locale may help, and might get it wrong too. And a third use case is where an app expects zip files from the user of the app in which case using the locale is likely, but by no means guaranteed, to do the right thing. Given those different use cases it seems hard to have this API be fully interoperable between different implementations, given how broken zip files are. And that leads me to think that maybe we need help from the API itself here. In the first use case it seems the way to guarantee interop would be to either require UTF-8 filenames, or let the developers specify the charset, for the other cases there doesn't seem to be a interoperable way to do this, short of defining how the charset guessing game should be played. Jonas tells me that there's discussion related to this happening on the working group mailing list, and there's talk about not requiring filenames for keys to access data in a zip file, so if that's changed in the API this becomes less important... I think we should see where that conversation goes before deciding what to do here.

Andrea Marchesini [:baku]

Assignee

Comment 25

•

13 years ago

Attached patch Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly (obsolete) — Details — Splinter Review

As discussed, the ArchiveReader accepts an optional parameter: var ar = ArchiveReader(blob, { "encoding" : "ISO-8859-1" }); This encoding format is used for the filenames contained in the zip file if these are not UTF-8. This patch also implements an ArchiveReader.getFiles() that returns (through an ArchiveRequest) an array of DOM files. I have to propose this changes to the public mailing-list, but in the meantime I would ask a feedback.

Attachment #651106 - Attachment is obsolete: true

Attachment #651106 - Flags: review?(jst)

Attachment #653605 - Flags: review?(jonas)

Attachment #653605 - Flags: feedback?(VYV03354)

Masatoshi Kimura [:emk]

Reporter

Comment 26

•

13 years ago

Comment on attachment 653605 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly >+ NS_ASSERTION(NS_IsMainThread(), "main thread only"); Are any main-thread-only things really present? ConvertStringFromCharset uses nsICharsetConverterManager and nsIUnicodeDecoder, and both are thread-safe IIRC. The overall idea is great. This is deterministic and Webpage authors don't have to write any hackish workarounds.

Attachment #653605 - Flags: feedback?(VYV03354) → feedback+

Andrea Marchesini [:baku]

Assignee

Comment 27

•

13 years ago

Attached patch Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly (obsolete) — Details — Splinter Review

I agree. MainThread check removed + title updated.

Attachment #653605 - Attachment is obsolete: true

Attachment #653605 - Flags: review?(jonas)

Attachment #653772 - Flags: review?(jonas)

Masatoshi Kimura [:emk]

Reporter

Comment 28

•

13 years ago

Comment on attachment 653772 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly >@@ -30,25 +36,55 @@ ArchiveZipItem::~ArchiveZipItem() >+ if (mOptions.encoding.IsEmpty()) >+ return NS_ERROR_FAILURE; The current implementation will fail if the filename is not encoded in UTF-8 and if the fallback encoding was not given in the constructor. Could you set the fallback encoding to "windows-1252" by default? The mailing list proposals also against to failing. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-August/036918.html http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-August/036923.html

Masatoshi Kimura [:emk]

Reporter

Comment 29

•

13 years ago

It's very easy to set a default value of a dictionary member. +dictionary ArchiveReaderOptions +{ + DOMString encoding = "windows-1252";

Andrea Marchesini [:baku]

Assignee

Comment 30

•

13 years ago

I would suggest to have the default value "utf-8". In this way we don't have 2 steps for decoding the filenames but just one. Actually because a non-UTF8 string can be decoded as UTF-8 and this makes this API not predictable.

Masatoshi Kimura [:emk]

Reporter

Comment 31

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #30) > I would suggest to have the default value "utf-8". > In this way we don't have 2 steps for decoding the filenames but just one. The fallback will be used when the string is not valid UTF-8. So "fallback to UTF-8" is meaningless. > Actually because a non-UTF8 string can be decoded as UTF-8 and this makes > this API not predictable. No, non-UTF8 string can not be decoded as UTF-8 without dataloss. For example, both "0x81 0x40" and "0x82 0x40" would be decoded into "U+FFFD U+0040" in UTF-8.

Andrea Marchesini [:baku]

Assignee

Comment 32

•

13 years ago

What I meant is: 0xEA 0x96 0xB6 0x2E 0x74 0x78 0x74 this is a valid UTF-8 string (ꖶ.txt) but it can be: "ê–¶.txt" in ISO-8859-1 and 齧ｶ.txt in Shift_JIS.

Masatoshi Kimura [:emk]

Reporter

Comment 33

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #32) > What I meant is: > 0xEA 0x96 0xB6 0x2E 0x74 0x78 0x74 this is a valid UTF-8 string (ꖶ.txt) > but it can be: "ê–¶.txt" in ISO-8859-1 and 齧ｶ.txt in Shift_JIS. In this case, the filename will be decoded as UTF-8 because the fallback will be called only when the filename is invalid as UTF-8. I don't propose make the fallback encoding prefer over UTF-8 (like Glen's proposal). I just say the filename should be decoded as some non-UTF-8 encodings if and only if the filename cannot be decoded as UTF-8 (like Henri's proposal).

Andrea Marchesini [:baku]

Assignee

Comment 34

•

13 years ago

I think, this is a problem because it means there is no way to have the 'real' filename: UTF-8 will be always able to decode the string (wrongly) and the fallback encoding will not be used. The fallback approach makes the decoding not predictable. but, again: I prefer to read what Henri and Jonas and maybe other guys think about this issue :)

Masatoshi Kimura [:emk]

Reporter

Comment 35

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #34) > I think, this is a problem because it means there is no way to have the > 'real' filename: UTF-8 will be always able to decode the string (wrongly) > and the fallback encoding will not be used. If the fallback encoding will not be used, why we are bothering to add fallback handling in the first place? NS_ConvertUTF8toUTF16 *FAILS* if the string is invalid as UTF-8, and it is exactly the cause of this bug. Your current patch will really fail to decode the filename if the filename is invalid as UTF-8 and the fallback encoding is not given. What your patch does is not what you says. Even worse, now those files will be completely ignored from the getFilenames() list. People may not even notice some files are silently ignored. > The fallback approach makes the decoding not predictable. It is 100% predictable what sequence is invalid and what sequence will be decoded by the fallback encoding. The concept "invalid UTF-8 sequence" is well-defined by the Unicode Standard, RFC 3628, the Encoding spec, etc. > but, again: I > prefer to read what Henri and Jonas and maybe other guys think about this > issue :) Henri said: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-August/036918.html > If no fallback encoding is provided by the caller > of the constructor, "Windows-1252" is set as the fallback encoding.

Andrea Marchesini [:baku]

Assignee

Comment 36

•

13 years ago

Attached patch Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly (obsolete) — Details — Splinter Review

new patch :)

Attachment #653772 - Attachment is obsolete: true

Attachment #653772 - Flags: review?(jonas)

Attachment #656835 - Flags: review?(jonas)

Attachment #656835 - Flags: feedback?(VYV03354)

Masatoshi Kimura [:emk]

Reporter

Comment 37

•

13 years ago

Comment on attachment 656835 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly Could you add --- [diff] git = 1 showfunc = 1 unified = 8 --- to your .hgrc to make the review easier? (See <https://developer.mozilla.org/en-US/docs/Mercurial_FAQ>) >+ nsCString mFilenameU; >+ArchiveZipItem::GetFilename(nsCString& aFilename) nsString would be better to reduce converting UTF-16 to UTF-8 and converting back UTF-8 to UTF-16. >+ ArchiveReaderOptions mOptions; Don't pass around the entire dictionary class. Only store the required member (that is, encoding). >+ if (mOptions.encoding.IsEmpty()) >+ return NS_ERROR_FAILURE; Move this check to ArchiveReader::Initialize(). >+ // Let's use the econding value for the dictionary Typo.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

13 years ago

Attachment #656835 - Flags: review?(jonas) → review?(mounir)

Mounir Lamouri (:mounir)

Comment 38

•

13 years ago

Comment on attachment 656835 [details] [diff] [review] Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly Review of attachment 656835 [details] [diff] [review]: ----------------------------------------------------------------- Andrea, your patch seems to be doing very different things like fixing small thinks, adding a GetFiles() method and doing what the bug is saying. Usually, it is preferred to divide the patches in a way that each of them is doing one logical thing. In that case, having a patch for all the random fixes, a patch for GetFiles() and a patch for the real bug would be appropriate. Ideally, you could even open a bug for each of those, but do as you prefer. Having separated patches makes the log more readable and helps reviewing.

Andrea Marchesini [:baku]

Assignee

Comment 39

•

13 years ago

Mounir, this patch is the output of a discussion on WHATWG and a talk with Jonas. The way we want to fix this bug is: 1. to add a custom 'optional' parameter for the ArchiveReader that can contain the encoding for the filenames 2. to add a getFiles() that returns an array of DOM Files. Without this method, a web developer must get the list of filenames from getFilenames() and then he/she has to use the filenames with getFile(). But the filenames could have some 'encoding' problem. Read comment 32 for some example. This getFiles() does not involve the filenames, so this cannot have any problem with encoding.

Andrea Marchesini [:baku]

Assignee

Comment 40

•

13 years ago

> >+ ArchiveReaderOptions mOptions; > Don't pass around the entire dictionary class. Only store the required > member (that is, encoding). the encoding is a 'attribute' useful just for the 'zip' format. Right now we support just ZIP but in the future we could support more than 1 archive format. So probably it's better to pass the dictionary and any format can use what it knows. Maybe we can avoid to copy it and use just a pointer.

Masatoshi Kimura [:emk]

Reporter

Comment 41

•

13 years ago

(In reply to Andrea Marchesini (:baku) from comment #39) > Mounir, this patch is the output of a discussion on WHATWG and a talk with > Jonas. The way we want to fix this bug is: > > 1. to add a custom 'optional' parameter for the ArchiveReader that can > contain the encoding for the filenames > > 2. to add a getFiles() that returns an array of DOM Files. Without this > method, a web developer must get the list of filenames from getFilenames() > and then he/she has to use the filenames with getFile(). But the filenames > could have some 'encoding' problem. Read comment 32 for some example. This > getFiles() does not involve the filenames, so this cannot have any problem > with encoding. Then let's separate the patch into two pieces on your patch-queue and ask for the review about both patches. You don't have to separate the bug if you prefer.

Andrea Marchesini [:baku]

Assignee

Comment 42

•

13 years ago

Attached patch patch 1 - encoding + dictionary (obsolete) — Details — Splinter Review

This patch contains only the dictionary + encoding issue.

Attachment #656835 - Attachment is obsolete: true

Attachment #656835 - Flags: review?(mounir)

Attachment #656835 - Flags: feedback?(VYV03354)

Attachment #657548 - Flags: superreview?(VYV03354)

Attachment #657548 - Flags: review?(mounir)

Andrea Marchesini [:baku]

Assignee

Comment 43

•

13 years ago

Attached patch patch 1b - getFiles (obsolete) — Details — Splinter Review

This patch implements getFiles()

Attachment #657549 - Flags: superreview?(VYV03354)

Attachment #657549 - Flags: review?(mounir)

Masatoshi Kimura [:emk]

Reporter

Comment 44

•

13 years ago

Actually, it looks like the first patch contains getFiles() implementation and the second patch contains nsCString -> nsString changes.

Masatoshi Kimura [:emk]

Reporter

Comment 45

•

13 years ago

Comment on attachment 657549 [details] [diff] [review] patch 1b - getFiles I'm not a superreviwer. +++ b/dom/file/ArchiveEvent.cpp @@ -87,29 +87,30 @@ ArchiveReaderEvent::ShareMainThread() >+ nsCString filename = NS_ConvertUTF16toUTF8(tmp); Change this nsCString to nsString, too. > nsCString type; > if (NS_SUCCEEDED(GetType(filename, type))) >- item->SetType(type); >+ item->SetType(NS_ConvertUTF8toUTF16(type)); And here. +++ b/dom/file/ArchiveZipEvent.h @@ -19,35 +19,35 @@ BEGIN_FILE_NAMESPACE >- ArchiveReaderOptions mOptions; >+ ArchiveReaderOptions& mOptions; Is it guaranteed that ArchiveReaderZipEvent will never be gc'ed as long as ArchiveZipItem is live? Also, you don't have to pass the entire dictionary here because you already know the archive is a zip file in ArchiveReaderZipEvent.

Attachment #657549 - Flags: superreview?(VYV03354) → feedback-

Masatoshi Kimura [:emk]

Reporter

Comment 46

•

13 years ago

Comment on attachment 657548 [details] [diff] [review] patch 1 - encoding + dictionary This patch fails to apply with the following error: applying nonUTF8.patch patching file dom/file/ArchiveRequest.cpp Hunk #2 FAILED at 139 1 out of 4 hunks FAILED -- saving rejects to file dom/file/ArchiveRequest.cpp.re j patch failed, unable to continue (try -v) patch failed, rejects left in working dir errors during apply, please fix and refresh nonUTF8.patch Did you pull the latest source from the tree?

Attachment #657548 - Flags: superreview?(VYV03354)

Andrea Marchesini [:baku]

Assignee

Comment 47

•

13 years ago

> +++ b/dom/file/ArchiveEvent.cpp > @@ -87,29 +87,30 @@ ArchiveReaderEvent::ShareMainThread() > >+ nsCString filename = NS_ConvertUTF16toUTF8(tmp); > Change this nsCString to nsString, too. I need to extract the extension. This is the reason why I need to convert it to a nsCString. The extension is used as parameter for the mimeservice and it must be a nsCString. > > nsCString type; > > if (NS_SUCCEEDED(GetType(filename, type))) > >- item->SetType(type); > >+ item->SetType(NS_ConvertUTF8toUTF16(type)); > And here. The output of mimeservice is a nsCString. When I get the type I convert it to a nsString. > Is it guaranteed that ArchiveReaderZipEvent will never be gc'ed as long as > ArchiveZipItem is live? > Also, you don't have to pass the entire dictionary here because you already > know the archive is a zip file in ArchiveReaderZipEvent. good point.

Masatoshi Kimura [:emk]

Reporter

Comment 48

•

13 years ago

Even when I used getFiles(), if the fallback encoding is utf-8 and if the filename is invalid as utf-8, the file is silently ignored. (In reply to Andrea Marchesini (:baku) from comment #39) > But the filenames > could have some 'encoding' problem. Read comment 32 for some example. This > getFiles() does not involve the filenames, so this cannot have any problem > with encoding. The current implementation defeats the purpose of getFiles(). It has become just a syntax sugar of getFilenames() + getFile().

Masatoshi Kimura [:emk]

Reporter

Comment 49

•

13 years ago

And I think the failure should be notified somehow.

Mounir Lamouri (:mounir)

Comment 50

•

13 years ago

The second patch is labelled as implementing GetFiles() but it seems that the first patch have some traces of GetFiles(). Is that on purpose?

Andrea Marchesini [:baku]

Assignee

Comment 51

•

13 years ago

(In reply to Mounir Lamouri (:mounir) from comment #50) > The second patch is labelled as implementing GetFiles() but it seems that > the first patch have some traces of GetFiles(). Is that on purpose? my fault. I did a mess with the 2 patches :)

Andrea Marchesini [:baku]

Assignee

Comment 52

•

13 years ago

Attached patch patch 2 - encoding + dictionary (obsolete) — Details — Splinter Review

this patch doesn't contain any getFiles() code.

Attachment #657548 - Attachment is obsolete: true

Attachment #657548 - Flags: review?(mounir)

Attachment #657804 - Flags: review?(mounir)

Attachment #657804 - Flags: feedback?(VYV03354)

Andrea Marchesini [:baku]

Assignee

Comment 53

•

13 years ago

Attached patch patch 2 - getFiles() (obsolete) — Details — Splinter Review

GetFiles()

Attachment #657549 - Attachment is obsolete: true

Attachment #657549 - Flags: review?(mounir)

Attachment #657805 - Flags: review?(mounir)

Attachment #657805 - Flags: feedback?(VYV03354)

Masatoshi Kimura [:emk]

Reporter

Comment 54

•

13 years ago

getFiles() still silently ignores files with an invalid name. > ArchiveZipItem::File(ArchiveReader* aArchiveReader) > { >- return new ArchiveZipFile(NS_ConvertUTF8toUTF16(mFilename), >- NS_ConvertUTF8toUTF16(GetType()), >+ nsString filename; >+ nsresult rv = GetFilename(filename); >+ >+ if (NS_FAILED(rv)) >+ return 0; This check made a dependency on the filename. Is it possible to create a File object anyway even if the conversion failed?

Andrea Marchesini [:baku]

Assignee

Comment 55

•

13 years ago

No. A DOM File must have a name and a type.

Masatoshi Kimura [:emk]

Reporter

Comment 56

•

13 years ago

But it can be an empty string. http://dev.w3.org/2006/webapi/FileAPI/#attributes-blob > type > ... If conforming user agents cannot determine the media type of the Blob, they must return the empty string. ... http://dev.w3.org/2006/webapi/FileAPI/#file-attrs > name > ... On getting, if user agents cannot make this information available, they must return the empty string.

Andrea Marchesini [:baku]

Assignee

Comment 57

•

13 years ago

mmm... but what's the point to have an 'unnamed' file from a ZIP archive? We can implement this, but it's a separated bug.

Masatoshi Kimura [:emk]

Reporter

Comment 58

•

13 years ago

> mmm... but what's the point to have an 'unnamed' file from a ZIP archive? It would be accessible via index from the getFiles() result. I thought it was the raison d'etre of getFiles(). > We can implement this, but it's a separated bug. I'm fine with separating the bug.

Masatoshi Kimura [:emk]

Reporter

Updated

•

13 years ago

Attachment #657804 - Flags: feedback?(VYV03354) → feedback+

Masatoshi Kimura [:emk]

Reporter

Updated

•

13 years ago

Attachment #657805 - Flags: feedback?(VYV03354) → feedback+

Masatoshi Kimura [:emk]

Reporter

Comment 59

•

13 years ago

Good News: ConvertStringFromCharset() will no longer fail with invalid UTF-8 sequence due to my patch for bug 784367.

Andrea Marchesini [:baku]

Assignee

Updated

•

13 years ago

Whiteboard: [need review]

Masatoshi Kimura [:emk]

Reporter

Comment 60

•

13 years ago

Mounir, ping? I want this feature in Mozilla 18.

Mounir Lamouri (:mounir)

Comment 61

•

13 years ago

(In reply to Masatoshi Kimura [:emk] from comment #60) > Mounir, ping? I want this feature in Mozilla 18. I will review this today as soon as I am able to catch Andrea on IRC.

Mounir Lamouri (:mounir)

Comment 62

•

13 years ago

Comment on attachment 657804 [details] [diff] [review] patch 2 - encoding + dictionary Review of attachment 657804 [details] [diff] [review]: ----------------------------------------------------------------- Some general comments: I think ArchiveItem and ArchiveReaderEvent should be prefixed with a 'I'. Those are interfaces as far as I understand it. Also, it's great that you add comments on top of your classes (really, *way* simpler to understand the code) but it would be awesome if you could use this format: /** * Comment */ Which is the common java doc syntax used by most doc generator, AFAIK. I agree with the general approach of that patch but I had to request a lot of changes so I would like to review the updated patch. ::: dom/file/ArchiveEvent.cpp @@ +94,5 @@ > > + nsString tmp; > + nsresult rv = item->GetFilename(tmp); > + nsCString filename = NS_ConvertUTF16toUTF8(tmp); > + if (NS_FAILED(rv)) Why do you convert to UTF8? @@ +110,5 @@ > > // This is a nsDOMFile: > nsRefPtr<nsIDOMFile> file = item->File(mArchiveReader); > + if (file) > + fileList.AppendElement(file); nit: coding style: if (file) { foo; } ::: dom/file/ArchiveEvent.h @@ +28,5 @@ > virtual ~ArchiveItem(); > > // Getter/Setter for the type > + virtual nsString GetType(); > + virtual void SetType(const nsString& aType); No need for "virtual" here. This is re-defined nowhere. If it happens some day, "virtual" can be added. Also, I do not understand why you switched from nsCString to nsString here. It seems that mType contains the mime type of the file and we get that info in UTF8. Converting to UTF16 seems to be a waste of time and memory. @@ +33,3 @@ > > // Getter for the filename > + virtual nsresult GetFilename(nsString& aFilename) = 0; Later, I put a comment that says you should probably do: virtual nsString GetFilename() = 0; @@ +37,5 @@ > // Generate a DOMFile > virtual nsIDOMFile* File(ArchiveReader* aArchiveReader) = 0; > > protected: > + nsString mType; I think that should be: nsCString mType; see above. ::: dom/file/ArchiveReader.cpp @@ +52,4 @@ > nsCOMPtr<nsIDOMBlob> blob; > blob = do_QueryInterface(nsContentUtils::XPConnect()->GetNativeOfWrapper(aCx, obj)); > if (!blob) { > return NS_ERROR_UNEXPECTED; This should be NS_ERROR_INVALID_ARG, not NS_ERROR_UNEXPECTED. Same for the NS_ENSURE_TRUE above. It's not unexpected to have: var r = new ArchiveReader(); or: var r = new ArchiveReader("foo"); ... That could happen in a web content. @@ +54,5 @@ > if (!blob) { > return NS_ERROR_UNEXPECTED; > } > > mBlob = blob; Maybe you could set mBlob at the end, when you know everything is fine? @@ +57,5 @@ > > mBlob = blob; > > + // Extra param is an object > + if (aArgc > 1) { Earlier, you are doing: NS_ENSURE_TRUE(aArgc > 0, ...); you should probably do: NS_ENSURE_TRUE(aArgc == 1 || aArgc == 2, NS_ERROR_INVALID_ARG); IOW, you shouldn't get aArgc > 2 or < 1. @@ +59,5 @@ > > + // Extra param is an object > + if (aArgc > 1) { > + nsresult rv = mOptions.Init(aCx, &aArgv[1]); > + NS_ENSURE_SUCCESS(rv, rv); if (NS_FAILED(mOptions.Init(aCx, &aArgv[1]))) { return NS_ERROR_INVALID_ARG; } ::: dom/file/ArchiveRequest.cpp @@ +209,5 @@ > NS_ENSURE_SUCCESS(rv, rv); > > if (filename == mFilename) { > + nsresult rv = nsContentUtils::WrapNative(aCx, JS_GetGlobalForScopeChain(aCx), > + file, &NS_GET_IID(nsIDOMFile), aValue); nit: 80 chars limit ::: dom/file/ArchiveZipEvent.cpp @@ +27,3 @@ > : mFilename(aFilename), > + mCentralStruct(aCentralStruct), > + mOptions(aOptions) This is not following the coding convention. We usually do: Class::Class() : mFoo , mBar { } Which has the advantage or only changing one line if you add a member variable. Also, why not setting mFilename to the correct value here instead of waiting for it being accessed and waste memory? @@ +41,3 @@ > { > + if (mOptions.encoding.IsEmpty()) > + return NS_ERROR_FAILURE; nit: the coding style says that you should always have { } even for one-liners. @@ +43,5 @@ > + return NS_ERROR_FAILURE; > + > + nsString filenameU; > + nsresult rv = nsContentUtils::ConvertStringFromCharset(NS_ConvertUTF16toUTF8(mOptions.encoding), > + mFilename, filenameU); nit: 80 chars limit @@ +44,5 @@ > + > + nsString filenameU; > + nsresult rv = nsContentUtils::ConvertStringFromCharset(NS_ConvertUTF16toUTF8(mOptions.encoding), > + mFilename, filenameU); > + NS_ENSURE_SUCCESS(rv, rv); The default behaviour when we fail to convert |mFilename| to the target charset, we should do |mFilenameU = mFilename;| It's better than having GetFilename() returning the empty string. @@ +59,1 @@ > { IMO, this should only return mFilename which is already UTF16. That would allow you to have this method returns nsString instead of taking an out parameter. @@ +81,5 @@ > + nsString filename; > + nsresult rv = GetFilename(filename); > + > + if (NS_FAILED(rv)) > + return 0; I think that code should be removed but two things here: 1. this is a nit but we usually do: if (NS_FAILED(GetFilename(filename))) { when we don't return |rv|. 2. never do |return 0;| ;) but: return nullptr; ::: dom/file/ArchiveZipEvent.h @@ +20,5 @@ > { > public: > ArchiveZipItem(const char* aFilename, > + ZipCentral& aCentralStruct, > + ArchiveReaderOptions& aOptions); Why aren't those two things const ref? @@ +48,5 @@ > class ArchiveReaderZipEvent : public ArchiveReaderEvent > { > public: > + ArchiveReaderZipEvent(ArchiveReader* aArchiveReader, > + ArchiveReaderOptions& aOptions); Shouldn't that be a const ref? ::: dom/file/test/test_archivereader_nonUnicode.html @@ +9,5 @@ > + <script type="text/javascript" src="/tests/SimpleTest/SimpleTest.js"></script> > + <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css"/> > + > + <script type="text/javascript;version=1.7"> > + function createNonUnicodeData() { I will just believe that this method is doing what it says ;) @@ +11,5 @@ > + > + <script type="text/javascript;version=1.7"> > + function createNonUnicodeData() { > + var Cc = SpecialPowers.wrap(Components).classes; > + var Ci = SpecialPowers.wrap(Components).interfaces; nit: just FTR, you could use |const| instead of |var| here if you are using js 1.7. @@ +45,5 @@ > + // GetFilename > + var handle = r.getFilenames(); > + isnot(handle, null, "ArchiveReader.getFilenames() cannot be null"); > + handle.onsuccess = function() { > + ok(true, "ArchiveReader.getFilenames() should return a 'success'"); No need for that check, if we don't go there, the test will fail by timing out. @@ +49,5 @@ > + ok(true, "ArchiveReader.getFilenames() should return a 'success'"); > + is(this.result instanceof Array, true, "ArchiveReader.getFilenames() should return an array"); > + is(this.result.length, 1, "ArchiveReader.getFilenames(): the array contains 1 item"); > + ok(this.reader, r, "ArchiveRequest.reader should be == ArchiveReader"); > + dump('Content: ' + this.result[0] + '\n'); Could you comment that? It's clearly needed to show info if you want to debug but not in the general case. @@ +85,5 @@ > + <script type="text/javascript;version=1.7" src="helpers.js"></script> > + > +</head> > + > +<body onload="runTest();"> I think it's better to use addLoadEvent(runTest); or even: addLoadEvent(function() { // content of runtest. });

Attachment #657804 - Flags: review?(mounir)

Attachment #657804 - Flags: review-

Attachment #657804 - Flags: feedback+

Mounir Lamouri (:mounir)

Comment 63

•

13 years ago

Comment on attachment 657804 [details] [diff] [review] patch 2 - encoding + dictionary >diff --git a/dom/file/nsIDOMArchiveReader.idl b/dom/file/nsIDOMArchiveReader.idl >--- a/dom/file/nsIDOMArchiveReader.idl >+++ b/dom/file/nsIDOMArchiveReader.idl >@@ -9,15 +9,20 @@ interface nsIDOMArchiveRequest; > > [scriptable, builtinclass, uuid(a616ab85-fc3a-4028-9f10-f8620ee1b8e1)] > interface nsIDOMArchiveReader : nsISupports > { > nsIDOMArchiveRequest getFilenames(); > nsIDOMArchiveRequest getFile(in DOMString filename); > }; > >+dictionary ArchiveReaderOptions >+{ >+ DOMString encoding = "windows-1252"; >+}; Could you put a comment explaining why this as been chosen as the default value?

Mounir Lamouri (:mounir)

Comment 64

•

13 years ago

Comment on attachment 657805 [details] [diff] [review] patch 2 - getFiles() Review of attachment 657805 [details] [diff] [review]: ----------------------------------------------------------------- r/sr=me khuey, could you review this patch? Particularly the JS API usage? ::: dom/file/ArchiveRequest.cpp @@ +233,5 @@ > + jsval* aValue, > + nsTArray<nsCOMPtr<nsIDOMFile> >& aFileList) > +{ > + JSObject* array = JS_NewArrayObject(aCx, aFileList.Length(), nullptr); > + nit: no need for that empty line. @@ +241,5 @@ > + > + for (PRUint32 i = 0; i < aFileList.Length(); ++i) { > + nsCOMPtr<nsIDOMFile> file = aFileList[i]; > + > + jsval value; nit: s/jsval/JS::Value/ @@ +243,5 @@ > + nsCOMPtr<nsIDOMFile> file = aFileList[i]; > + > + jsval value; > + nsresult rv = nsContentUtils::WrapNative(aCx, JS_GetGlobalForScopeChain(aCx), > + file, &NS_GET_IID(nsIDOMFile), &value); By any chance, can't you refactor some code from here and GetFileResult? For example by adding a method taking a nsIDOMFile, a context, and returning a JS::Value? Seems like it could be used at those two places. @@ +250,5 @@ > + } > + } > + > + if (!JS_FreezeObject(aCx, array)) { > + return NS_ERROR_FAILURE; I'm not sure that's needed. @@ +253,5 @@ > + if (!JS_FreezeObject(aCx, array)) { > + return NS_ERROR_FAILURE; > + } > + > + *aValue = OBJECT_TO_JSVAL(array); aValue->setObject(*array); ::: dom/file/ArchiveRequest.h @@ +74,1 @@ > } mOperation; As long as you are here, could you change the syntax too: enum mOperation { GetFilenames, GetFile, GetFiles }; ::: dom/file/test/test_archivereader.html @@ +250,5 @@ > + handleFinish(); > + } > + handle5.onerror = function() { > + ok(false, "ArchiveReader.getFiles() should not return an 'error'"); > + handleFinish(); FYI, it's fine to have tests failing with a test time out. Typically, here, not handling the |error| event would time out the test. Given that you do not expect that to happen, it's fine.

Attachment #657805 - Flags: review?(mounir)

Attachment #657805 - Flags: review?(khuey)

Attachment #657805 - Flags: review+

Mounir Lamouri (:mounir)

Comment 65

•

13 years ago

Andrea, keep in mind that you will need a super-review and a review to push those patches. You can take both my review as super-review or review given that I'm a peer for the module and a super-reviewer. However, you will need a review from another peer or a super-review. Kyle can review the second patch (and I would be sr) but for the first one, given the size, maybe you could ask Jonas to super-review it... unless Kyle wants to review it :)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 66

•

13 years ago

Comment on attachment 657805 [details] [diff] [review] patch 2 - getFiles() Review of attachment 657805 [details] [diff] [review]: ----------------------------------------------------------------- I don't understand why you're freezing the array, but other than that the jsapi usage looks fine.

Attachment #657805 - Flags: review?(khuey) → review+

Andrea Marchesini [:baku]

Assignee

Comment 67

•

13 years ago

> > > > + nsString tmp; > > + nsresult rv = item->GetFilename(tmp); > > + nsCString filename = NS_ConvertUTF16toUTF8(tmp); > > + if (NS_FAILED(rv)) > > Why do you convert to UTF8? because mMimeService = do_GetService(NS_MIMESERVICE_CONTRACTID, &rv); mMimeService->GetTypeFromExtension(aExt, aMimeType); uses nsCString and it's easier (for what I know) to cut a string using RFindChar. > Also, I do not understand why you switched from nsCString to nsString here In general I work with nsString everywhere except: 1. ZipFile - before knowing which encoding I have to use. 2. GetType - because mMimeService works with nsCString. mType is shared with nsDOMFileCC() and there it must be nsAString. > This is not following the coding convention. We usually do: > Class::Class() > : mFoo > , mBar > { > } but it's horrible :) And it's not true this convention is used everywhere. > Also, why not setting mFilename to the correct value here instead of waiting > for it being accessed and waste memory? I don't get this. > > + nsString filenameU; > > + nsresult rv = nsContentUtils::ConvertStringFromCharset(NS_ConvertUTF16toUTF8(mOptions.encoding), > > + mFilename, filenameU); > > + NS_ENSURE_SUCCESS(rv, rv); > > The default behaviour when we fail to convert |mFilename| to the target > charset, we should do |mFilenameU = mFilename;| It's better than having > GetFilename() returning the empty string. GetFIlename doesn't return an empty string. If the string is empty we return NS_ERROR_FAILURE; > That would allow you to have this method returns nsString instead of taking > an out parameter. In this way we cannot return an 'error'. And errors can happen when converting strings. For the rest, I'm going to upload a new patch.

Andrea Marchesini [:baku]

Assignee

Comment 68

•

13 years ago

Attached patch patch 3 - encoding + dictionary (obsolete) — Details — Splinter Review

Attachment #657804 - Attachment is obsolete: true

Attachment #668887 - Flags: review?(mounir)

Andrea Marchesini [:baku]

Assignee

Comment 69

•

13 years ago

Attached patch patch 3 - getFiles() (obsolete) — Details — Splinter Review

Attachment #657805 - Attachment is obsolete: true

Attachment #668891 - Flags: review+

Andrea Marchesini [:baku]

Assignee

Comment 70

•

13 years ago

Attached patch patch 3b - encoding + dictionary (obsolete) — Details — Splinter Review

rebased

Attachment #668887 - Attachment is obsolete: true

Attachment #668887 - Flags: review?(mounir)

Attachment #668895 - Flags: review?(mounir)

Andrea Marchesini [:baku]

Assignee

Comment 71

•

13 years ago

Attached patch patch 3b - getFiles() (obsolete) — Details — Splinter Review

rebased

Attachment #668891 - Attachment is obsolete: true

Attachment #668896 - Flags: review+

Mounir Lamouri (:mounir)

Comment 72

•

13 years ago

Comment on attachment 668895 [details] [diff] [review] patch 3b - encoding + dictionary Review of attachment 668895 [details] [diff] [review]: ----------------------------------------------------------------- I think we should discuss the two points where we are in disagreement. Whether on IRC or here. I do not want to be stubborn but you did not convinced me yet ;) ::: dom/file/ArchiveEvent.cpp @@ +110,5 @@ > nsRefPtr<ArchiveItem> item = mFileList[index]; > > + nsString tmp; > + nsresult rv = item->GetFilename(tmp); > + nsCString filename = NS_ConvertUTF16toUTF8(tmp); What I do not understand here is that you updated this code to use UTF16 file names but you are here converting that UTF16 file name to UTF8 which is doing a lossy conversion. I am not a string expert but that feels wrong to me. ::: dom/file/ArchiveEvent.h @@ +39,5 @@ > // Generate a DOMFile > virtual nsIDOMFile* File(ArchiveReader* aArchiveReader) = 0; > > protected: > + nsString mType; I insist, there is no reason to have this as a nsString given that you get the value from the MimeService which is giving you a nsCString. Converting to nsString increase the size of the object for no valid reason. You are indeed asked to pass a nsAString to create a ArchipeZipItem but you can simply convert at that moment. Could you also change GetType() and SetType() to still take and return nsCString? ::: dom/file/ArchiveReader.h @@ +23,5 @@ > class ArchiveRequest; > > +/** > + * This is the ArchiveReader object > + */ nit: not very useful comment... ::: dom/file/ArchiveRequest.cpp @@ +45,5 @@ > } > > +/** > + * This is the ArchiveRequest object > + */ No need to use javadoc comment here. You can keep the previous one. ::: dom/file/ArchiveRequest.h @@ +17,5 @@ > BEGIN_FILE_NAMESPACE > > +/** > + * This is the ArchiveRequest object > + */ nit: this comment doesn't seem useful. ::: dom/file/ArchiveZipEvent.cpp @@ +21,5 @@ > #endif > > +/** > + * ArchiveZipItem inherits ArchiveItem > + */ nit: no need for that comment here. @@ +70,5 @@ > + > + // Let's use the enconding value for the dictionary > + else { > + nsresult rv = ConvertFilename(); > + NS_ENSURE_SUCCESS(rv, rv); Does that really make sense to return an error code if we faile dto convert the filename to the appropriate encoding? I wonder if that wouldn't be better to simply return |mFilename| (with no encoding change). Seems that failing wouldn't help the developer (what do we show in that case actually? nothing?). On the other hand, returning the raw value we have could clearly help the developer. ::: dom/file/ArchiveZipEvent.h @@ +49,4 @@ > }; > > +/** > + * ArchiveReaderZipEvent inhertis ArchiveReaderEvent nit: no need for that comment if it doesn't tell more than the line it is describing. @@ +54,5 @@ > class ArchiveReaderZipEvent : public ArchiveReaderEvent > { > public: > + ArchiveReaderZipEvent(ArchiveReader* aArchiveReader, > + ArchiveReaderOptions& aOptions); const ArchiveReaderOptions& aOptions ::: dom/file/ArchiveZipFile.cpp @@ +16,5 @@ > #define ZIP_CHUNK 16384 > > +/** > + * Input stream object for zip files > + */ nit: no need for javadoc comments here. You can keep the previous one. ::: dom/file/ArchiveZipFile.h @@ +16,5 @@ > > BEGIN_FILE_NAMESPACE > > +/** > + * ZipFile to DOMFileCC nit: that comment isn't really useful. ::: dom/file/test/test_archivereader_nonUnicode.html @@ +45,5 @@ > + // GetFilename > + var handle = r.getFilenames(); > + isnot(handle, null, "ArchiveReader.getFilenames() cannot be null"); > + handle.onsuccess = function() { > + ok(true, "ArchiveReader.getFilenames() should return a 'success'"); This check isn't necessary. @@ +65,5 @@ > + // GetFilename > + var handle = r.getFilenames(); > + isnot(handle, null, "ArchiveReader.getFilenames() cannot be null"); > + handle.onsuccess = function() { > + ok(true, "ArchiveReader.getFilenames() should return a 'success'"); ditto

Attachment #668895 - Flags: review?(mounir) → review-

Mounir Lamouri (:mounir)

Updated

•

13 years ago

Status: NEW → ASSIGNED

Whiteboard: [need review]

Masatoshi Kimura [:emk]

Reporter

Comment 73

•

13 years ago

(In reply to Mounir Lamouri (:mounir) from comment #72) > What I do not understand here is that you updated this code to use UTF16 > file names but you are here converting that UTF16 file name to UTF8 which is > doing a lossy conversion. I am not a string expert but that feels wrong to > me. UTF16 to UTF8 conversion is not lossy. UTF16 to ASCII conversion is.

Andrea Marchesini [:baku]

Assignee

Comment 74

•

13 years ago

Attached patch patch 3c - encoding + dictionary (obsolete) — Details — Splinter Review

Attachment #668895 - Attachment is obsolete: true

Attachment #669134 - Flags: review?(mounir)

Mounir Lamouri (:mounir)

Comment 75

•

13 years ago

Comment on attachment 669134 [details] [diff] [review] patch 3c - encoding + dictionary Seems like you didn't attached the updated patch :(x

Attachment #669134 - Flags: review?(mounir)

Andrea Marchesini [:baku]

Assignee

Comment 76

•

13 years ago

Attached patch patch 3c - encoding + dictionary (obsolete) — Details — Splinter Review

Attachment #669134 - Attachment is obsolete: true

Attachment #669142 - Flags: review?(mounir)

Mounir Lamouri (:mounir)

Comment 77

•

13 years ago

Comment on attachment 669142 [details] [diff] [review] patch 3c - encoding + dictionary Review of attachment 669142 [details] [diff] [review]: ----------------------------------------------------------------- Generally speaking, I think it's not useful to add class description comments if you don't actually add information that isn't in the class name. You have added a couple of comments like that. Anyway, r=me with the following things fixed: ::: dom/file/ArchiveEvent.cpp @@ +31,5 @@ > + if (mType.IsEmpty()) { > + return NS_LITERAL_CSTRING("binary/octet-stream"); > + } > + > + return mType; Why did you change that? The previous code was fine. ::: dom/file/ArchiveRequest.cpp @@ -44,5 @@ > return NS_OK; > } > > -/* ArchiveRequest */ > - You could have kept it. Just not with a Java Doc style comment. ::: dom/file/ArchiveZipEvent.h @@ +54,5 @@ > class ArchiveReaderZipEvent : public ArchiveReaderEvent > { > public: > + ArchiveReaderZipEvent(ArchiveReader* aArchiveReader, > + ArchiveReaderOptions& aOptions); const ArchiveReaderOptions& ::: dom/file/nsIDOMArchiveReader.idl @@ +15,5 @@ > }; > > +dictionary ArchiveReaderOptions > +{ > + DOMString encoding = "windows-1252"; Actually, could you add a comment describing what |ArchiveReaderOptinos| is for and why "windows-1252" is the default value of |encoding|? The first comment should be for |dictionary ArchiveReaderOptions| and the second for |encoding|.

Attachment #669142 - Flags: review?(mounir) → review+

Andrea Marchesini [:baku]

Assignee

Comment 78

•

13 years ago

Attached patch first patch - encoding + dictionary (obsolete) — Details — Splinter Review

Attachment #669142 - Attachment is obsolete: true

Attachment #669146 - Flags: review+

Andrea Marchesini [:baku]

Assignee

Comment 79

•

13 years ago

Attached patch second patch - getFiles() (obsolete) — Details — Splinter Review

rebased

Attachment #668896 - Attachment is obsolete: true

Attachment #669149 - Flags: review+

Andrea Marchesini [:baku]

Assignee

Comment 80

•

13 years ago

https://tbpl.mozilla.org/?tree=Try&rev=0c7376d7f03a green on try

Keywords: checkin-needed

Mounir Lamouri (:mounir)

Comment 81

•

13 years ago

This is only the first of the two patches, right?

Andrea Marchesini [:baku]

Assignee

Comment 82

•

13 years ago

both. the second has been renamed "try: -b do -p all -u all -t none"

Mounir Lamouri (:mounir)

Comment 83

•

13 years ago

I was going to land this but I forgot something... You need super-reviews. You can take r/sr=me for the first patch which means you can take my review as a super-review or a review, depending on the second reviewer you find. For the second patch, it's r=khuey sr=mounir I guess Jonas or Johnny could sr the first patch.

Mounir Lamouri (:mounir)

Updated

•

13 years ago

Keywords: checkin-needed

Andrea Marchesini [:baku]

Assignee

Updated

•

13 years ago

Attachment #669146 - Flags: superreview?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 84

•

13 years ago

Comment on attachment 669146 [details] [diff] [review] first patch - encoding + dictionary Review of attachment 669146 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/file/ArchiveEvent.cpp @@ +28,5 @@ > nsCString > ArchiveItem::GetType() > { > + if (mType.IsEmpty()) { > + return NS_LITERAL_CSTRING("binary/octet-stream"); This isn't a new problem in this patch, but I wanted to point it out. You shouldn't be returning string objects. At best that will cause a bunch of reference counting of nsStringBuffers which are all threadsafe atomic refcounts. You don't need to fix this in this patch, but it would be nice to get fixed. ::: dom/file/ArchiveReader.cpp @@ +56,5 @@ > } > > + // Extra param is an object > + if (aArgc > 1 && NS_FAILED(mOptions.Init(aCx, &aArgv[1]))) { > + return NS_ERROR_INVALID_ARG; If the mOptions.Init call fails, you should propagate the error code that it returns, rather than always returning INVALID_ARG. ::: dom/file/nsIDOMArchiveReader.idl @@ +15,5 @@ > }; > > +/* This dictionary is the optional argument for the > + * ArchiveReader constructor: > + * var a = new ArchiveReader({ encoding: "iso-8859-1" }); */ Shouldn't this say "new ArchiveReader(blob, { encoding: ...."?

Attachment #669146 - Flags: superreview?(jonas) → superreview+

Andrea Marchesini [:baku]

Assignee

Comment 85

•

13 years ago

Attached patch first patch - encoding + dictionary - 2 — Details — Splinter Review

Attachment #669146 - Attachment is obsolete: true

Attachment #678798 - Flags: superreview+

Attachment #678798 - Flags: review+

Andrea Marchesini [:baku]

Assignee

Comment 86

•

13 years ago

Attached patch second patch - getFiles() - 2 — Details — Splinter Review

Attachment #669149 - Attachment is obsolete: true

Attachment #678803 - Flags: superreview+

Attachment #678803 - Flags: review+

Andrea Marchesini [:baku]

Assignee

Updated

•

13 years ago

Keywords: checkin-needed

(PTO June 19-July 7) Ryan VanderMeulen

Comment 87

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5bc6448929cb https://hg.mozilla.org/integration/mozilla-inbound/rev/36a99706bcbe

Flags: in-testsuite+

Keywords: checkin-needed

Ed Morley [:emorley]

Comment 88

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/5bc6448929cb https://hg.mozilla.org/mozilla-central/rev/36a99706bcbe

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla19

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly 13 years ago Andrea Marchesini [:baku] 7.47 KB, patch	emk : feedback+	Details \| Diff \| Splinter Review
Failing example for the record 13 years ago Masatoshi Kimura [:emk] 528 bytes, application/zip		Details
Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly 13 years ago Andrea Marchesini [:baku] 9.39 KB, patch	emk : feedback+	Details \| Diff \| Splinter Review
Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly 13 years ago Andrea Marchesini [:baku] 32.58 KB, patch	emk : feedback+	Details \| Diff \| Splinter Review
Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly 13 years ago Andrea Marchesini [:baku] 32.63 KB, patch		Details \| Diff \| Splinter Review
Bug 781425 - ArchiveReader doesn't handle non-ASCII non-UTF-8 filenames correctly 13 years ago Andrea Marchesini [:baku] 32.64 KB, patch		Details \| Diff \| Splinter Review
patch 1 - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 38.49 KB, patch		Details \| Diff \| Splinter Review
patch 1b - getFiles 13 years ago Andrea Marchesini [:baku] 5.87 KB, patch	emk : feedback-	Details \| Diff \| Splinter Review
patch 2 - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 34.58 KB, patch	mounir : review- emk : feedback+ mounir : feedback+	Details \| Diff \| Splinter Review
patch 2 - getFiles() 13 years ago Andrea Marchesini [:baku] 7.79 KB, patch	mounir : review+ khuey : review+ emk : feedback+	Details \| Diff \| Splinter Review
patch 3 - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 47.01 KB, patch		Details \| Diff \| Splinter Review
patch 3 - getFiles() 13 years ago Andrea Marchesini [:baku] 10.09 KB, patch	baku : review+	Details \| Diff \| Splinter Review
patch 3b - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 46.95 KB, patch	mounir : review-	Details \| Diff \| Splinter Review
patch 3b - getFiles() 13 years ago Andrea Marchesini [:baku] 10.02 KB, patch	baku : review+	Details \| Diff \| Splinter Review
patch 3c - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 46.95 KB, patch		Details \| Diff \| Splinter Review
patch 3c - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 46.62 KB, patch	mounir : review+	Details \| Diff \| Splinter Review
first patch - encoding + dictionary 13 years ago Andrea Marchesini [:baku] 46.83 KB, patch	baku : review+ sicking : superreview+	Details \| Diff \| Splinter Review
second patch - getFiles() 13 years ago Andrea Marchesini [:baku] 10.18 KB, patch	baku : review+	Details \| Diff \| Splinter Review
first patch - encoding + dictionary - 2 13 years ago Andrea Marchesini [:baku] 30.32 KB, patch	baku : review+ baku : superreview+	Details \| Diff \| Splinter Review
second patch - getFiles() - 2 13 years ago Andrea Marchesini [:baku] 10.31 KB, patch	baku : review+ baku : superreview+	Details \| Diff \| Splinter Review