File name not parsed correctly while trying to save page via "Save link"

VERIFIED FIXED

Status

VERIFIED FIXED
8 years ago
8 years ago

People

(Reporter: ahoza, Assigned: azakai)

Tracking

Details

(Whiteboard: [needs-landing])

Attachments

(5 attachments, 5 obsolete attachments)

(Reporter)

Description

8 years ago
Created attachment 502450 [details]
Screenshot

BuildID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b9pre) Gecko/20110109
Firefox/4.0b9pre Fennec /4.0b4pre 
and
Mozilla /5.0 (Maemo;Linux armv7l;rv:2.0b9pre) Gecko/20110109 Firefox/4.0b9pre
Fennec/4.0b4pre

Steps to reproduce:

1. Visit yahoo.com (desktop version)
2. Try to save a link from "News" section

Actual results:
A dialog with the following text: "<file> could not be saved, because an error occurred. Try saving to a different location" is displayed (see screenshot)

And I believe this is due to the fact that browser parses a name that starts with a special character (e.g.: *)

Expected results:
Page is saved.
(Reporter)

Comment 1

8 years ago
Created attachment 502451 [details]
Screenshot

I've attached wrong screenshot before. This is the correct one
tracking-fennec: --- → ?

Updated

8 years ago
Assignee: nobody → azakai

Updated

8 years ago
tracking-fennec: ? → 2.0+
(Assignee)

Comment 2

8 years ago
Created attachment 506938 [details]
test page
(Assignee)

Comment 3

8 years ago
contentUtils.js::validateFilename()

http://mxr.mozilla.org/mozilla-central/source/toolkit/content/contentAreaUtils.js#920

is where the intended filename is validated. There is platform-specific code for Windows and OS X there, to remove invalid characters. Nothing for Linux/Android though, so if the underlying filesystem is picky, things will fail as they do here.

It seems the simplest thing to do would be to add a Fennec section in that function, to remove all dangerous characters. Which, to ensure success on all possible filesystems for Linux/Android, means sanitizing basically everything but [a-zA-Z0-9_.].

Does that make sense? If so, what would be the proper way to detect that we are Fennec in that function? (that is, what value of navigator.appVersion indicates Fennec, or is there some better way?)

Comment 4

8 years ago
alon, why are we passing the full url as the filename?  Shouldn't the filename be something like "medicaid-and-mental-health-how-can-fraud-be-contained.html"?
(Assignee)

Comment 5

8 years ago
I would guess that yahoo has a weird url there (yahoo.com/news/http://blah). (I can't easily check since I get force-bumped to the mobile site which doesn't have that issue. How do I get to the normal site on device? Phony?)

The normal behavior is as you describe, and I see that normal behavior on all other websites I check (and the test page).

(In any case invalid characters can appear anywhere in a file url, I'm surprised we haven't run into this sooner.)

Comment 6

8 years ago
maybe the lowest risk is to add an else for linux/android.  Alternatively, maybe the nsLocalFile already has this logic?
We already have platform specific code for handing filename validation on maemo and android for PDFs here:

http://mxr.mozilla.org/mobile-browser/source/chrome/content/common-ui.js#248

We could move some of this logic into the validation routines in contentUtils.js::validateFilename()
(Assignee)

Comment 8

8 years ago
(In reply to comment #6)
> maybe the lowest risk is to add an else for linux/android.  Alternatively,
> maybe the nsLocalFile already has this logic?

I can't find anything related in nsLocalFile, but it does seem like a logical place. It seems though that the relevant code is in contentUtils.js as mentioned above.

(In reply to comment #7)
> We already have platform specific code for handing filename validation on maemo
> and android for PDFs here:
> 
> http://mxr.mozilla.org/mobile-browser/source/chrome/content/common-ui.js#248
> 
> We could move some of this logic into the validation routines in
> contentUtils.js::validateFilename()

Sounds good.
(Assignee)

Comment 9

8 years ago
Created attachment 507323 [details] [diff] [review]
m-c part

mozilla-central patch that sanitizes filenames on android and maemo.

The patch errs on the side of caution and sanitizes quite a lot of characters. A possible downside is that it sanitizes non-latin characters, so people in other locales might not be happy. But, as we are seeing, files cannot be created with their charactersets anyhow on common mobile devices. So this patch seems the best thing for now.
Attachment #507323 - Flags: feedback?
(Assignee)

Comment 10

8 years ago
Created attachment 507325 [details] [diff] [review]
m-b part

mobile-browser patch that simplifies code that we no longer need.

1. A maemo-only part is removed. It is no longer needed since we earlier call ContentAreaUtils.getDefaultFileName, and that calls validateFilename, which the m-c patch adds to.

2. Remove the bit about trying the default filename and using another if it fails. Since the filename is sanitized, no error should occur there. I tested on the link in bug 620097 to make sure all is well.
Attachment #507325 - Flags: feedback?
(Assignee)

Comment 11

8 years ago
Note: the two patches are meant to make sense together.
(Assignee)

Comment 12

8 years ago
Comment on attachment 507323 [details] [diff] [review]
m-c part

Neil, I believe you are the owner of toolkit/content, in which this patch resides - can you please review it?

The patch makes sure that filenames on Android/Maemo will not contain characters that are potentially invalid on mobile filesystems.
Attachment #507323 - Flags: feedback? → review?(enndeakin)
(Assignee)

Updated

8 years ago
Attachment #507325 - Flags: feedback? → review?(mark.finkle)
Comment on attachment 507323 [details] [diff] [review]
m-c part

>diff --git a/toolkit/content/contentAreaUtils.js b/toolkit/content/contentAreaUtils.js
>+    aFileName = aFileName.replace(/[^a-zA-z0-9_.]/g, "_").replace(/\^/g, "_");

You can just do this one replace call. (^ can just be escaped)

>+    if (aFileName.replace(/_/g, "").length <= aFileName.length/2) {
>+      // Preserve a suffix, if there is one
>+      if (aFileName.indexOf(".") == -1)
>+        aFileName = "download";

This filename will need to be localized.

>+      else
>+        aFileName = "download." + aFileName.replace(/_/g, "").split(".").slice(-1)[0];

But you removed all the '.' earlier.
Attachment #507323 - Flags: review?(enndeakin) → review-
(Assignee)

Comment 14

8 years ago
Created attachment 508635 [details] [diff] [review]
m-c part (v2)

Thanks for the comments, here's an updated patch.

> >+    aFileName = aFileName.replace(/[^a-zA-z0-9_.]/g, "_").replace(/\^/g, "_");
> 
> You can just do this one replace call. (^ can just be escaped)
> 

Done. (I was surprised to find that it can't be placed in the [], even escaped,
but it works fine if escaped outside the [], joined by an or.)

> >+    if (aFileName.replace(/_/g, "").length <= aFileName.length/2) {
> >+      // Preserve a suffix, if there is one
> >+      if (aFileName.indexOf(".") == -1)
> >+        aFileName = "download";
> 
> This filename will need to be localized.
> 

I think we purposefully should not localize this, since
in all likelihood that would contain invalid characters.
We can sanitize the localized name, but I believe that
would be much more confusing than just using 'download'.
I added a comment in the patch.

> >+      else
> >+        aFileName = "download." + aFileName.replace(/_/g, "").split(".").slice(-1)[0];
> 
> But you removed all the '.' earlier.

'.' were not removed. The patch removes everything except for |a-zA-z0-9_.|. So the filename suffix, if there was one, remains, and this code uses it if so.
Attachment #507323 - Attachment is obsolete: true
Attachment #508635 - Flags: review?(enndeakin)
(In reply to comment #14)
> I think we purposefully should not localize this, since
> in all likelihood that would contain invalid characters.
> We can sanitize the localized name, but I believe that
> would be much more confusing than just using 'download'.
> I added a comment in the patch.

It would need to be localized as this is text that would be displayed to the user.

I don't follow what you mean by the localized name containing invalid characters. The string can just be a translated string 'Downloaded File' or somesuch.

> '.' were not removed. The patch removes everything except for |a-zA-z0-9_.|. So
> the filename suffix, if there was one, remains, and this code uses it if so.

OK, I misread the earlier line.
(Assignee)

Comment 16

8 years ago
(In reply to comment #15)
> (In reply to comment #14)
> > I think we purposefully should not localize this, since
> > in all likelihood that would contain invalid characters.
> > We can sanitize the localized name, but I believe that
> > would be much more confusing than just using 'download'.
> > I added a comment in the patch.
> 
> It would need to be localized as this is text that would be displayed to the
> user.
> 
> I don't follow what you mean by the localized name containing invalid
> characters. The string can just be a translated string 'Downloaded File' or
> somesuch.
> 

Well, my concern is that we translate 'download' to something like 'télécharger' (google translate says that is 'download' in French, apologies to French speakers if that is wrong ;) But that translation contains two potentially problematic characters on some mobile filesystems, and exactly for that reason, if the original filename was télécharger.pdf, then the patch would sanitize it to t_l_charger.pdf.

So we can use a translated name, but we'd need to sanitize it, and t_l_charger.pdf seems more confusing than download.pdf. And of course things would be much worse if we translated to Hindi or Chinese, as all the characters would need to be sanitized, so it would be something like _______.pdf.
I might be mistaken, but I think the "download" rename applies only to the filename, not the name of the entry in the download manager, if that makes any difference.
Comment on attachment 507325 [details] [diff] [review]
m-b part

Looks good for m-b
Attachment #507325 - Flags: review?(mark.finkle) → review+
Does the filename never get displayed to the user? Would someone not be expected to locate the file to open it with some other application?

It would best to ask some localization related people just in case.
Neil, lets do the localization in a follow up bug.  Can you review the patch for correctness?
Neil, review ping?

also does doing the localization as a follow up bug work for you?
Whiteboard: [needs-review (enndeakin)]
(Assignee)

Comment 22

8 years ago
Created attachment 515979 [details] [diff] [review]
m-c part v3

New version with some IRC comment fixes.
Attachment #508635 - Attachment is obsolete: true
Attachment #515979 - Flags: review?(enndeakin)
Attachment #508635 - Flags: review?(enndeakin)

Comment 23

8 years ago
Two things:
1. This is absolutely terrible for people in non-ascii locales. We should blacklist bad characters, but even then, there should be very few bad characters since we're suppose to support utf8.
2. Azakai can't reproduce this bug. Ahoza, what device are you testing on?
(Assignee)

Comment 24

8 years ago
> Azakai can't reproduce this bug.

I wasn't 100% sure if I remembered whether I could or not. I rechecked now, and I *can* reproduce this on a Galaxy S.

The problem is finding a list of all filesystems used on mobile devices, and the valid characters for them. Or, we can do what the m-b patch above undos, which is to try one way, and if a disk error occurs, try a fallback - but that leads to ugly code that we will need in various places (which is why the m-b patch above wants to remove that ugly code).

Comment 25

8 years ago
(In reply to comment #24)
> > Azakai can't reproduce this bug.
> 
> I wasn't 100% sure if I remembered whether I could or not. I rechecked now, and
> I *can* reproduce this on a Galaxy S.
> 
> The problem is finding a list of all filesystems used on mobile devices, and
> the valid characters for them. Or, we can do what the m-b patch above undos,
> which is to try one way, and if a disk error occurs, try a fallback - but that
> leads to ugly code that we will need in various places (which is why the m-b
> patch above wants to remove that ugly code).

I recommend blocking |\?*<":>+[]/ and all control characters, as per http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitations .
(In reply to comment #25)

> I recommend blocking |\?*<":>+[]/ and all control characters, as per
> http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitations .

I agree with this plan, but does that handle the case where the whole filename is blocked? We already block characters based on OS limitations, so this would be no different.
(Assignee)

Comment 27

8 years ago
That's a separate issue. First question is what characters are valid. Second question is what to do if most characters were in fact invalid.

For the second question, I think we should default to "download" for now, since there isn't much else to do, and we can investigate internationalization stuff later. The first question is harder though.
(Assignee)

Comment 28

8 years ago
Created attachment 516032 [details] [diff] [review]
m-c part, mwu style

Here is a version that uses mwu's approach, of sanitizing characters based on the best known list of invalid characters.

Benefit: Users can save files with filenames in their own languages. That might work on some phones.

Detriment: On the Galaxy S, that will actually fail - saving a filename with '*' will work with this patch, but characters from other languages will not. The same thing happened in bug 620097 (so in particular, using this version of the patch means that we must still keep the m-b hacks that the m-b patch attached to this bug here wanted to get rid of.)

So we do leave some filenames on which we fail. But, at least we don't entirely stop users from saving files with titles in their languages, which might work on other devices than the Galaxy S.

I do not know what is the better approach here. This is a policy question. What should be done?
(Assignee)

Comment 29

8 years ago
Comment on attachment 515979 [details] [diff] [review]
m-c part v3

No need to review this until a decision is made as to which approach we should do.
Attachment #515979 - Flags: review?(enndeakin)
(Assignee)

Comment 30

8 years ago
Comment on attachment 516032 [details] [diff] [review]
m-c part, mwu style

With mwu's help, further investigating shows that I was wrong in comment #28. The 'mwu style' patch *does* work properly, it fixes exactly the dangerous characters.

Patch is ready for review.
Attachment #516032 - Flags: review?(enndeakin)
(Assignee)

Updated

8 years ago
Attachment #515979 - Attachment is obsolete: true

Comment 31

8 years ago
(In reply to comment #28)
> Here is a version that uses mwu's approach, of sanitizing characters based on
> the best known list of invalid characters.
> 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/fat/namei_vfat.c;h=f88f752babd9c4e2e1054fde3dd7306ec6e457fe;hb=HEAD#l195 for a definitive list of invalid characters on VFAT.
(Assignee)

Comment 32

8 years ago
Created attachment 516103 [details] [diff] [review]
m-c part v4

Fixed a few small things.
Attachment #516032 - Attachment is obsolete: true
Attachment #516103 - Flags: review?(enndeakin)
Attachment #516032 - Flags: review?(enndeakin)
Comment on attachment 516103 [details] [diff] [review]
m-c part v4

>+    var dangerousChars = "*?<>|\":/\\[];,+=";
>+    var dangerousCodes = [];
>+    for (var i = 0; i < 0x20; i++)
>+      dangerousCodes.push(i);
>+    var regexp = new RegExp("[" +
>+                            dangerousChars.split("").map(function(str) {
>+                              return str.charCodeAt(0);
>+                            }).concat(dangerousCodes).map(function(code) {
>+                              return "\\x" + (code < 16 ? "0" : "") +
>+                                      code.toString(16);
>+                            }).join("") + 
>+                            "]", "g");
>+    aFileName = aFileName.replace(regexp, "_");

I couldn't really understand what this code was doing, but it seems that you just want to do:

var outstr = "";
const dangerChars = "*?<>|\":/\\[];,+=";
for (i = 0; i < str.length; i++)
  outstr += str.charCodeAt(i) >= 32 && !(dangerChars.indexOf(str[i]) >= 0) ? str[i] : "_";


>+
>+    // Last character should not be a space
>+    if (aFileName[aFileName.length-1] == " ")
>+      aFileName = aFileName.substr(0, aFileName.length-1) + "_";

You may just want to use trim().
(Assignee)

Comment 34

8 years ago
Created attachment 516331 [details] [diff] [review]
m-c part v5

Ok, made those changes.
Attachment #516103 - Attachment is obsolete: true
Attachment #516331 - Flags: review?(enndeakin)
Attachment #516103 - Flags: review?(enndeakin)
Attachment #516331 - Flags: review?(enndeakin) → review+
Whiteboard: [needs-review (enndeakin)] → [needs-landing]
pushed http://hg.mozilla.org/mozilla-central/rev/0b6f92a2cc74 and https://hg.mozilla.org/mobile-browser/rev/8873d4419281
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 37

8 years ago
VERIFIED FIXED on:
Build ID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b13pre) Gecko/20110315
Firefox/4.0b13pre Fennec /4.0b6pre
Device: HTC Desire
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.