Closed Bug 1366645 Opened 3 years ago Closed 4 months ago

Consider appending extensions from extraMimeEntries to the list obtained from OS

Categories

(Firefox :: File Handling, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 78
Tracking Status
firefox78 --- fixed

People

(Reporter: jessica, Assigned: Gijs)

References

Details

(Whiteboard: tpi:-)

Attachments

(2 files)

We call `GetMIMEInfoFromOS`[1] from `GetFromTypeAndExtension`, and each platform implements this using the OS API to get the extension list from mime type or mime type from extension. However, sometimes we don't get a complete extension list from the OS, so we should consider having a predefined extension list to add it to the list returned by OS.


[1] http://searchfox.org/mozilla-central/rev/79f625ac56d936ef5d17ebe13a123b25efee4241/uriloader/exthandler/nsExternalHelperAppService.h#87
It looks like we try to do that with `FillHandlerInfo()` [1], but there is no record about "image/jpeg" in the datastore.

[1] http://searchfox.org/mozilla-central/rev/79f625ac56d936ef5d17ebe13a123b25efee4241/uriloader/exthandler/nsExternalHelperAppService.cpp#2628
Whiteboard: tpi:-
Blocks: 1367490
Summary: Consider having a predefined extension list for a mime type → Consider appending extensions from extraMimeEntries to the list obtained from OS
Blocks: 1619154
See Also: → 1619203

Hi Boris,

I see you the module owner for uriloader.
What do you think about modifying nsExternalHelperAppService.cpp:GetFromTypeAndExtension so it always check if some extensions are present for a given mime type, even if the OS returns something (that might be incomplete)? i.e. change the logic so step 3 1 is always executed, to add any common file extension missing from step 2.

This would help fix some file extensions missing for a given mime type when constructing a filter for accept attribute for input type=file 2. This would fix bug 1367490, bug 1619153 (and probably a few others I'm missing).

Since this code is used for other purposes in other part of the application, I would like to be sure this modification won't break anything (and I'm unsure how well this is covered by automatic testing).

The algorithm is described in 3 but I'm not sure if this is up to date.

Flags: needinfo?(bzbarsky)

(removing the dependency on bug 1619154 because image/* video/* and audio/* filters are handled differently).

No longer blocks: 1619154
Blocks: 1619154
No longer blocks: 1619154

Adding the extensions in our built-in list to the "valid extensions for this type" list seems OK to me... Gijs might also have thoughts on this.

Flags: needinfo?(bzbarsky) → needinfo?(gijskruitbosch+bugs)

Yes, we should do this. It was also discussed in bug 1602220, which AIUI would be fixed if we always considered the "extra" entries.

The tricky thing, from memory when I last looked at this, is that for some purposes we're really OK with converting mimetype->extension or vice versa, and in some cases we actually want default handler info etc. In the former case, the "extra" entries have enough information, but not in the latter. We should be careful not to regress things there (e.g. without checking the consumers, perhaps there are cases where we handle the dummy mime info object that we somehow always return (!?) differently), as I did when causing bug 1609466 by making us prioritize file extensions over mimetypes on Windows in bug 1597985.

Arnaud, are you interested in providing a patch?

Blocks: 1602220
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(arnaud.bienner)

(In reply to :Gijs (he/him) from comment #6)

Arnaud, are you interested in providing a patch?

Yes! :)
If it's not too urgent: I work on this on my spare time, but I don't have much spare time those days.
I can probably get something ready for sometime next week though.

Flags: needinfo?(arnaud.bienner)

(In reply to Arnaud Bienner from comment #7)

(In reply to :Gijs (he/him) from comment #6)

Arnaud, are you interested in providing a patch?

Yes! :)

Great!

If it's not too urgent: I work on this on my spare time, but I don't have much spare time those days.
I can probably get something ready for sometime next week though.

Of course. Next week sounds great. It's not "needs fixing tomorrow" urgent, though equally if we can come up with something for the 75/76 cycles that would be great, because per bug 1619567 the dupe of 1602220, this is now affecting e.g. downloads of webp images on some systems (which then get saved as "jpg" files), which is unfortunate :-(

Having a deeper look at this, I'm not sure fixing this bug will fix bug 1602220.
As you suggested in bug 1602220 comment 9 (suggestion 3) bug 1602220 could be fixed by checking the extra entries when nothing is returned from the OS. This sounds good to me for bug 1602220, but that wouldn't always help to fix bug 1367490 and co: we might have retrieved something from the OS, but with an incomplete list of extension.

WRT to bug 1367490 (everything that relates to accept attribute for <input type=file>) I'm starting to think it might be better to define a new method (e.g. GetFromType) instead of using GetFromTypeAndExtension so we could implement the logic we want, without impacting other part of the code where GetFromTypeAndExtension is called.

If we head into that direction, we could close this bug and have bug 1367490 and bug 1602220 fixed separately.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Arnaud Bienner from comment #9)

Having a deeper look at this, I'm not sure fixing this bug will fix bug 1602220.
As you suggested in bug 1602220 comment 9 (suggestion 3) bug 1602220 could be fixed by checking the extra entries when nothing is returned from the OS. This sounds good to me for bug 1602220, but that wouldn't always help to fix bug 1367490 and co: we might have retrieved something from the OS, but with an incomplete list of extension.

It seems fine to append the list of extensions; I'm pretty OK with not having the user be able to "override" (via the OS, in some obscure way) having .jpg and .jpeg and .webp as extensions for their respective mimetypes. That would address both of these cases, right?

WRT to bug 1367490 (everything that relates to accept attribute for <input type=file>) I'm starting to think it might be better to define a new method (e.g. GetFromType) instead of using GetFromTypeAndExtension so we could implement the logic we want, without impacting other part of the code where GetFromTypeAndExtension is called.

Why do you think we need a separate method? Don't get me wrong, I am not opposed in principle, but I'd like to understand why you think we shouldn't do this in the existing code; what's the benefit of having a new method?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(arnaud.bienner)

(In reply to :Gijs (he/him) from comment #10)
My understanding of bug 1602220 is that we call GetFromTypeAndExtension("webp", ".jpeg") and that if fails to find "webp" so it uses ".jpeg" to retrieve the mimetype.
Once this is done, the mime type info is "image/jpeg" and then the step 3 (check extra entries) won't help because we overrode the mimetype.
Are you suggesting to remove this part of the code which checks for mimetype based on file extension if none is found? Or to move it down so we check the extra entries prior to this?

Flags: needinfo?(arnaud.bienner) → needinfo?(gijskruitbosch+bugs)

(In reply to Arnaud Bienner from comment #11)

(In reply to :Gijs (he/him) from comment #10)
My understanding of bug 1602220 is that we call GetFromTypeAndExtension("webp", ".jpeg") and that if fails to find "webp" so it uses ".jpeg" to retrieve the mimetype.
Once this is done, the mime type info is "image/jpeg" and then the step 3 (check extra entries) won't help because we overrode the mimetype.
Are you suggesting to remove this part of the code which checks for mimetype based on file extension if none is found? Or to move it down so we check the extra entries prior to this?

I'm suggesting that we add/merge the extras info after step (1), before we check the extra entries, and ensure found is true if the mimetype is known in that list of entries. Then we will not enter the "look this up by the extension" path in this case, because the mimetype is known.

This may be equivalent to moving up the "extras" check; I'm not 100% sure without looking at it in a lot more detail...

To be clear, I could also be convinced that we want a different API surface to make this a more straightforward change. It does feel like GetFromTypeAndExtension is doing a lot of different work at the moment, and perhaps if we clarified what the consumers actually wanted (e.g. "is extension X acceptable for mimetype Y", or "what is the primary extension of mimetype Y" or "what app should I use to handle this file"), that'd make it easier to make changes and reason about how this code behaves, and that'd be a good thing. Right now it tries to do all of these things at once (by returning a mimetype object which professes to answer all these questions), and the result is not great... We could transition consumers over gradually...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(arnaud.bienner)

(In reply to :Gijs (he/him) from comment #12)

I'm suggesting that we add/merge the extras info after step (1), before we check the extra entries, and ensure found is true if the mimetype is known in that list of entries. Then we will not enter the "look this up by the extension" path in this case, because the mimetype is known.

This may be equivalent to moving up the "extras" check; I'm not 100% sure without looking at it in a lot more detail...

That sounds like it, thanks for the clarification.
Now things are clearer, I will try to implement something that follows that plan: merge what we retrieve from the OS with extras, then perform "look this up by extension" if nothing is found.
This would fix both bug 1602220 and bug 1367490.

It does feel like GetFromTypeAndExtension is doing a lot of different work at the moment

Yes, that is my feeling too (thus my feeling we might need a new implementation).

Flags: needinfo?(arnaud.bienner)

Hello Arnaud, are you still in a position to work on this bug? I'm aware the world is a bit, uh, different, than it was a month or so ago, so no worries if not - but if you have the chance to let us know that would be helpful. :-)

Flags: needinfo?(arnaud.bienner)

Hi,

Life is different for me too :)
I still hope I can work on this sometime soon, but don't know exactly when that would be, so feel free to take over this bug if you wish (just let me know if you do so I don't start working on this).

Flags: needinfo?(arnaud.bienner)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a2f2407cc345b4e6335ce9552ea08f5d4b7e5ea

Wonder what breaks, if anything. I'm still seeing this get me .jpeg.webp on macOS, and need to investigate if we can fix, that, too, but it's better than the status quo so...

Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Component: Widget → File Handling
Product: Core → Firefox

Made it a bit more aggressive, to fix more of the deps. Maybe more things will break?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=af9ab24c4a5dfc638e33f724aad3ddd5a88b4a4a

Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/666659de5856
check builtin extra entries for mimetype information before falling back to the extension, r=mak
https://hg.mozilla.org/integration/autoland/rev/cdfedc4707bd
overwrite image file extensions when initializing file info, r=mak
Flags: needinfo?(gijskruitbosch+bugs)
Blocks: 1617623
Blocks: 1638480
Flags: needinfo?(gijskruitbosch+bugs)
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/6c09436a7125
check builtin extra entries for mimetype information before falling back to the extension, r=mak
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 78
You need to log in before you can comment on or make changes to this bug.