When privacy.spoof_english is true, don't reveal locale by charset fallback
Categories
(Core :: DOM: HTML Parser, enhancement, P3)
Tracking
()
People
(Reporter: arthur, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [tor 20025][fingerprinting][fp-triaged])
Comment 1•6 years ago
Comment 2•6 years ago
Comment 3•6 years ago
Comment 4•6 years ago
This appears to be controlled by https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.h and seems fairly straightforward to avoid doing any locale-based decisions if spoof_english is true. The hard part is probably writing the test.
Comment 5•6 years ago
(In reply to Tom Ritter [:tjr] from comment #4)
This appears to be controlled by https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.h and seems fairly straightforward to avoid doing any locale-based decisions if spoof_english is true.
More specifically, if spoof_english is set, setting mFallback to WINDOWS_1252_ENCODING should take precedence over this block:
https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#69-79
Comment 6•5 years ago
Tor Ticket: https://trac.torproject.org/projects/tor/ticket/20025
When a document, or server, fails to set a charset, the page falls back to the following setting: General>Language and Appearance>Fonts and Colors>Advanced>Text Encoding for Legacy Content. The default is to fall back to the "current locale" based on your "app language" (the pref value is blank). For non-en-US app languages, this can create entropy depending on the language. You can test this on https://hsivonen.com/test/moz/check-charset.htm. E.g. Japanese will reveal Shift_JIS, Traditional Chinese will reveal Big5, etc.
Solution: set intl.charset.fallback.override = windows-1252 when privacy.spoof_english == 2, and reset it when privacy.spoof_english !== 2
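To illustrate why the fallback is a fingerprinting vector, here is a rough Python sketch of the leak. It uses Python's own codecs as stand-ins for the browser's decoders, and the probe bytes and function names are made up for this example (they are not taken from the linked test page). The idea is that the same undeclared bytes decode to different text under each legacy encoding, so a page that can read back the decoded text can infer which fallback was applied.

```python
# Hypothetical probe bytes served without any charset declaration.
PROBE = b"\xa4\xa4"

# Fallback encodings named in the comment above, plus the en-US default.
# Python codec names are used here as stand-ins for the browser's decoders.
FALLBACKS = {
    "windows-1252": "cp1252",
    "Shift_JIS": "shift_jis",
    "Big5": "big5",
}


def decode_with_fallbacks(data: bytes) -> dict[str, str]:
    """Decode the same bytes under each candidate fallback encoding."""
    return {label: data.decode(codec, errors="replace")
            for label, codec in FALLBACKS.items()}


def guess_fallback(observed_text: str, data: bytes = PROBE) -> list[str]:
    """Return the fallback labels whose decoding matches the observed text."""
    decoded = decode_with_fallbacks(data)
    return [label for label, text in decoded.items() if text == observed_text]
```

For this probe each candidate encoding yields a different string, so the observed text is enough to tell the fallbacks apart. If windows-1252 were forced regardless of locale, every user would produce the same decoding and the probe would reveal nothing.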
Comment 7•5 years ago
(In reply to Simon Mainey from comment #6)
Solution: set intl.charset.fallback.override = windows-1252 when privacy.spoof_english == 2, and reset it when privacy.spoof_english !== 2
A less brittle solution would be to check privacy.spoof_english where intl.charset.fallback.override is checked and make privacy.spoof_english take precedence.
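The precedence order suggested here could look roughly like the following Python model. This is only a sketch of the decision order, not the Gecko code: `pick_fallback`, `LOCALE_FALLBACKS`, and `DEFAULT_FALLBACK` are hypothetical names, the locale table contains only the two examples from comment 6, and treating `privacy.spoof_english == 2` as "spoofing enabled" follows that same comment.

```python
# Hypothetical locale-to-fallback table; only the two examples from comment 6.
LOCALE_FALLBACKS = {"ja": "Shift_JIS", "zh-TW": "Big5"}

# Fallback used for en-US and whenever spoofing is enabled.
DEFAULT_FALLBACK = "windows-1252"


def pick_fallback(spoof_english: int, override: str, app_locale: str) -> str:
    """Choose the legacy fallback encoding, with spoof_english checked first."""
    # Spoofing wins over everything else, so neither the override pref
    # nor the app locale can leak through the fallback.
    if spoof_english == 2:
        return DEFAULT_FALLBACK
    # An explicit, non-blank override pref comes next.
    if override:
        return override
    # Only now does the app locale influence the result.
    return LOCALE_FALLBACKS.get(app_locale, DEFAULT_FALLBACK)
```

Checking the spoofing pref at the point of decision avoids having to write and later reset a second pref, which is what makes the pref-mirroring approach brittle.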
Comment 8•5 years ago
(In reply to Henri Sivonen (:hsivonen) (not reading bugmail until 2020-08-03) from comment #7)
A less brittle solution would be to check privacy.spoof_english where intl.charset.fallback.override is checked and make privacy.spoof_english take precedence.
Any chance of nudging this now that intl.charset.fallback.override has been deprecated (Bug 1603712)?
Comment 9•5 years ago
AFAICT bug 1603712 made this bug moot except for display of non-ASCII file paths in FTP directory listings. I'm unsure if it's possible for Web sites to scrape those. Without testing, it seems to me that the ftp scheme should at least block scraping from an http or https page.
Comment 10•5 years ago
AFAICT, the issue will go away entirely with bug 1647898.
Comment 11•4 years ago
FTP is disabled in Bug 1691890 - can we confirm the leak is resolved now?
Comment 12•4 years ago
^ or I guess it's not really resolved, as users could still toggle the pref: never mind. Guess we'll wait for the FTP code to get ripped out :)
Comment 13•2 years ago
FallbackEncoding.h doesn't exist anymore, but I am not totally sure if that means that the characterSet is never based on the language now. I tried tracing Document::SetDocumentCharacterSet and didn't immediately see anything related to the language, but I was by no means exhaustive.
Something else I noticed is that Document::RecomputeLanguageFromCharset calls RecomputeLanguageFromCharset, which seems to use the LocaleLanguage, but that might be unproblematic and unrelated to this bug anyway.
Henri, would you mind updating us with the current status when you are back?
Comment 14•2 years ago
For text/html and text/plain, the UI locale is no longer used as an input to determining the character encoding. When the character encoding is not declared, the content of the text/html or text/plain stream and, possibly, the top-level domain are used for guessing the encoding, so neither depends on user-side configuration.
I'm not aware of UI locale-dependent encoding determination other than bug 1824325.
(In reply to Tom Schuster (MoCo) from comment #13)
Something else I noticed is that Document::RecomputeLanguageFromCharset calls RecomputeLanguageFromCharset, which seems to use the LocaleLanguage, but that might be unproblematic and unrelated to this bug anyway.
That's indeed unrelated but still problematic.
What that does is guess the language (group) of the page for font selection purposes in the absence of explicit lang="..." tagging. For example, if the encoding is Shift_JIS, you get Japanese glyph forms for ideographs whose preferred glyph details vary by locale. The problem here is that if the encoding isn't locale-affiliated (e.g. UTF-8 isn't locale-affiliated), the UI language participates in font selection (e.g. if the UI language is Japanese, you get Japanese glyph forms). (There are probably non-CJK examples where the difference affects glyph metrics and, therefore, line breaks.)
So preventing a UI locale leak on that point is relevant, but out of scope for this bug.
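As a simplified model of the font-selection behavior described above (not the actual Gecko logic), the decision can be sketched in Python as follows. The function name and the table are hypothetical; the Shift_JIS to Japanese mapping comes from the comment, while the Big5 entry is an assumption added only to make the example less trivial.

```python
from typing import Optional

# Illustrative encoding-to-language affiliations. Shift_JIS -> Japanese is from
# the comment above; the Big5 entry is an assumption added for the example.
ENCODING_LANGUAGE = {"Shift_JIS": "ja", "Big5": "zh-TW"}


def font_language(lang_attr: Optional[str], encoding: str, ui_language: str) -> str:
    """Pick the language used for font selection in this simplified model."""
    # Explicit lang="..." tagging wins when present.
    if lang_attr:
        return lang_attr
    # A locale-affiliated encoding implies a language.
    affiliated = ENCODING_LANGUAGE.get(encoding)
    if affiliated is not None:
        return affiliated
    # Otherwise the UI language participates, which is the leak in question.
    return ui_language
```

In this model an untagged UTF-8 page returns whatever `ui_language` is, so two users with different UI languages can get different glyph forms for the same page, which is the observable difference the comment describes.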