1485258 - When privacy.spoof_english is true, don't reveal locale by charset fallback

Reporter

Description

•

6 years ago

In Tor Browser, we want to make sure that the locale is not revealed to content when "privacy.spoof_english" is enabled. But in Firefox, the fallback character encoding can depend on the user's locale. See https://dxr.mozilla.org/mozilla-esr60/source/dom/encoding/FallbackEncoding.h#34. So we'd like to make sure the fallback behavior is the same, regardless of locale, when "privacy.spoof_english" is enabled.

Arthur Edelstein [:arthur]

Reporter

Comment 1

•

6 years ago

We should also examine the behavior of the following two prefs, which are set by torbutton:

pref("intl.accept_charsets", "iso-8859-1,*,utf-8");
pref("intl.charsetmenu.browser.cache", "UTF-8");

Henri Sivonen (:hsivonen)

Comment 2

•

6 years ago

(In reply to Arthur Edelstein (Tor Browser dev) [:arthuredelstein] from comment #1)

> We should also examine the behavior of the following two prefs, that are set
> by torbutton:
> pref("intl.accept_charsets", "iso-8859-1,*,utf-8");
> pref("intl.charsetmenu.browser.cache", "UTF-8");

I can't find either of these on searchfox. I'm pretty sure I've personally removed the latter one.

Priority: -- → P3

Thorin [:thorin]

Comment 3

•

6 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #2)

> I can't find either of these on searchfox. I'm pretty sure I've personally removed the latter one.

Neither is found in the ESR-60 source either: https://dxr.mozilla.org/mozilla-esr60/source/

Ethan Tseng [:ethan]

Updated

•

6 years ago

Assignee: nobody → xeonchen

Whiteboard: [tor] → [tor][fingerprinting][fp-triaged]

Tom Ritter [:tjr]

Comment 4

•

6 years ago

This appears to be controlled by https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.h and seems fairly straightforward to avoid doing any locale-based decisions if spoof_english is true. The hard part is probably writing the test.

Tom Ritter [:tjr]

Updated

•

6 years ago

Assignee: xeonchen → nobody

Henri Sivonen (:hsivonen)

Comment 5

•

6 years ago

(In reply to Tom Ritter [:tjr] from comment #4)

> This appears to be controlled by https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.h and seems fairly straightforward to avoid doing any locale-based decisions if spoof_english is true.

More specifically, if spoof_english is set, setting mFallback to WINDOWS_1252_ENCODING should take precedence over this block:
https://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#69-79
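
The precedence described here can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual FallbackEncoding.cpp code: the `FallbackInputs` struct, both function names, and the two-entry locale table are assumptions made for illustration, and treating every other locale as windows-1252 is a simplification.

```cpp
#include <string_view>

// Hypothetical inputs to the fallback decision (not Gecko types).
struct FallbackInputs {
  bool spoofEnglish;          // models privacy.spoof_english being active
  std::string_view uiLocale;  // e.g. "ja", "zh-TW", "en-US"
};

// Illustrative subset of a locale-to-legacy-encoding mapping.
// Only the two examples mentioned in this bug are listed.
constexpr std::string_view LocaleFallback(std::string_view locale) {
  if (locale == "ja") return "Shift_JIS";
  if (locale == "zh-TW") return "Big5";
  return "windows-1252";  // simplification for all other locales
}

// The spoofing check runs first, so the locale is never consulted
// when spoofing is active.
constexpr std::string_view ChooseFallback(const FallbackInputs& in) {
  if (in.spoofEnglish) return "windows-1252";
  return LocaleFallback(in.uiLocale);
}
```

With this ordering, a Japanese-locale profile with spoofing active resolves to the same fallback as an en-US profile.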

Thorin [:thorin]

Comment 6

•

5 years ago

Tor Ticket: https://trac.torproject.org/projects/tor/ticket/20025

When a document, or server, fails to set a charset, the page falls back to the following setting: General > Language and Appearance > Fonts and Colors > Advanced > Text Encoding for Legacy Content. The default is to fall back to the "current locale" based on your "app language" (the pref value is blank). For non-en-US app languages, this can then create entropy depending on the language. You can test this on https://hsivonen.com/test/moz/check-charset.htm. For example, Japanese will reveal Shift_JIS, Traditional Chinese will reveal Big5, etc.

Solution: set intl.charset.fallback.override = windows-1252 when privacy.spoof_english == 2, and reset it when privacy.spoof_english !== 2

Henri Sivonen (:hsivonen)

Comment 7

•

5 years ago

(In reply to Simon Mainey from comment #6)

> Solution: set intl.charset.fallback.override = windows-1252 when privacy.spoof_english == 2, and reset it when privacy.spoof_english !== 2

A less brittle solution would be to check privacy.spoof_english where intl.charset.fallback.override is checked and make privacy.spoof_english take precedence.
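
A rough sketch of that point-of-use check, under stated assumptions: the `Prefs` struct and `ResolveFallback` are hypothetical names rather than Gecko APIs, and the value 2 meaning "spoofing on" is taken from comment 6. Because nothing is written back to intl.charset.fallback.override, a user-chosen override survives spoofing being switched on and off again.

```cpp
#include <string_view>

// Hypothetical snapshot of the two prefs involved (not the real pref service).
struct Prefs {
  int spoofEnglish;                   // privacy.spoof_english; 2 = spoofing on (per comment 6)
  std::string_view fallbackOverride;  // intl.charset.fallback.override; empty = unset
};

// Resolve the fallback where the override pref is read.
// Order of precedence: spoofing, then the user override, then the locale default.
constexpr std::string_view ResolveFallback(const Prefs& prefs,
                                           std::string_view localeDefault) {
  if (prefs.spoofEnglish == 2) return "windows-1252";
  if (!prefs.fallbackOverride.empty()) return prefs.fallbackOverride;
  return localeDefault;
}
```

The pref-syncing approach from comment 6 would instead have to overwrite the override pref and later restore it, which is where the brittleness comes from.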

Georg Koppen

Updated

•

5 years ago

Whiteboard: [tor][fingerprinting][fp-triaged] → [tor 20025][fingerprinting][fp-triaged]

Thorin [:thorin]

Comment 8

•

5 years ago

(In reply to Henri Sivonen (:hsivonen) (not reading bugmail until 2020-08-03) from comment #7)

> A less brittle solution would be to check privacy.spoof_english where intl.charset.fallback.override is checked and make privacy.spoof_english take precedence.

Any chance of nudging this now that intl.charset.fallback.override has been deprecated (Bug 1603712)?

Henri Sivonen (:hsivonen)

Comment 9

•

5 years ago

AFAICT bug 1603712 made this bug moot except for display of non-ASCII file paths in FTP directory listings. I'm unsure if it's possible for Web sites to scrape those. Without testing, it seems to me that the ftp scheme should at least block scraping from an http or https page.

Henri Sivonen (:hsivonen)

Comment 10

•

5 years ago

AFAICT, the issue will go away entirely with bug 1647898.

Thorin [:thorin]

Comment 11

•

4 years ago

FTP is disabled in Bug 1691890 - can we confirm the leak is resolved now?

Thorin [:thorin]

Comment 12

•

4 years ago

^ Or I guess it's not really resolved, as users could still toggle the pref: never mind. Guess we'll wait for the FTP code to get ripped out :)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Tom Schuster

Comment 13

•

2 years ago

FallbackEncoding.h doesn't exist anymore, but I am not totally sure if that means that the characterSet is never based on the language now. I tried tracing Document::SetDocumentCharacterSet and didn't immediately see anything related to the language, but I was by no means exhaustive.

Something else I noticed is that Document::RecomputeLanguageFromCharset calls RecomputeLanguageFromCharset, which seems to use the LocaleLanguage, but that might be unproblematic and unrelated to this bug anyway.

Henri, would you mind updating us with the current status when you are back?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 14

•

2 years ago

For text/html and text/plain, the UI locale is no longer used as an input to determining the character encoding. When the character encoding is not declared, the content of the text/html or text/plain stream and, possibly, the top-level domain are used for guessing the encoding, so neither depends on user-side configuration.

I'm not aware of UI locale-dependent encoding determination other than bug 1824325.

(In reply to Tom Schuster (MoCo) from comment #13)

> Something else I noticed is that Document::RecomputeLanguageFromCharset calls RecomputeLanguageFromCharset, which seems to use the LocaleLanguage, but that might be unproblematic and unrelated to this bug anyway.

That's indeed unrelated but still problematic.

What that does is guess the language (group) of the page for font selection purposes in the absence of explicit lang="..." tagging. For example, if the encoding is Shift_JIS, you get Japanese glyph forms for ideographs whose preferred glyph details vary by locale. The problem here is that if the encoding isn't locale-affiliated (e.g. UTF-8 isn't locale-affiliated), the UI language participates in font selection (e.g. if the UI language is Japanese, you get Japanese glyph forms). (There are probably non-CJK examples where the difference affects glyph metrics and, therefore, line breaks.)

So preventing a UI locale leak on that point is relevant, but out of scope for this bug.
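
The font-selection fallback described above can be modeled roughly like this. It is a hypothetical simplification, not the real Document::RecomputeLanguageFromCharset: both function names and the language-group labels are assumptions for illustration only.

```cpp
#include <string_view>

// Hypothetical mapping from an encoding to the language group it implies.
// Returns an empty view when the encoding is not locale-affiliated.
constexpr std::string_view LangGroupForEncoding(std::string_view encoding) {
  if (encoding == "Shift_JIS") return "ja";
  if (encoding == "Big5") return "zh-TW";
  return "";  // e.g. UTF-8 implies no particular language
}

// Pick the language used for font selection.
constexpr std::string_view FontLangGroup(std::string_view explicitLang,
                                         std::string_view encoding,
                                         std::string_view uiLang) {
  if (!explicitLang.empty()) return explicitLang;  // lang="..." tagging wins
  const std::string_view fromEncoding = LangGroupForEncoding(encoding);
  if (!fromEncoding.empty()) return fromEncoding;  // page-derived, not user-derived
  return uiLang;  // user-side input: this is where the UI locale can leak
}
```

In this model only the last branch depends on user configuration, which matches the case of an untagged UTF-8 page rendered with Japanese glyph forms under a Japanese UI.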

Flags: needinfo?(hsivonen)


Categories

(Core :: DOM: HTML Parser, enhancement, P3)

Tracking

()

People

(Reporter: arthur, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tor 20025][fingerprinting][fp-triaged])

Crash Data

Security

(public)
