Open Bug 1719738 Opened 4 years ago Updated 7 months ago

Simplify Timezone Names to Reduce Fingerprinting

Categories

(Core :: Privacy: Anti-Tracking, enhancement)

People

(Reporter: me, Unassigned)

Details

(Whiteboard: [fingerprinting])

Timezone is one of the canonical browser fingerprinting vectors, but as the Mozilla Wiki's Fingerprinting page put it, it is "Too useful to break." However, that doesn't say anything about the timezone name, which is a lot more unique: there are >300 timezone names while only ~40 timezone offsets. Indeed, timezone name is being used for fingerprinting (see this snippet from fingerprintjs).
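
As a rough illustration of the gap between names and offsets, the upper bound on identifying information can be computed from the counts quoted above. This is only a sketch: it assumes every value is equally likely, which real users are not, so the true figures are lower. The function name is made up for this example.

```python
import math


def max_identifying_bits(distinct_values: int) -> float:
    """Upper bound on entropy, in bits, for a value with this many outcomes.

    Assumes a uniform distribution, which overstates real-world entropy.
    """
    if distinct_values < 1:
        raise ValueError("need at least one distinct value")
    return math.log2(distinct_values)


if __name__ == "__main__":
    # Counts taken from the report: >300 names, ~40 offsets.
    print(f"names:   {max_identifying_bits(300):.2f} bits at most")
    print(f"offsets: {max_identifying_bits(40):.2f} bits at most")
```

With these counts the bound is roughly 8.2 bits for names against 5.3 bits for offsets.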

Building on this insight, fmarier (Brave) proposed simplifying timezone names to one per timezone offset, so that America/Toronto would become Etc/GMT+4. I think a further refinement, picking the most popular timezone name in that offset so that America/Toronto becomes America/New_York, might reduce breakage even further.

I wrote a quick patch (with the above example) to show that this is possible: https://github.com/sgmenda/ideas/blob/main/simplify-timezones-to-resist-fingerprinting/simplify-timezones.patch

(In reply to sanketh from comment #0)

> there are >300 timezone names while only ~40 timezone offsets

There are at least 374 distinct sets of timezone offsets when taking into consideration DST and covering all years - reducing to anything less than this will cause breakage

forgot to link the PoC: https://arkenfox.github.io/TZP/tests/timezones.html .. click [combine years]

from https://bugzilla.mozilla.org/show_bug.cgi?id=1364261#c27

That doesn't work. Instead of 374 unique sets of timezone offsets, where everyone is happy all year round with correct times and DST, you would reduce that to 25 timezones with no DST, and put all those in a timezone with DST, with the wrong time for half a year. :(

I don't understand.
My current time zone is America/Los_Angeles. As of today, that is equivalent to Etc/GMT+7 (the Etc/ names use inverted signs, so this is UTC-07:00). In November, that will change to Etc/GMT+8. The browser could be reporting Etc/GMT+7 today (DST on) and Etc/GMT+8 when DST is off. Where's the issue?

Alessandro, I misunderstood. I thought you meant applying a static mapped equivalent. Etc/GMT doesn't have DST. But if you assigned the right one based on the current DST, then sure, the "wrong time for half a year" wouldn't exactly apply.

But that still doesn't cater for non-full hours, e.g. Indian/Cocos today is UTC+06:30 (and there are literally thousands more examples across timezones and past dates), and Etc/ lacks those offsets. I.e. your solution only applies to the current date.
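
The dynamic mapping discussed here, and its half-hour limitation, can be sketched as follows. This is an illustration only, not the linked patch: the function names are hypothetical, and the only facts relied on are that the Etc/GMT names cover whole hours from Etc/GMT-14 to Etc/GMT+12 and that their sign is inverted relative to the UTC offset.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def etc_name_for_offset(offset: timedelta) -> Optional[str]:
    """Return the Etc/GMT name for a UTC offset, or None if there is none."""
    total_seconds = int(offset.total_seconds())
    if total_seconds % 3600 != 0:
        return None  # e.g. UTC+06:30 has no Etc/GMT equivalent
    hours = total_seconds // 3600
    if not -12 <= hours <= 14:
        return None
    if hours == 0:
        return "Etc/GMT"
    # POSIX-style inversion: UTC-07:00 is named Etc/GMT+7.
    sign = "-" if hours > 0 else "+"
    return f"Etc/GMT{sign}{abs(hours)}"


def etc_name_at(tz: tzinfo, when: datetime) -> Optional[str]:
    """Etc/GMT name matching tz at one instant; `when` must be timezone-aware."""
    offset = when.astimezone(tz).utcoffset()
    return None if offset is None else etc_name_for_offset(offset)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(etc_name_at(timezone(timedelta(hours=-7)), now))
    print(etc_name_for_offset(timedelta(hours=6, minutes=30)))
```

Passing a `zoneinfo.ZoneInfo("America/Los_Angeles")` object as `tz` (Python 3.9+, with tzdata available) would give a name that changes with DST, but the answer is only valid for the instant queried, which is the limitation raised above.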

> there are >300 timezone names while only ~40 timezone offsets

update/number crunching: at the time of writing

  • https://arkenfox.github.io/TZP/tests/timezones.html
  • 444 "supported" timezone names
  • 596 timezone names tested if you uncheck supported
    • the only invalid one now is Factory
    • does not alter the final group count
  • 152 extra timezone names, which IIUC act as aliases for a supported one
  • 58-59 seems to be the number of timezone offsets over a year (for recent years, not old-timey stuff)
    • concurrent varies at any given point but is around 40 IIRC
    • e.g. 2023 = 58, 2024 = 59, 2025 = 58, 2026+ = 58
    • therefore the minimum timezone names we could use would be those combined over our supported years
    • e.g. 2023-2026+ = 62

testing with 444 supported timezone names

  • without affecting any datetime: i.e. all offsets are correct throughout history/calendars
    • we can reduce this to 338
    • e.g. America/Lower_Princes, America/Marigot, America/Montserrat,... are identical
    • e.g. Asia/Dubai, Asia/Muscat, Indian/Mahe, Indian/Reunion are identical
  • with only ensuring 1990+ datetime is always correct we can reduce this to 214
  • with only ensuring 2000+ datetime is always correct we can reduce this to 171
  • with only ensuring 2010+ datetime is always correct we can reduce this to 132
  • with only ensuring 2020+ datetime is always correct we can reduce this to 73
    • enter 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027 into the field and run combined
    • of which 45 are unique with a single member: which is a little misleading, because some of those with 2 or 3 are only grouped with Etc/
      • e.g. Pacific/Fiji is a group, so unique
      • e.g. Etc/GMT-14, Pacific/Kiritimati is a group of 2, but in reality is unique
      • there isn't much we can do about the long thin tail
    • but there are large groups: look at the group summary: e.g. 33, 32, 28, 23, 17, 16, 16, 16, 16, 15, 15, 14, 13, 11, 11, 10, 10, 10, 10 ...
    • e.g. these 33 timezone names
      • Africa/Ceuta, Arctic/Longyearbyen, Europe/Amsterdam, Europe/Andorra, Europe/Belgrade, Europe/Berlin, Europe/Bratislava, Europe/Brussels, Europe/Budapest, Europe/Busingen, Europe/Copenhagen, Europe/Gibraltar, Europe/Ljubljana, Europe/Luxembourg, Europe/Madrid, Europe/Malta, Europe/Monaco, Europe/Oslo, Europe/Paris, Europe/Podgorica, Europe/Prague, Europe/Rome, Europe/San_Marino, Europe/Sarajevo, Europe/Skopje, Europe/Stockholm, Europe/Tirane, Europe/Vaduz, Europe/Vatican, Europe/Vienna, Europe/Warsaw, Europe/Zagreb, Europe/Zurich
        
      • could use and report as Europe/Paris
  • if we supported accuracy only for the last year (2024) + current year + future we can reduce this to 60
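
The point above about groups that are "unique in reality" can be sketched as a count that ignores Etc/ members. This is a simplification under a stated assumption: only names starting with `Etc/` are treated as non-regional, and the helper names are invented for this example.

```python
from typing import Iterable, List, Sequence


def regional_members(group: Sequence[str]) -> List[str]:
    """Names in a group that are not Etc/ zones."""
    return [name for name in group if not name.startswith("Etc/")]


def count_effectively_unique(groups: Iterable[Sequence[str]]) -> int:
    """Count groups that contain exactly one regional (non-Etc/) name."""
    return sum(1 for group in groups if len(regional_members(group)) == 1)


if __name__ == "__main__":
    sample = [
        ["Pacific/Fiji"],
        ["Etc/GMT-14", "Pacific/Kiritimati"],
        ["Europe/Paris", "Europe/Berlin"],
    ]
    print(count_effectively_unique(sample))
```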

Note: all groupings would need to be checked/updated with tzdata changes.

As we ignore older datetime accuracy, we reduce minimum viable timezone names (see offsets over supported years). The question becomes at what point do we care about accuracy in older calendar entries.
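
The grouping idea can be sketched as below. This is not how the linked test page works (I have not checked its code); it is a minimal illustration that samples each zone's UTC offset at a fixed step over the supported years and groups names whose change points are identical. The function names and the one-hour default step are assumptions, and a coarse step can miss zones that differ only in the exact transition instant.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, Iterable, List, Tuple

Signature = Tuple[Tuple[datetime, int], ...]


def offset_signature(tz: tzinfo, years: Iterable[int],
                     step: timedelta = timedelta(hours=1)) -> Signature:
    """Record (sample instant, offset seconds) each time the offset changes."""
    changes = []
    previous = None
    for year in years:
        moment = datetime(year, 1, 1, tzinfo=timezone.utc)
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
        while moment < end:
            seconds = int(moment.astimezone(tz).utcoffset().total_seconds())
            if seconds != previous:
                changes.append((moment, seconds))
                previous = seconds
            moment += step
    return tuple(changes)


def group_by_offsets(zones: Dict[str, tzinfo], years: Iterable[int],
                     step: timedelta = timedelta(hours=1)) -> List[List[str]]:
    """Group zone names whose sampled offsets agree over all given years."""
    year_list = list(years)
    groups = defaultdict(list)
    for name in sorted(zones):
        groups[offset_signature(zones[name], year_list, step)].append(name)
    # Largest groups first, then alphabetical, for a stable summary.
    return sorted(groups.values(), key=lambda g: (-len(g), g))
```

To try it on real data, `zones` could be built from `zoneinfo.available_timezones()` and `zoneinfo.ZoneInfo` (Python 3.9+, tzdata required). Group counts will depend on the tzdata version and the sampling step, so they may not match the numbers quoted above.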

Reduction from 444 (or 596?) possible (IDK how many are used in reality) timezone names to ~60-70 (with a long thin tail) seems feasible, with minimal usability issues: e.g. you order a local pizza and it has your correct local time; you look up calendar entries, send emails, add calendar entries in the future, all with the correct time.

> e.g. these 33 timezone names ... could use and report as Europe/Paris

One compat/usability concern I am unsure of here is websites using the timezone name to choose the web content language. E.g. in Portugal, Google search kindly provided Portuguese despite my locale being en-US; the laptop was on Portugal time, you see (automatic OS change). I have no idea how widespread this is, or if it's even good practice. I would have expected best practice to be respecting the locale/requested web content language(s).

edit: a mitigation option here (if this is an actual issue) would be to report the timezone based on the language part of navigator.language (so en from en-US). This wouldn't add entropy, e.g.

  • pl would be Europe/Warsaw, fr would be Europe/Paris, a non-match would need to fall back to something in our group
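
A minimal sketch of that mitigation, assuming a hypothetical language-to-zone table (only the pl and fr entries come from this comment) and a per-group fallback name:

```python
from typing import Mapping, Sequence

# Hypothetical preference table; a real one would need a researched mapping.
LANGUAGE_PREFERENCE = {"pl": "Europe/Warsaw", "fr": "Europe/Paris"}


def reported_zone(group: Sequence[str], language_tag: str, fallback: str,
                  preferences: Mapping[str, str] = LANGUAGE_PREFERENCE) -> str:
    """Pick the name to report for a user whose real zone is in `group`.

    Uses only the language part of the tag (en from en-US). The preferred
    name is used only if it belongs to the same group, so the reported
    offsets stay correct; otherwise the group's fallback is used.
    """
    if fallback not in group:
        raise ValueError("fallback must be a member of the group")
    language = language_tag.split("-", 1)[0].lower()
    candidate = preferences.get(language)
    return candidate if candidate in group else fallback


if __name__ == "__main__":
    central_europe = ["Europe/Berlin", "Europe/Paris", "Europe/Warsaw"]
    print(reported_zone(central_europe, "pl-PL", fallback="Europe/Paris"))
    print(reported_zone(central_europe, "en-US", fallback="Europe/Paris"))
```

Because the result is a function of the group and the already-exposed language, it should not reveal anything a site could not derive itself.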

> Reduction from 444 (or 596?) possible (IDK how many are used in reality) timezone names to ~60-70 ..

There are metaZones in CLDR. IDK what Windows uses, but for example, in Windows 11 there are 139 timezone choices: Windows doesn't care about ye-olde-timey changes, only recent time and the future matter. It has used "something" to group and simplify IANA timezones.

e.g. here are some choices

  • (UTC +01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna changes your timezone to Europe/Berlin
  • (UTC +01:00) Belgrade, Bratislava, Budapest, Ljubljana, Prague = timezone is set as Europe/Budapest

IDK why e.g. Berlin over Rome. Maybe it changes with your platform language? It's going to need some telemetry breakdown per platform
