Mechanism needed to specify user region

Status: NEW (Unassigned)
Component: Core :: Internationalization
Opened 6 years ago; modified 6 years ago
People: Reporter: basta; Assignee: Unassigned
Tracking: Firefox tracking flags (not tracked)

(Reporter)

Description

6 years ago
Currently, the best way to determine the user's region is to first look at the Accept-Language header and then to fall back on looking up the region for the user's IP address. This approach has serious problems:

1. Accept-Language is rarely accurate for users that don't speak the native language of a region (e.g.: English-speaking persons in Brazil would likely have en-US instead of something like en-BR). There is no way to provide this during setup and no obvious way to specify it after setup.

2. Targeting by IP is dubious at best and requires a large database of IP ranges. It's slow and often inaccurate. Using a third-party service tends to be expensive and incurs overhead.

3. Targeting by IP will likely not be possible or practical once IPv6 is widely adopted, due to the sheer number of addresses.
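The heuristic described above (Accept-Language first, IP lookup second) can be sketched roughly as follows. This is a minimal Python illustration, not code from Marketplace or Firefox; `guess_region` and its helpers are hypothetical names, `geoip_lookup` stands in for a hypothetical IP-to-country database or service, and the region extraction simply takes the first 2-letter or 3-digit subtag.

```python
from typing import Callable, Optional


def region_from_tag(tag: str) -> Optional[str]:
    """Return the region subtag of a language tag such as 'en-US', if any."""
    for sub in tag.strip().split("-")[1:]:  # skip the primary language subtag
        if len(sub) == 1:
            break  # a singleton starts an extension or private-use section
        if (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            return sub.upper()
    return None


def parse_accept_language(header: str) -> list[str]:
    """Return language ranges ordered by descending q-value."""
    weighted = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag, q = parts[0].strip(), 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if tag and q > 0:
            weighted.append((q, tag))
    # sorted() is stable, so equal q-values keep their header order
    return [tag for _, tag in sorted(weighted, key=lambda pair: -pair[0])]


def guess_region(accept_language: str, ip: str,
                 geoip_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Accept-Language region first; fall back to an IP-to-country lookup."""
    for tag in parse_accept_language(accept_language):
        region = region_from_tag(tag)
        if region:
            return region
    return geoip_lookup(ip)  # hypothetical geolocation database or service
```

With an English build of Firefox in Brazil, `guess_region("en-US,en;q=0.5", "198.51.100.7", lookup)` returns `"US"` without ever consulting the IP lookup, which is the failure described in point 1.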


Some other notes:

* The geolocation API doesn't help here because it requires user interaction, which happens too late. If you're routing the user to a page based on their region, they've already landed on a page at that point. Client-side APIs do not help.

* It is conceivably possible to determine the user's region with DNS or with the help of a CDN, but this isn't accessible to the average developer.

* Most machines have a general idea of where they're located based on the city specified for the time zone, but this isn't exposed via HTTP headers.
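As a rough sketch of the time-zone idea, the IANA tz database ships a `zone.tab` file whose tab-separated columns are an ISO 3166 country code, coordinates, the zone name, and an optional comment. The hypothetical helper below parses text in that format; the three sample lines are an illustrative excerpt (comment column omitted), and the file's location on disk is an assumption that varies by system.

```python
def parse_zone_tab(text: str) -> dict[str, str]:
    """Map IANA zone names to ISO 3166 country codes from zone.tab-style text."""
    zone_to_country: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.split("\t")
        if len(columns) >= 3:
            # columns: country code, coordinates, zone name, [comment]
            zone_to_country[columns[2]] = columns[0]
    return zone_to_country


SAMPLE = (
    "# illustrative excerpt in zone.tab format\n"
    "BR\t-2332-04637\tAmerica/Sao_Paulo\n"
    "CA\t+4339-07923\tAmerica/Toronto\n"
    "US\t+404251-0740023\tAmerica/New_York\n"
)

# On many Unix-like systems (path is an assumption):
# with open("/usr/share/zoneinfo/zone.tab", encoding="utf-8") as f:
#     table = parse_zone_tab(f.read())
```

A time zone only approximates the user's region: `zone.tab` assigns one country per zone, while the newer `zone1970.tab` can list several countries that share a zone.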


The web will need to address this going forward, as it's currently incredibly difficult to make sure that i18n is done properly.
(Reporter)

Comment 1

6 years ago
I should clarify that we don't necessarily need a new HTTP header or other mechanism to send the user's region if it's been provided, only a way for the user to optionally specify his or her region so that the locale code is completely accurate (possibly during setup). We should be prompting the user for this.

Comment 2

I'm not sure I follow.  Why is Accept-Language not what we want?  What are we trying to internationalize such that what I see would change if I pick up my computer and fly to Brazil with it for a vacation without changing anything else?
(Reporter)

Comment 3

6 years ago
Consider this: you're an English-speaking (non-Portuguese-speaking) person in Brazil. As a site owner, we want to target Brazilian users with Brazil-specific content. A good example (what initially sparked this) is Marketplace's country stores. If you downloaded the English version of Firefox, chances are you have the locale "en-US". Your Accept-Language header thus doesn't reflect your region and it's not possible to determine whether or not you wish to view Brazilian content. Currently, the only way to work around this is to try to geolocate the user's IP address.

If a user has the default en-US language header, they might be shown items in the marketplace that are unavailable in his or her region due to laws or regional restrictions. The user won't find out about this until they enter their credit card info, which then tells them, "Oh, you're in a region that isn't allowed to purchase this item."

Having a mechanism that asks users "Where do you live and what do you speak?" would be important because it would ensure users never encounter content that isn't available to them.

Another important aspect would be to provide this regional information in some way, perhaps via Accept-Language. E.g.: `Accept-Language: en-ca-ca, en-ca` to denote a Canadian English-speaking Canadian user or `Accept-Language: zh-cmn-cn-hk, zh-cmn-cn` for a Chinese Mandarin-speaking person living in Hong Kong. This would be backwards-compatible with servers that do not recognize such a notation.

Another solution would be to add a new HTTP header (e.g.: User-Region or Accept-Region), though this would be expensive.

Comment 4

(In reply to Matt Basta from comment #3)
> Consider this: you're an English-speaking (non-Portuguese-speaking) person in
> Brazil. As a site owner, we want to target Brazilian users with
> Brazil-specific content. A good example (what initially sparked this) is
> Marketplace's country stores. If you downloaded the English version of
> Firefox, chances are you have the locale "en-US". Your Accept-Language
> header thus doesn't reflect your region and it's not possible to determine
> whether or not you wish to view Brazilian content. Currently, the only way
> to work around this is to try to geolocate the user's IP address.
> 
> If a user has the default en-US language header, they might be shown items
> in the marketplace that are unavailable in his or her region due to laws or
> regional restrictions. The user won't find out about this until they enter
> their credit card info, which then tells them, "Oh, you're in a region that
> isn't allowed to purchase this item."
> 
> Having a mechanism that asks users "Where do you live and what do you
> speak?" would be important because it would ensure users never encounter
> content that isn't available to them.

Why not just ask the user when they register an account on your site, which serves different content to different regions? It seems to me that most sites do not need this information, and that making it available to them could allow for artificial or malicious censoring of content. Not to mention that relying on a user-set region to enforce region locking will lead to users changing their regions to get around these restrictions, rendering it moot.

> Another important aspect would be to provide this regional information
> in some way, perhaps via Accept-Language. E.g.: `Accept-Language: en-ca-ca,
> en-ca` to denote a Canadian English-speaking Canadian user or
> `Accept-Language: zh-cmn-cn-hk, zh-cmn-cn` for a Chinese Mandarin-speaking
> person living in Hong Kong. This would be backwards-compatible with servers
> that do not recognize such a notation.

The Accept-Language header is standardized to use BCP47 language tags, which have a specific format and which encode language, script, region, and variant information in a specific way. None of the language tags you propose are compatible with BCP47; thus, they are not in fact "backwards-compatible with servers that do not recognize such a notation".
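To make the well-formedness point concrete, here is a simplified Python check against the `langtag` production of BCP 47 (RFC 5646). It is a sketch, not a complete validator: grandfathered tags such as `i-klingon` are omitted, and it tests only well-formedness, not whether the subtags are registered.

```python
import re

# Simplified "langtag" production from RFC 5646 (BCP 47), case-insensitive.
_LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # incl. extlang
_SCRIPT = r"(?:-[a-z]{4})"
_REGION = r"(?:-(?:[a-z]{2}|[0-9]{3}))"
_VARIANT = r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))"
_EXTENSION = r"(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)"  # singleton other than "x"
_PRIVATE_USE = r"(?:x(?:-[a-z0-9]{1,8})+)"

_LANGTAG = re.compile(
    rf"{_LANGUAGE}{_SCRIPT}?{_REGION}?{_VARIANT}*{_EXTENSION}*(?:-{_PRIVATE_USE})?"
    rf"|{_PRIVATE_USE}",
    re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """True if the whole tag matches the simplified BCP 47 grammar."""
    return _LANGTAG.fullmatch(tag) is not None
```

It accepts `en-CA` and `zh-Hant-HK` but rejects the region-extended `en-ca-ca` and `zh-cmn-cn-hk`: a trailing two-letter subtag after the region can be neither a variant (5 to 8 characters, or a digit followed by 3 characters) nor an extension.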

Comment 5

Ah, ok.  So the point is that this would be a user setting somewhat independent of actual location?
(Reporter)

Comment 6

6 years ago
(In reply to Gordon P. Hemsley [:gphemsley] from comment #4)
> Why not just ask the user when they register an account on your site, which
> serves different content to different regions?

If we don't tailor the content from the user's first visit to the page, they could use the site for a significant amount of time only to find that they're unable to complete the action that they're looking to complete simply because we didn't know that they were in a "restricted" locale. A good example that was brought up is Canadian users that find that they can't have certain products delivered to their address from Amazon after they've already moved to the cart. This is an incredibly bad user experience that could otherwise be prevented.

> It seems to me that most
> sites do not need this information, and that making it available to them
> could allow for artificial or malicious censoring of content.

I'm not entirely sure that I understand this. There's no reason why we can't provide an option to the user to disable this as well, nor do we need to require the user to provide their region.

Presently, content providers can censor based on Accept-Language if they choose to. They could also require the user to provide their location with navigator.geolocation or simply use any other client-side API.

> Not to mention
> that relying on a user-set region to enforce region locking will lead to
> users changing their regions to get around these restrictions, rendering it
> moot.

This is meant as a heuristic and not as a means of verifying the user's location. Ideally, this would replace inefficient IP-to-geolocation databases and APIs. Enforcing regional restrictions needs to be accomplished with credit card verification or other more official means. This feature is meant solely as one to improve user experience.

In effect, it has the same use case as Accept-Language itself. Accept-Language allows the content to be localized. Providing the region would allow the content to also be internationalized. Content isn't just translated between languages; there are restrictions and variations in content depending on the user's location and not just their language.

> The Accept-Language header is standardized to use BCP47 language tags, which
> have a specific format and which encode language, script, region, and
> variant information in a specific way. None of the language tags you propose
> are compatible with BCP47; thus, they are not in fact "backwards-compatible
> with servers that do not recognize such a notation".

If a server does not recognize a language tag, the language tag is and should be ignored. Thus, it is backwards compatible.

The format that I proposed is simply an example. I'd like to encourage the development of a more fitting method for delivering region information. I'm not an expert on the intricacies of Firefox's HTTP implementation or the various IEEE specs and RFCs, so I assume that there is a better way to implement this by someone that is more knowledgeable in that domain.
(Reporter)

Comment 7

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #5)
> Ah, ok.  So the point is that this would be a user setting somewhat
> independent of actual location?

Yes, exactly.

Comment 8

(In reply to Matt Basta from comment #6)
> (In reply to Gordon P. Hemsley [:gphemsley] from comment #4)
> > Why not just ask the user when they register an account on your site, which
> > serves different content to different regions?
> 
> If we don't tailor the content from the user's first visit to the page, they
> could use the site for a significant amount of time only to find that
> they're unable to complete the action that they're looking to complete
> simply because we didn't know that they were in a "restricted" locale. A
> good example that was brought up is Canadian users that find that they can't
> have certain products delivered to their address from Amazon after they've
> already moved to the cart. This is an incredibly bad user experience that
> could otherwise be prevented.

If you're allowing users to browse and add items to a cart without first logging in, I see no reason why you can't take the time to prompt them for their region upon first visit. Your proposal requires that users be prompted when they first open the browser, so it doesn't make much difference. (It is arguably also a bad user experience for a user to be prompted for a bunch of things when they just want to start browsing the Internet—look what happens when you open Internet Explorer for the first time, for example.)

> > It seems to me that most
> > sites do not need this information, and that making it available to them
> > could allow for artificial or malicious censoring of content.
> 
> I'm not entirely sure that I understand this. There's no reason why we can't
> provide an option to the user to disable this as well, nor do we need to
> require the user to provide their region.

But once they do, the region is then sent to every site they visit. There is no mechanism to disable sending it to a particular site—and if there were, it would inherently have to be opt-out. It seems to me that a per-site (Web-based), opt-in mechanism would make more sense.

> Presently, content providers can censor based on Accept-Language if they
> choose to. They could also require the user to provide their location with
> navigator.geolocation or simply use any other client-side API.

The Accept-Language header is intended to give the user the ability to say which language they would prefer to receive content in, if it is available, in order to improve their understanding of the site text. (Note there aren't too many sites which don't use text.) I can't imagine it is particularly effective for a site to censor based on that information.

On the other hand, this proposal to send location information seems specifically geared towards limiting the content a user receives. Users do not browse the Web with a desire to *not* have access to certain portions of it, and I don't think we should encourage that behavior any more than is required by law. I think the cons of allowing a site to have direct access to the user's self-described location outweigh the pros in a way that is not comparable to Accept-Language.

As for Geolocation or other APIs, any site that requires the user to provide such information in order to operate will likely not be very popular, especially with users who do not want to identify where they are. These APIs are designed to be *supplemental* information intended to improve the user experience from a generic baseline. I do not think your proposal fits into such a description.

> > Not to mention
> > that relying on a user-set region to enforce region locking will lead to
> > users changing their regions to get around these restrictions, rendering it
> > moot.
> 
> This is meant as a heuristic and not as a means of verifying the user's
> location. Ideally, this would replace inefficient IP-to-geolocation
> databases and APIs. Enforcing regional restrictions needs to be accomplished
> with credit card verification or other more official means. This feature is
> meant solely as one to improve user experience.
> 
> In effect, it has the same use case as Accept-Language itself.
> Accept-Language allows the content to be localized. Providing the region
> would allow the content to also be internationalized. Content isn't just
> translated between languages; there are restrictions and variations in
> content depending on the user's location and not just their language.

I disagree. Every user is entitled to learn or attempt to learn any language, and this choice of which content to access is dictated by the user themselves. They are free to change what language they prefer the most at any point.

On the other hand, the region they are in, and the restrictions that come along with it, are much more difficult to change. The user should not be concerned with what those restrictions are until the last possible moment.

Language is user-controlled; regional restrictions are state-controlled. There is a major difference between the two, and I think they should be handled as such.

> > The Accept-Language header is standardized to use BCP47 language tags, which
> > have a specific format and which encode language, script, region, and
> > variant information in a specific way. None of the language tags you propose
> > are compatible with BCP47; thus, they are not in fact "backwards-compatible
> > with servers that do not recognize such a notation".
> 
> If a server does not recognize a language tag, the language tag is and
> should be ignored. Thus, it is backwards compatible.

There is a difference between valid and well-formed language tags. Language tags can be well-formed without being valid (i.e. having meaning), and a language tag must be well-formed in order to be valid. The language tags you propose are not well-formed, as BCP47 currently stands, and they may be expressly forbidden for backwards-compatibility reasons already.

Assuming the server can even determine that it cannot parse such a language tag, if it does not support the proposed format, it would be required to discard the tag wholesale. If all language tags are extended to include such region information (which, FTR, I believe is an abuse of the language tags), then the server would discard all of the language tags, leaving the Accept-Language header useless. And including duplicate tags without the region information would require the header to be twice as long as necessary.
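The fallback argument can be illustrated with RFC 4647 "basic filtering", one prefix-matching scheme a server might apply to Accept-Language ranges: a range matches a tag if it equals the tag or is a prefix of it followed by `-`. The Python sketch below (`negotiate` is a hypothetical helper, and q-value parsing is deliberately minimal) shows that a region-extended range on its own matches none of a site's available tags, and only the duplicated plain range restores the fallback.

```python
from typing import Optional


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: "*" matches anything; otherwise the range must
    equal the tag or be a prefix of it that ends just before a "-"."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def negotiate(accept_language: str, available: list[str]) -> Optional[str]:
    """Return the first available tag matched by the most preferred range."""
    ranges = []
    for item in accept_language.split(","):
        language_range, _, params = item.strip().partition(";")
        params = params.strip().lower()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        if language_range.strip() and q > 0:
            ranges.append((q, language_range.strip()))
    # Try ranges from most to least preferred; sorted() keeps header order on ties.
    for _, language_range in sorted(ranges, key=lambda pair: -pair[0]):
        for tag in available:
            if basic_filter_match(language_range, tag):
                return tag
    return None
```

With `available = ["en-CA", "en-US", "pt-BR"]`, the header `en-ca-ca` alone negotiates nothing, while `en-ca-ca, en-ca` selects `en-CA` only because of the duplicated range, which is the header-length cost mentioned above.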

> The format that I proposed is simply an example. I'd like to encourage the
> development of a more fitting method for delivering region information. I'm
> not an expert on the intricacies of Firefox's HTTP implementation or the
> various IEEE specs and RFCs, so I assume that there is a better way to
> implement this by someone that is more knowledgeable in that domain.

If such a use case is deemed sufficient for inclusion, I think it's fair to say that Accept-Language is not the proper place for it. But I won't speak to the feasibility of other methods without seeing concrete proposals.
(Reporter)

Comment 9

6 years ago
(In reply to Gordon P. Hemsley [:gphemsley] from comment #8)
> If you're allowing users to browse and add items to a cart without first
> logging in, I see no reason why you can't take the time to prompt them for
> their region upon first visit.

One could say the same thing for Accept-Language. Let's have all of our users select which language they speak every time they visit our site. It's neither practical nor user-friendly. If the user can't take advantage of the content because they can't read it, it's not useful. If a user can't take advantage of content because they aren't able to purchase it, it's not relevant to them, and so it's also not useful.

> Your proposal requires that users be prompted
> when they first open the browser, so it doesn't make much difference. (It is
> arguably also a bad user experience for a user to be prompted for a bunch of
> things when they just want to start browsing the Internet—look what happens
> when you open Internet Explorer for the first time, for example.)

I also suggested that we could provide the information based on the city associated with the user's time zone or other already-available data sources. Simply polling the geolocation API for a country-level location on the first start of the browser would be sufficient.


The whole point of this feature is to remove the need for IP geotargeting, which itself was introduced to remove the need to prompt the user for his or her location. A per-site opt-in defeats the purpose of this feature.

> On the other hand, this proposal to send location information seems
> specifically geared towards limiting the content a user receives. Users do
> not browse the Web with a desire to *not* have access to certain portions of
> it, and I don't think we should encourage that behavior any more than is
> required by law.

Once again, this mechanism isn't anything new, it simply makes it straightforward for services to take advantage of the user's region. It's probably better that we provide something standard and safe now rather than wait for someone else to build a technology or standard later on that isn't as considerate of the user's rights.

This feature isn't meant to limit a user's access to content (as it's certainly not enforceable). It's simply meant to enable sites to tailor content to the user's region. eCommerce is probably the best example of this: there are many regional conditions on prices that could be shown to the user up-front rather than after the user has provided his or her credit card info to provide hard verification of his or her region.

As I've already said, content in the wrong language is just as irrelevant as content that doesn't apply to the user. Your suggestion to use an opt-in per-site is the status quo and the status quo is insufficient. IP geotargeting won't work in a decade (or less) and we've got nothing to replace it. *Something* should be done now or developers will be forced to use services like Google in the future (who could end up doing god knows what with the data).


> On the other hand, the region they are in, and the restrictions that come
> along with it, are much more difficult to change. The user should not be
> concerned with what those restrictions are until the last possible moment.

I think you would feel differently if you lived in a region that had such restrictions. Consider a shopping cart where you were denied access to products after you had entered your credit card details (and your region could be verified). This would be exceptionally frustrating and would easily have been prevented by simply having had your region available to the eCommerce site you're visiting.

News sites might show Italian articles to Italian users and stories from America to visitors that are stateside. Support sites could forward visitors to pages which show information relevant to support in the user's region. Map-related applications can phrase their copy differently for regions which have different naming rules for places than the rest of the world (e.g.: Japanese content might show the names of blocks rather than streets, but only in Japan). There are plenty of legitimate uses for this technology.

> There is a difference between valid and well-formed language tags. Language
> tags can be well-formed without being valid (i.e. having meaning), and a
> language tag must be well-formed in order to be valid. The language tags you
> propose are not well-formed, as BCP47 currently stands, and they may be
> expressly forbidden for backwards-compatibility reasons already.

Again, the example was not meant to be taken literally as the exact request for implementation.

> If such a use case is deemed sufficient for inclusion, I think it's fair to
> say that Accept-Language is not the proper place for it. But I won't speak
> to the feasibility of other methods without seeing concrete proposals.

In that case, I'd like to propose the HTTP header User-Region:

    User-Region = "User-Region" ":" region
    region      = 1*8ALPHA

Regions should be ISO 3166 two-letter country codes. The absence of User-Region denotes that the user's region is unknown or that the user has chosen not to share his or her location.

The User-Region header should not be sent when the user has enabled Do Not Track.
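A server-side reading of the proposed header, plus the client-side Do Not Track rule, might look like the sketch below. `user_region` and `request_headers` are hypothetical helpers, not an existing API. The sketch enforces the `1*8ALPHA` grammar plus the two-letter ISO 3166 requirement from the prose, and treats an absent or invalid header as "unknown".

```python
import re
from typing import Mapping, Optional

# Proposed grammar (comment 9): region = 1*8ALPHA, restricted in prose to
# ISO 3166 two-letter country codes.
_REGION_SYNTAX = re.compile(r"[A-Za-z]{1,8}")


def user_region(headers: Mapping[str, str]) -> Optional[str]:
    """Server side: return the User-Region value as an uppercase 2-letter code.

    Returns None when the header is absent (unknown or not shared) or when the
    value is not a 2-letter code, so callers fall back to their default content.
    """
    for name, value in headers.items():
        if name.lower() == "user-region":  # HTTP field names are case-insensitive
            value = value.strip()
            if _REGION_SYNTAX.fullmatch(value) and len(value) == 2:
                return value.upper()
            return None
    return None


def request_headers(region: Optional[str], do_not_track: bool) -> dict[str, str]:
    """Client side: send User-Region only if known and Do Not Track is off."""
    headers: dict[str, str] = {}
    if do_not_track:
        headers["DNT"] = "1"
    elif region:
        headers["User-Region"] = region.upper()
    return headers
```

Because "unknown" and "not shared" both map to `None`, a site cannot tell a user who declined from one who was never asked, which matches the proposal's intent.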

Comment 10

I just realized that you might be interested to know about the Unicode extension to BCP 47 language tags that allows for the specification of details such as calendar, currency, number format, timezone, and other details about a locale:

http://unicode.org/reports/tr35/

None of these subtags are currently sent out by any user agent, AFAIK, but this information might be useful to you, and the parsing of it is at least compliant with BCP 47, with proper fallback capabilities. (And you might be able to petition Unicode to include additional information, if this is not sufficient.)
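For reference, the `u` extension from UTS #35 is the singleton `u` followed by keys (two characters) and their type values (one or more subtags of 3 to 8 characters). The parser below is a sketch, not a full UTS #35 implementation: it skips attributes, does no validation against CLDR data, and `unicode_keywords` is a hypothetical name.

```python
def unicode_keywords(tag: str) -> dict[str, str]:
    """Return the key/type pairs of the "u" extension in a BCP 47 tag.

    A key with several type subtags (e.g. "islamic-umalqura") keeps them joined.
    """
    keywords: dict[str, str] = {}
    in_u = False
    key = None
    for sub in tag.lower().split("-")[1:]:  # first subtag is the language
        if len(sub) == 1:  # a singleton starts a new extension or private use
            if in_u or sub == "x":
                break  # the "u" extension has ended, or private use begins
            in_u = sub == "u"
            continue
        if not in_u:
            continue  # script, region, variants, or another extension
        if len(sub) == 2:  # two-character subtags are keys
            key = sub
            keywords[key] = ""
        elif key is not None:  # longer subtags are types for the current key
            keywords[key] = f"{keywords[key]}-{sub}" if keywords[key] else sub
    return keywords
```

For example, `unicode_keywords("en-CA-u-ca-gregory-cu-cad")` yields the calendar (`ca`) and currency (`cu`) keywords, while the region `CA` is left to the ordinary region subtag.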

Though I'm still not entirely convinced that this is the best way to go about things.
(Reporter)

Comment 11

6 years ago
After thinking about it more and more, relying on anything that's based on/around standard locale codes likely isn't the correct way of passing the information. While the language code that Accept-Language provides requests a transformed (translated) version of the same piece of content, the problem that I'm trying to solve is to provide information which requests an alternate piece of content. Because of this, a header along the lines of the one that I listed in comment #9 is likely the most correct means of achieving this.

While UTS #35 does provide some information that would make this useful, it doesn't address the issue of the user's regional requirements. The region provided by the spec would imply the region in which the user's language is spoken, not the user's current location. While something like currency might be usable for this purpose, it seems like more of a hack than anything.

I should also note that by allowing the user to provide his own region and reducing dependence on IP geolocation, we'd be reducing the amount of frustration experienced by users that live on the borders of two countries. Many users that live in upstate New York, for instance, are often automatically determined to live in Canada (and vice versa). Depending on which website or service they're interacting with, it isn't always possible to specify otherwise.

Comment 12

I'm inclined to agree with you, though I would like to see some potential use cases that don't revolve around censorship of some kind (which regional restrictions indeed are).

However, I don't know that we should restrict the value of the header to simply 1-8 letters. In addition to specifying region codes, ISO 3166 also specifies subdivisions. So, for example, it might be useful for a user to specify which state/province they live in:

User-Region: US-NY
or
User-Region: CA-QC
or
User-Region: CA-ON

Of course, then we'd have to worry (or maybe we wouldn't) about places that have multiple codes under ISO 3166. For example, would Puerto Rico identify as 'US-PR' or as 'PR'?

In addition, we'd have to ensure that the region is specified with the 2-char code, not the 3-char code (although it seems that subdivision codes already make explicit that they use 2-char codes).

References:
http://en.wikipedia.org/wiki/ISO_3166
http://en.wikipedia.org/wiki/ISO_3166-2
http://en.wikipedia.org/wiki/ISO_3166-2:US
http://en.wikipedia.org/wiki/ISO_3166-2:CA

Comment 13

The definition to match my suggestion:

    User-Region = "User-Region" ":" region
    region      = 1*8ALPHA [ "-" 1*3(ALPHA / DIGIT) ]

I notice that you allow up to 8 letters to define the region. However, ISO 3166 only specifies regions with 2-letter, 3-letter, or 3-digit codes. As 3 characters will likely be enough to list all regions, even if the 2-letter codes are eventually exhausted, I recommend this instead:

    User-Region = "User-Region" ":" region
    region      = (2*3ALPHA / 3DIGIT) [ "-" 1*3(ALPHA / DIGIT) ]

(Each region has a different specification of subdivisions; some have 2-letter codes, others have 2-digit codes, others have 1-letter codes, etc.)

On the other hand, we may want to future-proof this even further: What if a state gets codes for counties, for example? This definition would not allow multiple subtags.
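The revised production can be checked with an equivalent regular expression. This Python sketch (the name `parse_region` is hypothetical) splits a value into a country and an optional subdivision, and it reproduces the limitation just noted: a value with two subdivision subtags, such as `US-NY-001`, is rejected.

```python
import re
from typing import Optional, Tuple

# region = (2*3ALPHA / 3DIGIT) [ "-" 1*3(ALPHA / DIGIT) ]
_REGION = re.compile(r"(?:[A-Za-z]{2,3}|[0-9]{3})(?:-[A-Za-z0-9]{1,3})?")


def parse_region(value: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (country, subdivision or None), or None if the value is invalid."""
    value = value.strip()
    if not _REGION.fullmatch(value):
        return None
    country, _, subdivision = value.partition("-")
    return country.upper(), (subdivision.upper() or None)
```

The grammar also admits 3-letter and 3-digit country codes, so the requirement to use the 2-character code would still have to be stated in prose.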
(Reporter)

Comment 14

6 years ago
It might be smart to also include support for something similar to a ZIP code, or to provide a means of providing data in a key-value style for future expansion.

Some use cases:
- "Poor man's CDN" to provide users with URLs hosted on mirrors nearest to their location.
- Display product availability in user's region before user proceeds to checkout.
- Setting start location for maps applications
- Sports websites showing relevant teams to user's region by default
- Stores advertising summer/winter products depending on the user's hemisphere
- Automatically switch users to the least expensive payment gateway for the user's region
- Default setting for downloads on video file hosts to NTSC or PAL depending on user's location
- Automatic internationalization of units (metric/imperial)

There are many more, but these are just a few trivial examples.

Comment 15

(In reply to Matt Basta [:basta] from comment #14)
> It might be smart to also include support for something similar to a ZIP
> code,

The productivity gain versus privacy loss ratio gets progressively smaller as the contents get more specific.

> or to provide a means of providing data in a key-value style for
> future expansion.

I think they call that the Geolocation API. ;)

> Some use cases:
> - "Poor man's CDN" to provide users with URLs hosted on mirrors nearest to
> their location.

Increased download speed, good.

> - Display product availability in user's region before user proceeds to
> checkout.

That's the original one you mentioned, which I think is a form of censorship. (One that is very common and often required by law, but still a form of censorship.)

> - Setting start location for maps applications

Coarsely, yes. It wouldn't make much sense for me to go to Yandex Maps and have my default view be Russia, since I'm in the United States.

> - Sports websites showing relevant teams to user's region by default

That makes sense, I suppose. Though you'll always have the fans who don't live near their favorite teams.

> - Stores advertising summer/winter products depending on the user's
> hemisphere

I like this one.

> - Automatically switch users to the least expensive payment gateway for the
> user's region

I'm not quite sure what that would mean, but OK.

> - Default setting for downloads on video file hosts to NTSC or PAL
> depending on user's location

Is that even still important for digital videos nowadays?

> - Automatic internationalization of units (metric/imperial)

That seems to me like something that would be better suited to the Unicode language tag extension I mentioned earlier.

> There are many more, but these are just a few trivial examples.

My opinion matters very little with regard to whether these use cases are good/important enough, but I would think there would need to be a large number of non-trivial use cases in order to make this worthwhile to implement. (Keep in mind the existence of efforts like bug 572650, as well.)
(Reporter)

Comment 16

6 years ago
> That's the original one you mentioned, which I think is a form of censorship. (One that is very common and often required by law, but still a form of censorship.)

It's not a form of censorship at all. Stores that are barred from selling or delivering products to a user use credit card details to perform a check on the user. The header would be used solely to show a notification along the lines of, "Note: This product may not be deliverable to your address due to legal restrictions. See ... for more details."

If a product can't be delivered to a customer in, say, Hawaii or Guam because it can't be shipped that far, it's not censorship to tell the user that in advance. Using an HTTP header as the means of enforcing such restrictions would be silly, especially when a more reliable source of region is available at the end of the site's transaction with the user.
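The split between an advisory hint and actual enforcement can be sketched as two separate checks. In this Python illustration the product ID and restriction table are made up, region strings follow the `US-HI` subdivision style suggested earlier in the thread, and enforcement uses a region verified at checkout (for example from billing details) rather than the header.

```python
from typing import Optional

# Hypothetical restriction table: product ID -> regions it cannot ship to.
RESTRICTED = {"widget-123": {"GU", "US-HI"}}

NOTICE = ("Note: This product may not be deliverable to your address "
          "due to legal restrictions.")


def availability_notice(product_id: str, header_region: Optional[str]) -> Optional[str]:
    """Advisory only: driven by the self-reported (and spoofable) header."""
    if header_region and header_region.upper() in RESTRICTED.get(product_id, set()):
        return NOTICE
    return None


def can_ship(product_id: str, verified_region: str) -> bool:
    """Enforcement: driven by a region verified at checkout, never the header."""
    return verified_region.upper() not in RESTRICTED.get(product_id, set())
```

A missing or false header only changes whether the notice appears early; `can_ship` decides the outcome either way.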

> Is that even still important for digital videos nowadays?

It's surprising how often this happens. It probably won't be a common use case, but it shows the nature of the problem.

> That seems to me like something that would be better suited to the Unicode language tag extension I mentioned earlier.

Yes, perhaps.

> (Keep in mind the existence of efforts like bug 572650, as well.)

As I said in comment #3, it will be expensive to add a header, which is why I was hesitant to suggest one. In terms of minimizing entropy, I think that the value of such a header for any given user will be the same for most other users who have the same Accept-Language header, making the net effect on fingerprintability negligible.

Comment 17

(In reply to Matt Basta [:basta] from comment #16)
> > That's the original one you mentioned, which I think is a form of censorship. (One that is very common and often required by law, but still a form of censorship.)
> 
> It's not a form of censorship at all. Stores that are barred from selling or
> delivering products to a user use credit card details to perform a check on
> the user.

It is this barring that I am referring to as censorship, not the use of the header to comply with it. :)

> > (Keep in mind the existence of efforts like bug 572650, as well.)
> 
> As I said in comment #3, it will be expensive to add a header, which is why
> I was hesitant to suggest one. In terms of minimizing entropy, I think that
> the value of such a header for any given user will be the same for most
> other users who have the same Accept-Language header, making the net effect
> on fingerprintability negligible.

Well, that depends. For the people who do not change their Accept-Language header, the effect of adding User-Region could be minimal. But for everyone who DOES change their Accept-Language header, those who are in the middle of the fingerprintability spectrum will be skewed more towards uniqueness than if they didn't have User-Region. (And for those whose Accept-Language headers are already unique, the addition of User-Region would make them even more unique!)
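Both positions can be framed in information-theoretic terms: the average number of identifying bits User-Region adds on top of Accept-Language is the conditional entropy H(Region | Accept-Language). The Python sketch below computes it for toy populations that are made up for illustration, not measured. If the region is fully determined by Accept-Language, it adds zero bits; if users with the same Accept-Language report different regions, it adds some, and users with rarer combinations become the most identifiable.

```python
from collections import Counter
from math import log2

# Toy populations of (accept_language, user_region) pairs, made up for illustration.
TYPICAL = [("en-US", "US")] * 8 + [("pt-BR", "BR")] * 2
MIXED = [("en-US", "US")] * 6 + [("en-US", "BR")] * 2 + [("pt-BR", "BR")] * 2


def entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())


def added_bits(population) -> float:
    """H(AL, Region) - H(AL), i.e. the conditional entropy H(Region | AL)."""
    return entropy(population) - entropy(al for al, _ in population)


def surprisal(population, pair) -> float:
    """Identifying bits for one (accept_language, user_region) pair (must occur)."""
    return -log2(Counter(population)[pair] / len(population))
```

In `TYPICAL` the header adds nothing, while in `MIXED` an `en-US` user reporting `BR` goes from about 0.32 bits (Accept-Language alone) to about 2.32 bits.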

But I think I've made my point by now. I'll await feedback from other people as to whether this might be worth implementing.