Closed Bug 841518 Opened 11 years ago Closed 3 years ago

Require Latin characters for Full Name profile field

Categories

(Participation Infrastructure :: Phonebook, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: williamr, Unassigned)

Details

(Whiteboard: [Triage 2015-04-17] [iam-RFE])

Most users search for names using Latin characters, and this makes it difficult to find users who have non-Latin characters in their name. To make these users easier to find, let's require Latin characters for the Full Name profile field. We will also create a Local Name field (bug 841224) for users to display their name using non-Latin characters too.

As suggested by Giorgos in bug 840332 comment 2:
> So the full_name field should contain one or more of the following
> characters A-Z or a-z or ' or space (or in more precise regex "^[A-Za-z'
> ]+$").
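For illustration, the quoted rule could be checked roughly as follows. This is a minimal sketch, not the Phonebook code: the function name `is_basic_latin_name` is hypothetical, and only the pattern itself comes from the comment quoted above.

```python
import re

# Pattern taken from the quoted comment: A-Z, a-z, apostrophe, or space.
LATIN_NAME_RE = re.compile(r"^[A-Za-z' ]+$")


def is_basic_latin_name(name: str) -> bool:
    """Return True if `name` uses only the characters allowed by the quoted regex.

    fullmatch() is used so that a trailing newline cannot slip past `$`.
    """
    return LATIN_NAME_RE.fullmatch(name) is not None
```

Note that, as written, the pattern also rejects hyphens and accented Latin letters, so names such as "Jean-Luc" or "Jérôme" would fail validation.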

We will need a migration plan as well. Justin or Giorgos, can you find out how many users currently have non-Latin characters in their Full Name profile field? That will help us understand how many people will be affected by this change.

As a suggested migration plan, for any user with non-Latin characters in their Full Name field, we can copy that value to their Local Name field.

We will need to update the Full Name field value for users who have non-Latin characters. Depending on the number of users affected, we might be able to do this manually. If there are lots of users impacted, we could remove the non-Latin characters and prompt users to update their Full Name field by email or the next time they visit the site. 
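The migration steps described above could be sketched like this. Everything here is an assumption made for illustration: the `Profile` class, its field names, and the `needs_name_update` flag are hypothetical stand-ins, not the actual Phonebook model.

```python
import re
from dataclasses import dataclass

# Same character set as the regex proposed earlier in this bug.
LATIN_NAME_RE = re.compile(r"[A-Za-z' ]+")


@dataclass
class Profile:
    # Hypothetical stand-in for the real profile model.
    full_name: str
    local_name: str = ""
    needs_name_update: bool = False


def is_affected(profile: Profile) -> bool:
    """A profile is affected if its Full Name contains non-Latin characters."""
    return LATIN_NAME_RE.fullmatch(profile.full_name) is None


def count_affected(profiles) -> int:
    """Count how many profiles the change would touch."""
    return sum(1 for p in profiles if is_affected(p))


def migrate(profile: Profile) -> Profile:
    """Copy a non-Latin Full Name to Local Name and flag the profile for follow-up."""
    if is_affected(profile):
        profile.local_name = profile.full_name
        profile.needs_name_update = True  # prompt by email or on next visit
    return profile
```

The sketch only flags affected profiles instead of stripping characters from Full Name, so no data is lost before the user has had a chance to respond.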

Users with names using only Latin characters will not be affected by this bug. Their Local Name field will be empty by default.

Would love some feedback on the migration plan for this.
I think this is a bad idea, we should not have people enter stuff that isn't their name into the "Full Name" field. The world isn't Latin-only.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #1)
> I think this is a bad idea, we should not have people enter stuff that isn't
> their name into the "Full Name" field. The world isn't Latin-only.

Kairo, thanks for the feedback. Does the 'Local Name' field mentioned in comment 0 address your concern? If not, do you have any other suggestions? The problem we're trying to solve is being able to make all names easily searchable, and profiles with non-Latin names are currently difficult to find.
I think having people enter their Full Name in a wrong way is a bad idea (yes, expressing a name like "Jörg Düsterstraßer" in all-ASCII is a wrong way, and it's probably similar for languages with different scripts).
Adding a "Latin Name" or "Name in Latin spelling" field might be OK if there's some description of what it's needed for (and maybe dynamically hide it as long as Full Name is all-Latin anyhow).
All in all, the best idea would be to have a search mechanism that actually knows about how to transcribe names to Latin and search in them - possibly transcribing both search expressions and fields searched in the background. Of course, this can be tricky as the way to transcribe may be different per-language... :(
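As a rough illustration of transcribing both the search expression and the searched field in the background, here is a minimal sketch that uses only Unicode normalization from the Python standard library. The function names `fold` and `name_matches` are hypothetical, and this is accent folding, not true per-language transliteration.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and case-fold, e.g. 'Jörg' -> 'jorg'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() also maps the German sharp s to 'ss'.
    return without_marks.casefold()


def name_matches(query: str, full_name: str) -> bool:
    """Compare the folded query against the folded name (substring match)."""
    return fold(query) in fold(full_name)
```

With this approach a search for "Jorg" finds "Jörg", but "Joerg" does not, which is exactly the language-specific gap mentioned above. Non-Latin scripts are left untouched by this folding.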
To address Kairo's concern, we can keep Full Name as is and create a new 'Name in Latin' field, which will be required for all users and will contain only Latin characters.

There are languages where people actually use different names in the Western world (e.g. Chinese), so we cannot rely on a smart search engine alone, although I agree that it could help significantly with the search problem.

Question: My Greek passport has a Latin-only representation of my name. I believe there is a standard for how to transliterate people's names, and I would argue that it's not "wrong"; it's just a different, global representation. How about passports from other countries with non-Latin scripts?
I think the two concepts at play here are "Display Name" vs. "Search-Optimized Name". And yeah, "Display Name" should definitely be whatever people want it to be, in any characters. So the question is, "How do we make search-optimized names?" 

Ideas:

1) Give people the option to create n nicknames using any characters, explaining that nicknames enhance search. Invite them to show up in more search results, but don't require it. (Assume the person with the name knows better than anyone else what someone might search for if they want to find that person.)
2) Require people to create a search-optimized name using Latin characters. (Assume that people will search with Latin characters, and we can only optimize search by forcing Latin name strings into profiles somehow.)

I agree with Giorgos that automating it is probably not feasible.
Giorgos: From what I heard so far, I think there's probably some kind of transcription to Latin for most scripts, though it's probably language-specific what the rules are, and in some cases, there might not be any official rules, but rather informal "let's try to replicate the intonation using the English interpretation of Latin characters/syllables" transcriptions.

Greek has pretty clear and strict rules, from what I can tell (you probably know better than me), which probably have long-going historical development stemming back to Greek-Roman relationships of old days. ;-)
Other scripts and languages that didn't have so much contact with the Latin-dominated parts of the world might have looser rules, if any.

BTW, are you planning to restrict those fields to Basic Latin ("ASCII") only or Latin-1 or even including Extended Latin? (Note that e.g. people in the US or UK might already have problems typing things like German umlauts or accented characters when trying to search for a name - OTOH a German named "Jörg Düsterstraßer" like my example from comment #3 will probably type his Basic-Latin representation as "Joerg Duesterstrasser" while a number of people from elsewhere in their ignorance of our rules might search for "Jorg DusterstraBer" instead...)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> 
> Greek has pretty clear and strict rules, from what I can tell (you probably
> know better than me), which probably have long-going historical development
> stemming back to Greek-Roman relationships of old days. ;-)
> Other scripts and languages that didn't have so much contact with the
> Latin-dominated parts of the world might have looser rules, if any.

Just for reference, here is a list of romanizations:
https://en.wikipedia.org/wiki/List_of_ISO_transliterations


> 
> BTW, are you planning to restrict those fields to Basic Latin ("ASCII") only
> or Latin-1 or even including Extended Latin? (Note that e.g. people in the
> US or UK might already have problems typing things like German umlauts or
> accented characters when trying to search for a name - OTOH a German named
> "Jörg Düsterstraßer" like my example from comment #3 will probably type his
> Basic-Latin representation as "Joerg Duesterstrasser" while a number of
> people from elsewhere in their ignorance of our rules might search for "Jorg
> DusterstraBer" instead...)

My original idea is to restrict the field as follows:

> So the full_name field should contain one or more of the following
> characters A-Z or a-z or ' or space (or in more precise regex "^[A-Za-z'
> ]+$").

This would solve the problem you're describing with keyboard layouts.
(In reply to Justin Crawford [:hoosteeno] from bug 924190 comment #3)
> The idea behind the field I'm
> talking about is more like, "People might also search for me as....".

This is actually a great name for the field. ;) Very self explanatory.

> This content might be nicknames. It could be "Mikey" or "M-Dog" for
> "Michael".
> 
> But it could also be more like search keywords. For someone named "Jérôme"
> it could be "Jerome" and "Gerome" and "Jerry". People whose names are often
> shortened or who have accented characters can probably anticipate at least a
> few variations, and this lets them do their own SEO. :)

I guess if you made this a space-separated list of keywords, that might work. I guess the search terms are already being tokenized when searching, so you'd just need to check each token against this field as well.
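A minimal sketch of the token check suggested above, assuming the field is stored as a single space-separated string. The function name `matches_aliases` and its parameters are hypothetical, not part of the existing search code.

```python
def matches_aliases(query: str, aliases_field: str) -> bool:
    """Return True if any token of the query equals one of the stored keywords.

    `aliases_field` is the space-separated "People might also search for me
    as..." value; comparison is case-insensitive.
    """
    aliases = {alias.casefold() for alias in aliases_field.split()}
    return any(token.casefold() in aliases for token in query.split())
```

One consequence of the space-separated format is that a multi-word alias cannot be stored as a single keyword; each word is matched on its own.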


(In reply to Giorgos Logiotatidis [:giorgos] from comment #7)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> My original idea is to restrict the field as follows:
> 
> > So the full_name field should contain one or more of the following
> > characters A-Z or a-z or ' or space (or in more precise regex "^[A-Za-z'
> > ]+$").
> 
> This would solve the problem you're describing with keyboard layouts.

This wouldn't solve the "Jurgen" vs "Juergen" problem. And it's harsh and unfriendly as well.
AFAICT, the Mozillian ID, IRC nickname, and email address (and parts thereof), which are already ASCII text, will already be found by a search. This is already a step in the right direction.

OTOH, I imagine that searching on non-ASCII text would not necessarily be bad. If someone is named "François Lebœuf", which is nothing out of the ordinary for someone from a French-speaking country, anyone knowing French would spell his name in a search either the way I just did (with ç which is part of Latin-1 but not of ASCII, and œ which is not even in Latin-1) or possibly replacing œ by oe if he doesn't know how to get the digraph on his keyboard.

So we come to another possibility: what about alternative spellings?

François Lebœuf →
  "François Lebœuf" or "François Leboeuf" or "Francois Leboeuf"

Jörg Düsterstraßer →
  "Jörg Düsterstraßer" or "Joerg Duesterstrasser" or "Jorg Dusterstrasser" or "Jorg Dusterstraβer" (with Greek beta) or "Jorg DusterstraBer"

Élő Árpád (with family name first as is the Hungarian usage) →
  "Élő Árpád" or "Árpád Élő" or "Arpad Élö" or "Arpad Elo"

etc. I think that in addition to the Mozillian ID, IRC nickname and email address (which, as said, are already in ASCII) there would be at most a handful of variants for the Full Name. The problem of course would be how to make them up. Maybe a semi-automatic process (with the user putting up a suggestion or two, and a "fuzzy search" engine filling in the rest) would be adequate?
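The automatic half of such a semi-automatic process might look something like the sketch below. The mapping tables are deliberately tiny and hypothetical (real rules are language-specific), and the function names are made up for this example. User-supplied suggestions would be added on top of the generated set.

```python
import unicodedata

# Hypothetical, incomplete tables; real transcription rules vary per language.
LIGATURES = {"\u0153": "oe", "\u0152": "Oe", "\u00df": "ss"}
EXPANSIONS = {
    **LIGATURES,
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
}


def strip_marks(text: str) -> str:
    """Remove combining marks, e.g. 'ç' -> 'c'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def replace_all(text: str, table: dict) -> str:
    """Apply every replacement in `table` to `text`."""
    for source, target in table.items():
        text = text.replace(source, target)
    return text


def spelling_variants(name: str) -> set:
    """Generate a handful of plausible search spellings for a Full Name."""
    name = unicodedata.normalize("NFC", name)
    expanded = replace_all(name, EXPANSIONS)       # ö -> oe, œ -> oe, ß -> ss
    ligatures_only = replace_all(name, LIGATURES)  # œ -> oe, ß -> ss
    return {name, expanded, strip_marks(expanded), strip_marks(ligatures_only)}
```

For "François Lebœuf" this yields the three spellings listed above. It does not cover look-alike substitutions such as "B" for "ß" or reordering of given and family names.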
Whiteboard: [Triage 2015-04-17]

This is an RFE specific to Mozillians, which has been decommissioned.
If this is found to pertain to people.mozilla.org then please open a new issue.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
Whiteboard: [Triage 2015-04-17] → [Triage 2015-04-17] [iam-RFE]