Closed Bug 492679 Opened 15 years ago Closed 15 years ago

Rename Serbian locale "sr" to "sr-LATN"

Categories

(support.mozilla.org :: Localization, task)

task
Not set
minor

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: paulc, Assigned: paulc)

Details

As per a talk with Milos, the Cyrillic alphabet accounts for only 10% of the users, and most user sites in Serbia today use Latin. This is a bug for the Serbian localization community to use the Latin alphabet for articles in Serbian language.
Also, there is a tool (called Vucko) that automatically converts between the two alphabets which we could consider. That way we could offer articles in both alphabets without having to translate manually.

I'm not sure how much support for both alphabets is needed, Milos could say better.
Also part of this will be the locale name change from sr(Српски) to sr(Srpski)
Chris, David, know anything else about this? Is this okay?
Let's hear what the l10n team says.
We shouldn't start supporting sr-Latn at SUMO.

Doing that for the product will likely involve writing a conversion script in python, I don't see us putting arbitrary exes on our build pool.

Before we start making decisions on what to do on SUMO, I'd prefer to get independent numbers on the actual use of both scripts for Serbian.
(In reply to comment #5)
> We shouldn't start supporting sr-Latn at SUMO.
> 
> Doing that for the product will likely involve writing a conversion script in
> python, I don't see us putting arbitrary exes on our build pool.
> 
> Before we start making decisions on what to do on SUMO, I'd prefer to get
> independent numbers on the actual use of both scripts for Serbian.

Take a look at the biggest Serbian web portals and forums. Then you'll see.
Axel, what do you propose we do with this bug? WONTFIX? Or block it for some other bug or action?
(In reply to comment #7)
> Axel, what do you propose we do with this bug? WONTFIX? Or block it for some
> other bug or action?

Pascal has been chatting with both the web and the product localizers and there is a decision looming on which alphabet to use.  Pascal, any comments here from what you have learned in the recent past?
We have the product in Cyrillic and no Sumo effort in Serbian, be it in Cyrillic or Latin script. It is my understanding that people in Serbia can read and write their language in both Latin and Cyrillic scripts, so I think having Sumo in Latin script would also be helpful for our users of Firefox in Cyrillic Serbian.

I see a potential value in the long term in having both Cyrillic and Latin script versions of the product, but it is IMO a different issue from support. I think that what Paul and Milos are asking is to be authorized to translate documentation in the script they use on a daily basis, the Latin one, it doesn't mean that if tomorrow another volunteer comes and prefers to translate some articles on Sumo in Cyrillic, he would not be allowed to do it.

If I look at other languages, you can make a parallel with Norwegian on Sumo, there are articles written in Bokmal and others in Nynorsk but they both share the same support.mozilla.com/no/ address. So why not have the two varieties of Serbian share the same support.mozilla.com/sr/ host?

Maybe in a couple of years from now, we will have two varieties (Cyrillic/Latin) of Firefox and we can then think about having a separate support site for each, or maybe not, just as nn-NO/nb-NO share a support site and es-ES/AR/MX/CL share one as well.

CCing Filip as he should be involved in this discussion as the product localizer.
(In reply to comment #9)
I'd hesitate to support both UNLESS we have them in parallel (as in have duplicate articles -- linked with an automated tool if necessary).  Otherwise searching is impossible because if you search for text in Cyrillic and it turns out the article you want is in latin, you won't get the right results and vice versa.
(In reply to comment #9)
> think that what Paul and Milos are asking is to be authorized to translate
> documentation in the script they use on a daily basis, the Latin one,[...] 
> If I look at other languages, you can make a parallel with Norwegian on Sumo,
> there are articles written in Bokmal and others in Nynorsk but they both share
> [...] CCing Filip as he should be involved in this discussion as the product
> localizer.

Thanks for cc-ing me, reading this thread I have a feeling that I should have been involved much earlier on this matter.

Your parallel with Bokmal and Nynorsk does not carry over to the relationship
between the Cyrillic and the Latin script in the Serbian language.  

The reason is that the cost of having a Bokmal/Nynorsk mixed SUMO is higher than the cost of having both Cyrillic and Latin scripts fully supported.  

For Bokmal/Nynorsk, you need someone to translate all the material in both languages, so it's twice the cost of a single language.

For Serbian, if you have a Cyrillic translation, the Latin one can be produced at zero amortized cost, and fully automatically.  That is, you can have two translations for the price of one, provided that it is done the right way, from Cyrillic towards Latin.  It does not work the other way around, as even with advanced conversion tools like Vucko, the process is at most one-quarter automated and requires extensive manual intervention.  
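
To illustrate why the Cyrillic-to-Latin direction can be automated, here is a minimal Python sketch. It is not Vucko and not any existing Mozilla script; the names CYR_TO_LAT and cyrillic_to_latin are made up for this example. It relies only on the fact that every Serbian Cyrillic letter maps to exactly one Latin letter or digraph.

```python
# Serbian Cyrillic -> Latin, lower-case letters only. Each Cyrillic letter has
# exactly one Latin equivalent, so no context or dictionary is needed.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text to Latin, leaving other characters alone."""
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        lat = CYR_TO_LAT.get(lower)
        if lat is None:
            # Not a Serbian Cyrillic letter (digits, punctuation, Latin text, URLs).
            out.append(ch)
        elif ch == lower:
            out.append(lat)
        else:
            # Upper-case input: "Љ" becomes "Lj" in a capitalized word,
            # but "LJ" when a neighbouring character is upper case too.
            prev_upper = i > 0 and text[i - 1].isupper()
            next_upper = i + 1 < len(text) and text[i + 1].isupper()
            out.append(lat.upper() if (prev_upper or next_upper) else lat.capitalize())
    return "".join(out)
```

The reverse direction loses information in at least one obvious way: both "њ" and the two-letter sequence "нј" (as in "инјекција") come out as "nj", so a Latin-to-Cyrillic tool has to guess which one was meant, and it also has to recognize embedded Latin text that must not be converted at all.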

From my experience, doing Serbian localization in Latin is the best way to not ever have proper Serbian l10n, and to set projects back.  The Serbian l10n community has learned this the hard way in the last 8 to 9 years, and if you go out to any of the prominent Serbian l10n teams (Fedora, GNOME, KDE, OpenOffice etc) you will find people having reached the same conclusion: localize in Cyrillic and autoconvert to Latin where needed, to obtain two translations of identical quality.  Doing things the other way around is a ticket to days of painstaking maintenance, after which localizers typically decide to ditch Cyrillic l10n altogether to retain sanity.

It doesn't matter whether in the end you use one, or the other, or both --- if you localize in Cyrillic and then convert to Latin, your gain for the same amount of effort is twice as large.

To summarize, from where I'm standing the OP's suggestion of using Latin for Serbian means (disregarding non-technical reasons, we may consider them separately):

1) Lowering the overall UX due to the awkward resulting mix of Latin and Cyrillic articles / instructions:  note that Firefox is currently translated in Cyrillic, and you will necessarily have to show cyrillic screenshots, command sequence transcripts and such --- in Cyrillic.  Does it make sense to have something like this?
2) A waste of engineering/translation/support/testing resources, as with Latin-to-Cyrillic approach the amount of work is about 2 times (in practice about 1.75 times) as large compared to going the other way around.  Are we really that resource-rich?

Yes, it may be a slight inconvenience to the website translators to get used to the Cyrillic keyboard if they don't use it on a daily basis. 

But, in practice, the setup required to get the keyboard up and running on a Linux or a Windows machine amounts to about 15 minutes of work, and getting used to it amounts to roughly a workday of actually using it.  To me this seems a small price to pay for the immense gain of being able to have full support for both scripts for the price of one --- even if it means maintaining one of these two outside of Mozilla.  

Today I talked to Stas and Seth about the possibility of setting up a Latin language pack for the Mozilla tools.  Once the automatic conversion is set up, these would be autogenerated from the main Cyrillic source.  Not to say that anyone has to do this, but with today's state-of-the-art, this is possible to do at an extremely low cost.

For these reasons, I deem using Latin script for any l10n related work in Mozilla (and otherwise) inefficient and wasteful, and can not support nor endorse it.

For further reference, and some non-technical points, please see also my comment at: https://bugzilla.mozilla.org/show_bug.cgi?id=484553#c2 . Looking at this thread, I have this strange feeling of Deja Vu.  Why are we having this discussion again?
Great info. Maybe you should have a standard reply from now on. I filed this bug at the request of a user. It's good to know all of this. 

In conclusion: Sounds like a WONTFIX to me. David?
(In reply to comment #11)

It sounds like we're going to ask localizers to translate to Cyrillic and then use some kind of tool to make the Latin version but that we should have both on SUMO.  It's not QUITE a wontfix it's more of a redirect.  If our users are expecting Latin searches to work, we should have it.  While I don't mind the localizers putting in the day of work to get used to a Cyrillic keyboard, that's NOT something we can expect of our users.

As I see it we had the following options:

1) Cyrillic ONLY
2) Latin ONLY
3) Cyrillic translated and Latin autogenerated
4) Latin translated and Cyrillic autogenerated
5) Mix of Latin/Cyrillic
6) Maintaining both Cyrillic and Latin locales manually making sure they're both up to date and have the same content.

1 is the current state which isn't acceptable because 90% of internet users use Latin (per Milos). 
2 would make future back-translation difficult and is a disconnect with the Firefox UI. 
3 is the suggested option. 
4 is more work than 3 (per Filip). 
5 makes for impossible searching unless we update our search engine to handle this special case. 
6 requires no code changes on our end but is much more work than 3 or 4.
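
For option 5, one conceivable way to handle the "special case" in search is to fold both the query and the article text to a single script before comparing. The Python sketch below is only an illustration under that assumption; it is not how the SUMO search engine works, and fold_script and search are hypothetical names.

```python
# The 30 Serbian Cyrillic letters and their Latin equivalents, in the same order.
_CYR = "абвгдђежзијклљмнњопрстћуфхцчџш"
_LAT = ["a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i", "j", "k", "l", "lj",
        "m", "n", "nj", "o", "p", "r", "s", "t", "ć", "u", "f", "h", "c", "č",
        "dž", "š"]

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(dict(zip(_CYR, _LAT)))


def fold_script(text: str) -> str:
    """Lower-case the text and fold Cyrillic letters to Latin for comparison."""
    return text.lower().translate(_TABLE)


def search(query: str, articles: dict[str, str]) -> list[str]:
    """Return the ids of articles whose folded text contains the folded query."""
    key = fold_script(query)
    return [art_id for art_id, body in articles.items() if key in fold_script(body)]


# Hypothetical mixed-script knowledge base.
ARTICLES = {
    "cookies": "Како да обришем колачиће",
    "addons": "Kako da instaliram dodatke",
}
```

With this folding, a Latin query such as "kolačiće" finds the Cyrillic article and a Cyrillic query such as "додатке" finds the Latin one. The user still sees the article in whatever script it was written in, so this addresses only the search problem, not the mixed presentation.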
Filip, thanks for your comment and your thorough technical and practical points.

Pascal, Seth, Axel: any comments or objections on Filip's recommendation to ask SUMO localizers to use Cyrillic and then (lower priority) figure out a way of automating the Latin conversion?
(In reply to comment #14)
> Pascal, Seth, Axel: any comments or objections on Filip's recommendation to ask
> SUMO localizers to use Cyrillic and then (lower priority) figure out a way of
> automating the Latin conversion?

No comments from me, I trust Filip's intuition and thorough explanation.  Pascal may have some other evidence from his contacts and I'll let him provide a response if necessary.
What about the current article translations? They're all written in Latin script. For now, it's not necessary to create a character conversion script, but it would be nice to have both locales available. Then I could make both translations of an article pretty easily, and it wouldn't take much time.
I believe Milos knows how to use an automatic conversion tool. Maybe a temporary solution would be to just have an article on using this tool and getting <the locale leaders for languages in this situation> familiar with it. Then we can later patch the SUMO site to handle these automatically?
The list of already translated KB articles (ten articles, plus the three navigation pages; see https://support.mozilla.com/sr/kb/Localization+Dashboard) changes the situation for me.

I agree with Filip's recommendation of now renaming the language from sr(Српски) to sr(Srpski) and I definitely see the potential benefit of getting both versions if SUMO localizers could agree to translate articles in Cyrillic instead. However, as I just mentioned, we already have 13 articles translated into Latin, so unless all of those articles are updated, we will have to accept having a mix of both versions anyway for the time being.

I think that's fine, but I would definitely recommend (but not enforce, at this time) to use the Cyrillic alphabet for future articles since that means we can get both Cyrillic and Latin versions of the articles once we have the script up and running. 

In other words, if SUMO localizers all agree on using only Cyrillic, we will eventually end up in a state where we'll have both Cyrillic and Latin. If SUMO localizers don't agree, we'll end up with one mixed Serbian version. 

Milos, does this sound fair to you?
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #18)
> I agree with Filip's recommendation of now renaming the language from
> sr(Српски) to sr(Srpski)

I of course meant "I agree with Filip's recommendation of not renaming the language from sr(Српски) to sr(Srpski)"

Looks like I got the resolution wrong too. Sorry!
Resolution: FIXED → WONTFIX

(In reply to comment #17)
> I believe Milos knows how to use an automatic conversion tool. Maybe a
> temporary solution would be to just have an article on using this tool and
> getting <the locale leaders for languages in this situation> familiar with it.
> Then we can later patch the SUMO site to handle these automatically?

Yes, I'm familiar with Vucko, the automatic conversion tool. Currently, I'm the only one who works on web site translation. Due to a lack of time, Filip is only able to maintain the Firefox (and Thunderbird, I guess) translation. So, basically, I'm the locale leader for both SUMO and the Firefox web site.

(In reply to comment #18)
> The list of already translated KB articles (ten articles, plus the three
> navigation pages; see https://support.mozilla.com/sr/kb/Localization+Dashboard)
> changes the situation for me.
> 
> I agree with Filip's recommendation of now renaming the language from
> sr(Српски) to sr(Srpski) and I definitely see the potential benefit of getting
> both versions if SUMO localizers could agree to translate articles in Cyrillic
> instead. However, as I just mentioned, we already have 13 articles translated
> into Latin, so unless all of those articles are updated, we will have to accept
> having a mix of both versions anyway for the time being.
> 
> I think that's fine, but I would definitely recommend (but not enforce, at this
> time) to use the Cyrillic alphabet for future articles since that means we can
> get both Cyrillic and Latin versions of the articles once we have the script up
> and running. 
> 
> In other words, if SUMO localizers all agree on using only Cyrillic, we will
> eventually end up in a state where we'll have both Cyrillic and Latin. If SUMO
> localizers don't agree, we'll end up with one mixed Serbian version. 
> 
> Milos, does this sound fair to you?

I'm not sure I understood you correctly. I can support Filip's recommendation if we have both locales available. Then, when I create a Cyrillic translation of an article, I need no more than a minute to create the Latin one. So, can you make another locale for us, please?
Milos, sounds good that creating an additional Latin copy from a Cyrillic version only takes a couple of minutes! We could set the latin locale up on SUMO and basically make the current locale an enforced Cyrillic version, and the new locale an enforced Latin version. We could then include a link to switch between the two on the Serbian SUMO start site. 

Only one problem: What do we call the locale URL/language code?

We currently have http://support.mozilla.com/sr/ -- what do we use instead of "sr" for the Latin version? Since they are not actually separate language versions (only separate alphabets), we'll have to make something up for this to work, e.g. "sr-latin" or similar. Any thoughts about that?
I suggest using sr-lat and sr-cyr, because it's standard practice on other portals.
OK, after chatting with Milos on IRC, here is the plan:

1. Rename the sr locale to sr-LAT (since all articles are in Latin already)
2. Add a new locale sr-CYR
3. Create redirect from sr to sr-CYR

Milos will then create Cyrillic versions (sr-CYR) of articles first and then use a script to convert the text to Latin for sr-LAT.

I will set up the relevant bugs and see if we can get them into 1.1/1.1.1 which I believe has a code freeze today.
Let's make this bug step 1 in the list above.

To do:

* change language string from sr to sr-LAT
* change local name of language from Српски to Srpski

I will file bugs for steps 2 and 3 shortly. Paul, let's try to get this into 1.1/1.1.1!
Assignee: nobody → paul.craciunoiu
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Use Latin alphabet for articles in Serbian language → Rename Serbian locale "sr" to "sr-LAT"
Target Milestone: --- → 1.1
(In reply to comment #23)
> OK, after chatting with Milos on IRC, here is the plan:
> 
> 1. Rename the sr locale to sr-LAT (since all articles are in Latin already)

This bug

> 2. Add a new locale sr-CYR

Bug 495213

> 3. Create redirect from sr to sr-CYR

Bug 495215
I'll still need to really read through the latest comments in the bug, but neither LAT nor CYR are valid fragments of locale codes. Latn and Cyrl are, http://tools.ietf.org/html/rfc4646#section-2.2.3 and http://unicode.org/iso15924/iso15924-codes.html
It looks like Axel is right here. I compared google search results and "cyrl/latn" is definitely more common than "cyr/lat". Also, this table recommends the same thing: http://tlt.its.psu.edu/suggestions/international/bylanguage/serbocroatian.html#encode

Thanks Axel!
Summary: Rename Serbian locale "sr" to "sr-LAT" → Rename Serbian locale "sr" to "sr-Latn"
Fine with me, it was just a suggestion.
There was an existing sr-latn already. I have renamed sr-latn to sr-Latn (capitalization). This will need to be enabled.

The articles, however, were in sr, and must be assigned manually to sr-Latn. This shouldn't be a problem, at any rate, since there are so few of them.

r26474 / r26476 + r26479 (some renaming didn't happen with the patch)
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: Rename Serbian locale "sr" to "sr-Latn" → Rename Serbian locale "sr-latn" to "sr-Latn"
Tiki issues cause the locale file to not be detected when naming is not uppercase (code located in lib/language.php). I don't see this as a big problem and I think it's better to be consistent with our own site, hence the second half of the locale needs to be capitalized (sr-CYRL and sr-LATN). Or else we need to change some code from the tiki library.

I think having the functionality there is good enough, rather than waiting for the next milestone (we can just do a rename then if a better solution comes up).

r26524 / r26527
The rename is actually going to be to sr-LATN (upper case part after "-"), because of tiki's handling of filenames for languages.
See bug 495215 comment 3 for more details.
Paul, can you file an upstream ticket on tiki to not break on script tags? The standard says that the casing is a "should", with language tag being lower case, region tag being upper case, and script tag being Initialcaps.
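
For reference, the recommended casing can be applied mechanically. The Python sketch below follows the convention described above (language lower case, script with an initial capital, region upper case); canonical_case is a made-up helper, not part of tiki or SUMO, and it only adjusts casing without checking that the subtags are registered.

```python
def canonical_case(tag: str) -> str:
    """Apply the recommended casing to a language tag such as "sr-LATN"."""
    out = []
    after_singleton = False
    for i, part in enumerate(tag.split("-")):
        if i == 0 or after_singleton:
            # Primary language subtag, or anything following a singleton
            # such as "x" or "u": lower case.
            out.append(part.lower())
        elif len(part) == 4 and part.isalpha():
            # Four-letter script subtag: initial capital ("Latn", "Cyrl").
            out.append(part.title())
        elif len(part) == 2 and part.isalpha():
            # Two-letter region subtag: upper case ("RS", "BR").
            out.append(part.upper())
        else:
            out.append(part.lower())
        if len(part) == 1:
            after_singleton = True
    return "-".join(out)
```

Since the casing is only a recommendation, software that looks up locale files would ideally compare tags case-insensitively and use a helper like this only for display and for generated URLs.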
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Rename Serbian locale "sr-latn" to "sr-Latn" → Rename Serbian locale "sr" to "sr-Latn"
Since 1.1 is frozen and will be pushed out on Thursday, there is no time to correct the capitalization of this script code for SUMO 1.1.

Let's definitely make sure we file a separate bug to correct this in 1.2/1.3 though. Thanks for pointing this out, Axel. I agree that we should comply with the standards.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: Rename Serbian locale "sr" to "sr-Latn" → Rename Serbian locale "sr" to "sr-LATN"
sr-LATN is now enabled on support.mozilla.com.
Filed bug http://dev.tikiwiki.org/tiki-view_tracker_item.php?itemId=2568 on tiki. Sylvie, can you take a look please?
Resolution: FIXED → WONTFIX
Status: RESOLVED → VERIFIED
The SUMO team, along with Filip and myself, decided to redirect sr to sr-CYRL, not to sr-LATN as mentioned in this bug. We created a new locale (sr-LATN) but redirected the old one (sr) to sr-CYRL. So I'm marking this bug as WONTFIX because we found another solution.
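
The resulting URL behaviour can be summarized with a small Python sketch. This is only an illustration of the decision above, not the actual SUMO or tiki redirect code; LOCALE_REDIRECTS and redirect_path are hypothetical names.

```python
# Legacy locale prefix -> locale it now redirects to (per the decision above).
LOCALE_REDIRECTS = {"sr": "sr-CYRL"}


def redirect_path(path: str) -> str:
    """Replace a legacy locale prefix in a URL path; return other paths unchanged."""
    parts = path.split("/")
    # For an absolute path like "/sr/kb/...", parts[0] is "" and parts[1] is the locale.
    if len(parts) > 1 and parts[1] in LOCALE_REDIRECTS:
        parts[1] = LOCALE_REDIRECTS[parts[1]]
    return "/".join(parts)
```

Old links under /sr/ end up on the Cyrillic locale, while /sr-LATN/ is served as a locale of its own.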
Where is the strtoupper? I can not find it in Tiki code. We have sr-latn, pt-br and en-uk already....