Detect numbers in non-English languages

RESOLVED WONTFIX

Status

Product: Webtools Graveyard
Component: Verbatim
Status: RESOLVED WONTFIX
Reported: 5 years ago
Modified: 7 months ago

People

(Reporter: Amir Farsi, Unassigned)

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36

Steps to reproduce:

Verbatim cannot detect Persian numerals in text. I wrote a number in Persian style, for example 400 as ۴۰۰, but Verbatim did not recognize it and showed a Numbers error.


Actual results:

The system shows a Numbers error when I change the number format to Persian (Farsi).


Expected results:

Verbatim should detect Persian numerals in translated text and not show a Numbers error.

Comment 1

5 years ago
Hi Amir, thanks for the report. We're working on a similar problem for Bengali that we'll first roll out to mozilla.locamotion.org, and that should then roll out to Verbatim.

The one issue we need to work through is knowing when numbers should and shouldn't be translated. The check at the moment is simple: it looks only at whether the numbers are the same in both source and target.

So we're looking at this and hopefully you will be able to review how well this works once it is deployed.
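
The check described in this comment can be sketched roughly as follows. This is only an illustration of the idea, not the actual Pootle or Translate Toolkit code; the function names `ascii_numbers` and `numbers_match_naive` are hypothetical, and the sketch assumes the check only recognizes ASCII digits.

```python
import re

# Assumption: the naive check only knows about ASCII digits 0-9.
# Note that r"\d" would also match Persian digits in Python 3,
# so the character class is spelled out explicitly here.
ASCII_NUMBER_RE = re.compile(r"[0-9]+")


def ascii_numbers(text):
    """Return the runs of ASCII digits found in text, in sorted order."""
    return sorted(ASCII_NUMBER_RE.findall(text))


def numbers_match_naive(source, target):
    """Hypothetical check: pass only if both strings contain the same ASCII numbers."""
    return ascii_numbers(source) == ascii_numbers(target)


# A translation that keeps "400" passes, while one that writes the
# same value with Persian digits is flagged as a Numbers error.
same_digits = numbers_match_naive("Wait 400 seconds", "400 ثانیه صبر کنید")
persian_digits = numbers_match_naive("Wait 400 seconds", "۴۰۰ ثانیه صبر کنید")
```

Under these assumptions `same_digits` is true and `persian_digits` is false, which matches the false positive reported in this bug.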

Comment 2

5 years ago
(In reply to comment #1)
> Hi Amir,  thanks for the report.  We're working on a similar problem for
> Bengali that we'll roll into first mozilla.locamotion.org and then that should
> roll to Verbatim.
> 
> The one issue we need to work through is knowing when numbers should and
> shouldn't be translated, the check at the moment is simple it looks only that
> the numbers are the same in both source and target.

Unicode libraries should be able to give you that information, but I'm not 100% sure what the usual way of doing this is in a server-side web application.
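
As one illustration of the kind of information a Unicode library can provide, Python's standard `unicodedata` module exposes both the general category of a character and its decimal value. The sketch below assumes a Python 3 server-side environment; the helper name `digit_value` is hypothetical.

```python
import unicodedata


def digit_value(ch):
    """Return 0-9 for any Unicode decimal digit, or None for other characters."""
    return unicodedata.decimal(ch, None)


persian_four = "\u06f4"  # ۴
bengali_four = "\u09ea"  # ৪

# Both characters are in general category "Nd" (decimal digit number)
# and both carry the numeric value 4.
categories = [unicodedata.category(c) for c in (persian_four, bengali_four)]
values = [digit_value(c) for c in (persian_four, bengali_four)]

# int() in Python 3 also accepts strings of Unicode decimal digits.
four_hundred = int("\u06f4\u06f0\u06f0")  # ۴۰۰
```

Because this uses only the standard library, it would not require pulling in a large external dependency.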

Comment 3

5 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #2)
> (In reply to comment #1)
> > Hi Amir,  thanks for the report.  We're working on a similar problem for
> > Bengali that we'll roll into first mozilla.locamotion.org and then that should
> > roll to Verbatim.
> > 
> > The one issue we need to work through is knowing when numbers should and
> > shouldn't be translated, the check at the moment is simple it looks only that
> > the numbers are the same in both source and target.
> 
> Unicode libraries should be able to give you that information, but I'm not
> 100% sure what the usual way of doing this in a web server side application
> is.

The tests are expensive, so we probably won't pull in a large library for this. We'll probably just strip the XML and then check that the numbers are equivalent. The simple mapping we'll need is 0 is X, 1 is Y, 2 is Z, etc. IIRC, Unicode tables say whether a character is a number but not which number it represents.
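
The approach outlined in this comment (strip the markup, map local digits to 0-9, then compare) could look something like the sketch below. It is a minimal illustration under stated assumptions, not the fix that was planned for Pootle: the names `TO_ASCII`, `extract_numbers` and `numbers_equivalent` are hypothetical, only Persian and Bengali digits are mapped, and the tag-stripping regex is deliberately crude.

```python
import re

# Hand-written digit mapping: Persian U+06F0..U+06F9 and Bengali U+09E6..U+09EF.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
BENGALI_DIGITS = "".join(chr(0x09E6 + i) for i in range(10))
TO_ASCII = str.maketrans(PERSIAN_DIGITS + BENGALI_DIGITS, "0123456789" * 2)

TAG_RE = re.compile(r"<[^>]*>")     # crude removal of XML/HTML tags
NUMBER_RE = re.compile(r"[0-9]+")   # numbers after normalization to ASCII


def extract_numbers(text):
    """Strip tags, map known local digits to ASCII, and return the sorted numbers."""
    plain = TAG_RE.sub(" ", text)
    return sorted(NUMBER_RE.findall(plain.translate(TO_ASCII)))


def numbers_equivalent(source, target):
    """Pass if source and target contain the same numbers, in any supported script."""
    return extract_numbers(source) == extract_numbers(target)


translated_ok = numbers_equivalent("<b>400</b> files", "۴۰۰ پرونده")
wrong_number = numbers_equivalent("400 files", "۳۰۰ پرونده")
```

With this sketch `translated_ok` is true and `wrong_number` is false. A translation table keeps the per-string cost low, which matters given that the checks are described as expensive.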
(Reporter)

Comment 4

3 years ago
Is anyone available to fix this problem?

Comment 5

3 years ago
Pinging Dwayne for an update on this. Amir, have you tested to see if it's still an issue? Pootle has had multiple updates since 2013.
Flags: needinfo?(dwayne)
(Reporter)

Comment 6

3 years ago
Hi there.
I want to share a screenshot that shows the number of errors in the Persian localization of Mozilla products on the locamotion website. It illustrates the problem.
In that screenshot you can see 276 Number errors in the Persian localization. Every translated string must be compatible with the software code. If a string conflicts with the code, Pontoon shows a count of errors for that localization project. An error means the string may cause problems in the final localized product.

Note: A Number error means Pootle is reporting that a number in the original string is missing from the translated string.

Please read this paragraph carefully:
In the first screenshot you can see 276 Number errors in the Persian localization project. The second screenshot shows why: Pootle does not understand that the number in the green rectangle is a number (written in Persian), so it reports a Number error because it thinks the number in the red rectangle is missing from the translated string.


I will post the first screenshot in the next comment and the second screenshot in the comment after that.

In short, the problem has still not been solved after many updates.
(Reporter)

Comment 7

3 years ago
Created attachment 8689604 [details]
Count of persian localization project's number errors
(Reporter)

Comment 8

3 years ago
Created attachment 8689609 [details]
Persian Number in Green rectangle

Comment 9

3 years ago
Hi Amir,

We've worked on this in the past but haven't had much success in making a cheap number checker. We'll start looking at this again using the simplistic approach we listed in comment 3.

Note that the Numbers check is listed as a 'functional' check. These are checks that we don't regard as critical failures, so they don't get as much priority when we're sorting through bugs. Like all checks, they can be marked as false positives. We haven't had major issues with this one to date, as languages like Arabic use Latin numbers.

The fix is actually quite large and invasive. We'll handle it unless a Persian Python coder wants to pick it up.

Please see: https://github.com/translate/pootle/issues/4203
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(dwayne)
(Assignee)

Updated

2 years ago
Product: Webtools → Webtools Graveyard
(Reporter)

Updated

7 months ago
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → WONTFIX