Closed Bug 1077757 (Galician-WordPrediction) Opened 10 years ago Closed 10 years ago

Galician Dictionary/Autocorrection

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(blocking-b2g:2.1+, b2g-v2.1 verified, b2g-v2.2 verified)

VERIFIED FIXED
2.1 S7 (24Oct)
blocking-b2g 2.1+
Tracking Status
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: delphine, Assigned: rudyl)

References

Details

(Whiteboard: [LocRun2.1-1], [p=1])

Attachments

(1 file)

This bug is to create a Galician dictionary in order to get word suggestions in Firefox OS.
Adding the Galician localizer so he can give his input.

(still trying to figure out with partner if we should block on this or not)
I've added Enrique to this bug, and he's let me know by email that he would be interested in working on this.
Can someone here explain to him how to proceed please?
thanks :)
Hi.

What do I have to do?
Do you need a list of the most frequent words in Galician? If so, I need to know:
* Format of the list
* Number of words in the list
* Size of the list
* Minimum length of the words in the list

Delivery date? (Very important)

Thanks.
Hi Enrique,

* This is an example of a word list: https://raw.githubusercontent.com/mozilla-b2g/gaia/master/apps/keyboard/js/imes/latin/dictionaries/es_wordlist.xml
* The average corpus is 150,000 words (at least for Spanish & Catalan)
* There is no minimum length
* Frequency goes from 1-255. Words with freq = 0 are profane words.
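
A minimal sketch of the constraints listed above, assuming entries of the form `<w f="..." flags="">word</w>` inside a `<wordlist>` root. The sample words and the helper name `check_wordlist` are made up for illustration; the real word lists are far larger and may carry details not shown here.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in the shape of the linked word list.
SAMPLE = """<wordlist locale="gl" description="Galego" version="1">
 <w f="225" flags="">de</w>
 <w f="180" flags="">que</w>
 <w f="0" flags="">palabrota</w>
</wordlist>"""


def check_wordlist(xml_text):
    """Return (regular_words, profane_words); raise ValueError on a bad frequency."""
    regular, profane = [], []
    for entry in ET.fromstring(xml_text).iter("w"):
        freq = int(entry.get("f"))
        if not 0 <= freq <= 255:
            raise ValueError(f"frequency out of range for {entry.text!r}: {freq}")
        # f = 0 marks a profane word; 1-255 is the normal frequency range.
        (profane if freq == 0 else regular).append(entry.text)
    return regular, profane


regular, profane = check_wordlist(SAMPLE)
print(len(regular), "regular,", len(profane), "profane")
```
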

Ping me on IRC if you have any questions.
Flags: needinfo?(eu)
[Blocking Requested - why for this release]:

Partner just confirmed to me that this is a blocker for the release.
Ideally, we'll need to be able to test this by the 2nd Localization Run, which starts Oct 20th
blocking-b2g: --- → 2.1?
Enrique might not have much time to work on this before the due date. Kevin, could you help as well, please, given the deadline and the fact that this is going to block?
Flags: needinfo?(kscanne)
Hi.

I think I will have the file by the weekend, maybe sooner.

The format of the entries in the file is:

<w f="225" flags="">de</w>

I have a file of the most frequent words in the following format:

1	de	1222980	4.248183

order\tword\tfrequency\tpercentage

The README of the file:

----
Frequency list of the CORGA: Corpus de Referencia do Galego Actual,
version 1.6 (http://corpus.cirp.es/corga)

Copyright 2013 Guillermo Rojo, Marisol López Martínez,
Eva Domínguez Noya and Fco. Mario Barcala Rodríguez,
Centro Ramón Piñeiro para a Investigación en Humanidades.

This frequency list is distributed under the Lesser General
Public License For Linguistic Resources. See the COPYING file
for details.

The list includes all the words in the CORGA, ordered by their
frequency of occurrence. The format of the file containing the entries
(frecuencias.txt) is:

order\tword\tfrequency\tpercentage
---

I can use the file (the license allows it). Could this license conflict with the other licenses used in Firefox OS?

I need to write a script to generate the final file. I think I can do it.
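
A rough sketch of such a conversion script, turning lines in the `order\tword\tfrequency\tpercentage` format into `<w>` entries. The thread does not say how raw counts should map onto the 1-255 range, so the log scaling in `scale_frequency` is purely an assumption, and both function names are hypothetical.

```python
import math
from xml.sax.saxutils import escape


def scale_frequency(count, max_count):
    """Map a raw corpus count onto 1-255 using a log scale (an assumed mapping)."""
    if max_count <= 1:
        return 255
    scaled = 1 + 254 * math.log(max(count, 1)) / math.log(max_count)
    return max(1, min(255, round(scaled)))


def tsv_to_entries(lines):
    """Turn 'order\\tword\\tfrequency\\tpercentage' lines into <w> entries."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        rows.append((fields[1], int(fields[2])))
    if not rows:
        return []
    max_count = max(count for _, count in rows)
    return [
        f'<w f="{scale_frequency(count, max_count)}" flags="">{escape(word)}</w>'
        for word, count in rows
    ]


# First line is the sample from the corpus file; the second is made up.
sample = ["1\tde\t1222980\t4.248183\n", "2\txyz\t1\t0.000001\n"]
entries = tsv_to_entries(sample)
print("\n".join(entries))
```

With this mapping the most frequent word always gets 255 and a word seen once gets 1; a different curve may well suit the keyboard's prediction engine better.
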

The file size is:
wc frecuencias.txt 
  413706  1654808 11648590 frecuencias.txt

Could its size be a problem?

Best regards.
Flags: needinfo?(eu)
Enrique, I was just going to respond and offer my help in creating a word list, but it looks like you have something good already. I have a corpus of about 30M words of Galician crawled from the web. I can share the frequency list if you'd like to compare it with the one you have, or combine them in some way.
Flags: needinfo?(kscanne)
Hi Kevin.

Can you share that corpus with me? How do you generate the corpus, and can you share the script? How do you generate the final file (in the Firefox OS XML format)?

I am not a good developer. I don't know if I will be able to generate the file in the Firefox OS format, but I am going to try.

I also don't know how to compare your file with mine.

I would appreciate any help or suggestions.

Best regards.
Hi Enrique, I'll email you some stuff if that's ok.
Triage: blocking, shipping locale.

Assigning to Enrique, as he seems to be taking charge of this. Thanks!
Assignee: nobody → eu
Status: NEW → ASSIGNED
blocking-b2g: 2.1? → 2.1+
Target Milestone: --- → 2.1 S7 (Oct24)
I'm not concerned about file size, by the way; just try to keep the number of words to ~150,000.
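
One simple way to hit a target like that is to rank by raw count and cut the list off. This is only a sketch: the function name and the `(word, count)` pair layout are assumptions, and ties are broken alphabetically just to make the result deterministic.

```python
def keep_most_frequent(word_counts, limit=150_000):
    """Return the `limit` highest-count (word, count) pairs, most frequent first."""
    ranked = sorted(word_counts, key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Made-up counts, trimmed to the top two for demonstration.
print(keep_most_frequent([("a", 1), ("de", 9), ("que", 5)], limit=2))
```
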
I am working on the file, but my programming knowledge is limited.

I am processing the file in Python, but first I have to filter out the misspelled words. I am checking the spelling with hunspell (on the command line). It is not a very efficient approach, but I think I will get to the final file.
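
A sketch of one way to speed this up, assuming the hunspell command-line tool is installed: send the whole word list to a single hunspell process instead of checking words one at a time. `-d` (dictionary) and `-l` (list misspelled words) are standard hunspell options; the dictionary name `gl_ES` and both function names are assumptions, and text encoding may need adjusting for the local setup.

```python
import subprocess


def misspelled_words(words, dictionary="gl_ES"):
    """Ask the hunspell CLI which words it rejects, using one process per batch."""
    result = subprocess.run(
        ["hunspell", "-d", dictionary, "-l"],
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


def drop_misspelled(word_counts, rejected):
    """Keep only the (word, count) pairs whose word is not in `rejected`."""
    return [(word, count) for word, count in word_counts if word not in rejected]


# Demonstration with a hand-written rejected set, so hunspell itself is not needed here.
print(drop_misspelled([("de", 10), ("xzq", 2)], {"xzq"}))
```

In real use, `rejected = misspelled_words(word for word, _ in word_counts)` would supply the set.
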

With my data (corpus, http://corpus.cirp.es/corga/frecuencias.html), the file will end up with 235,071 words. Is this size a problem?

I will also compare my word list with the list generated by Kevin.
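
A comparison like this could start from plain set operations on the two vocabularies. The sketch below assumes each list has already been loaded into a `{word: count}` dict; the function name and the sample data are made up.

```python
def compare_word_lists(mine, other):
    """Summarise the overlap between two {word: count} frequency mappings."""
    mine_words, other_words = set(mine), set(other)
    return {
        "shared": len(mine_words & other_words),
        "only_mine": sorted(mine_words - other_words),
        "only_other": sorted(other_words - mine_words),
    }


summary = compare_word_lists({"de": 5, "casa": 2}, {"de": 7, "can": 1})
print(summary)
```

Words that appear in only one list are good candidates for a manual look before the lists are merged.
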

If you want, I can explain the process I used to build my word list and share the scripts so you can improve them. At the moment the process is very slow.
Hi.

For this version of Firefox OS, I think the following file is good enough.

The file has 199036 words.

The format of the date in the file's header still needs to be changed.

<wordlist locale="gl" description="Galego" date="Sun Oct 12 21:04:00 2014" version="1">

What format are you using for the date? I don't know how to produce it in Python.

time.strftime("????")

I need you to confirm that the file is OK.

Best regards.
I have uploaded the file to my Dropbox account.

https://www.dropbox.com/s/0twrqrc7zlujc8c/gl_wordlist.xml?dl=0
Enrique,

Thanks for providing the word list.

I think the format for the date is just a timestamp in seconds, so you can convert the date you have with any programming tool or through this online tool: http://www.epochconverter.com/.
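
If the date really is a Unix timestamp in seconds, a conversion along these lines would work in Python. The sketch assumes the header date is in UTC, which the thread does not state, and `header_date_to_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def header_date_to_epoch(text):
    """Parse 'Sun Oct 12 21:04:00 2014' as UTC and return whole seconds since 1970."""
    parsed = datetime.strptime(text, "%a %b %d %H:%M:%S %Y")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(header_date_to_epoch("Sun Oct 12 21:04:00 2014"))  # 1413147840
```

For a freshly generated file, `int(time.time())` gives the current timestamp directly.
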

BTW, you could refer to Bug 1007547 for how to submit a pull request to add the layout and dictionary.
If you need me to take over from your word list, that would be fine; please feel free to let me know.
Flags: needinfo?(eu)
Hi Rudy.

I will not have time to do it in the next few days. Can you add the layout and dictionary for Galician?

This is important: I think a provider (operator company) has requested the dictionary for the Galician locale. That is what I understood from Delphine.
Flags: needinfo?(eu)
Attached file Patch V1
Patch created to add Galician layout.
 - This layout is copied from Spanish layout.
 - To host the Galician dictionary.

Jan,

Could you please help review this patch?

Enrique, I need your feedback on whether we can use the Spanish layout directly, and on the native name for Galician, "Galego".

Thanks.
Attachment #8504038 - Flags: review?(janjongboom)
Attachment #8504038 - Flags: feedback?(eu)
I think so. I have done a diff with the es.js file and all seems ok.

Use Spanish layout? Yes.
The changed names? They are ok. (Galician, gl and galego).

You have to change the format of the time in the gl_wordlist.xml file.

Thank you so much.
(In reply to Enrique Estévez Fernández from comment #16)

> This is important: I think a provider (operator company) has requested the
> dictionary for the Galician locale. That is what I understood from Delphine.

Yes, they have confirmed they need this for 2.1
(In reply to Enrique Estévez Fernández from comment #18)
> I think so. I have done a diff with the es.js file and all seems ok.
> 
> Use Spanish layout? Yes.
> The changed names? They are ok. (Galician, gl and galego).
> 
> You have to change the format of the time in the gl_wordlist.xml file.
> 
> Thank you so much.

Ah, thanks for pointing this out, will update it before merging this patch.
Comment on attachment 8504038 [details] [review]
Patch V1

Also asking Tim to help review this, since it is a v2.1+ blocker.
Thanks.


--
Took comment 18 as a f+.
Attachment #8504038 - Flags: review?(timdream)
Attachment #8504038 - Flags: feedback?(eu)
Attachment #8504038 - Flags: feedback+
Comment on attachment 8504038 [details] [review]
Patch V1

nit: See Github.
Attachment #8504038 - Flags: review?(timdream)
Attachment #8504038 - Flags: review?(janjongboom)
Attachment #8504038 - Flags: review+
Thanks for noticing this nit.

master,
https://github.com/mozilla-b2g/gaia/commit/762bef92d1f3628ba5ff91298c8608d1670d8ff1
Assignee: eu → rlu
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [LocRun2.1-1] → [LocRun2.1-1], [p=1]
Comment on attachment 8504038 [details] [review]
Patch V1

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): This is a new feature to add Galician keyboard layout.
[User impact] if declined: Cannot present a Galician based keyboard layout and this is a shipping locale.
[Testing completed]: yes.
[Risk to taking this patch] (and alternatives if risky): pretty low, the layout file itself is just a definition of the layout without logic inside.
[String changes made]: N/A
Attachment #8504038 - Flags: approval-gaia-v2.1?
This should not be uplifted automatically, since the layout definition has changed.
Whiteboard: [LocRun2.1-1], [p=1] → [LocRun2.1-1], [p=1], NO_UPLIFT
Attachment #8504038 - Flags: approval-gaia-v2.1? → approval-gaia-v2.1+
v2.1,
2904ab80816896f569e2d73958427fb82aebaea5
Whiteboard: [LocRun2.1-1], [p=1], NO_UPLIFT → [LocRun2.1-1], [p=1]
This issue is verified fixed on Flame 2.2 and 2.1:
Galician word suggestion and autocorrection work properly with Galician keyboard.
  
Flame 2.2
Device: Flame 2.2 Master KK (319mb, Full Flash)
BuildID: 20141022040201
Gaia: db7720c2ff58fdba6ae59595329e63c719ecb63f
Gecko: ae4d9b4ff2ee
Gonk: 05aa7b98d3f891b334031dc710d48d0d6b82ec1d
Version: 36.0a1 (2.2 Master)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0
  
Flame 2.1
Device: Flame 2.1 KK (319mb, Full Flash)
BuildID: 20141022001201
Gaia: 734d3547fb6c65e8bc4dd1a52b26f70bdfee7474
Gecko: 928b18f7d8ff
Gonk: 05aa7b98d3f891b334031dc710d48d0d6b82ec1d
Version: 34.0 (2.1)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: verifyme
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Alias: Galician-WordPrediction