Closed Bug 1049279 (Japanese-WordPrediction) Opened 10 years ago Closed 10 years ago

[Keyboard][ja] Recover Japanese ime dictionary for jp-kanji

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

defect
Not set
normal

Tracking

(b2g-v2.1 fixed)

RESOLVED FIXED
2.1 S2 (15aug)
Tracking Status
b2g-v2.1 --- fixed

People

(Reporter: marsf, Assigned: marsf)

References

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140716183446

Steps to reproduce:

Japanese dictionary data was removed in bug 808849.
Since bug 884752 was fixed, there is no reason to exclude Japanese dictionary.

BTW, the originally used IPAdic has been obsoleted and we can use the NAIST Japanese Dictionary as source. This dictionary is created based upon IPAdic and licensed with the Modified BSD license.
Its format is completely same with IPAdic, so we can use this in place of IPAdic.

See the license:
http://sourceforge.jp/projects/naist-jdic/docs/License.txt/ja/1/License.txt.txt

Download "naist-jdic-0.4.3.tar.gz":
http://sourceforge.jp/projects/naist-jdic/releases/

This bug blocks bug 935866 and bug 1005407.
Blocks: 935866, 1005407
The naist-jdic data has several lacks of minor kanji letters caused by EUC encoding.
We have to fix them in UTF-8 encoding before generating dict.json file.
The dictionary size of fixed version of json.dict is about 24MB (24,495,161 bytes).
And contains 300k words.
x json.dic
o dict.json

sorry.
Comment on attachment 8468952 [details] [review]
fixed json.dic

I cannot do the review because I failed to get jskanji input method to work after adding this dict.
If there is anything else I need to do to get it working?

BTW, do we need to update the REAME, dict/ and Makefile for this change?
https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/jskanji  

--
Also set feedback to Yuan since he knows much more about jskanji than me.
Attachment #8468952 - Flags: review?(rlu) → feedback?(xyuan)
Please be informed that Luke helped to revive the support of jskanji with bug 933252.
I just tried to add the dictionary to Gaia v2.0 and it could work, so it seems we have some updates happened in v2.1/master that broke the support again.
See Also: → 933252
(In reply to Rudy Lu [:rudyl] from comment #5)
> Comment on attachment 8468952 [details] [review]
> fixed json.dic
> 
> I cannot do the review because I failed to get jskanji input method to work
> after adding this dict.
> If there is anything else I need to do to get it working?
jskanji is broken and bug 1050120 was filed to fix it.
> 
> BTW, do we need to update the REAME, dict/ and Makefile for this change?
> https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/
> jskanji  
Yes, we do.
Comment on attachment 8468952 [details] [review]
fixed json.dic

Thanks for your contribution.

It is a valid dictionary file and could be loaded properly by the jskanji engine.

We can test with the dictionary when bug 1050120 is fixed.
Attachment #8468952 - Flags: feedback?(xyuan) → feedback+
Comment on attachment 8468952 [details] [review]
fixed json.dic

Thanks to Yuan and wdeng for their help to get bug bug 1050120 addressed.

Masahiko,

Could you please help take a look at my comment and provide an update, after that please help set review again?
==
BTW, do we need to update the REAME, dict/ and Makefile for this change?
https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/jskanji  
==

Thanks.
Comment on attachment 8468952 [details] [review]
fixed json.dic

Thank you for quick fix!

Rudy:
I've updated some files to use naist-jdic.
Could you please review them?
Attachment #8468952 - Flags: review?(rlu)
Comment on attachment 8468952 [details] [review]
fixed json.dic

This looks good to me except for the .diff patch included, is that an issue of iconv?

My concern would be what if the source dict get updated, then we have to get the .diff updated in this repo?

--
I'll redirect the review to Yuan to see if he could help on this.
Attachment #8468952 - Flags: review?(xyuan)
Attachment #8468952 - Flags: review?(rlu)
Attachment #8468952 - Flags: feedback+
This is the another .diff for Windows users.

FYI:
There is common issue that iconv (or nkf) cannot convert all EUC-JP character to UTF-8 correctly on Windows. This occurs when converting EUC-JP or SHIFT_JIS to UTF-8.

For example:
 "~" to "〜",
 "-" to "−",
 "£" to "£"

> My concern would be what if the source dict get updated, then we have to get the .diff updated in this repo?

Yes. But, the source dict has not been updated over 6 years. I think they will not add more words to dictionary.
If the source dict is updated to UTF-8, we also have to update Makefile and others. So, I specified the version of naist-jdic to "0.4.3" in README.
I tried to make on Linux and compared with Windows version of dict.json.

I found that the word order in dict.json is not same between Linux version and Windows version, and the issue of iconv occurs on Linux, too.  For that reason, my proposed patch may not work.

Current process:
1. Collect *.dic files in "dict/src" in one "dict" file.
2. Change its encoding, EUC-JP to UTF-8.  <-- this process has iconv issue.
3. Convert dict.utf8 to dict.json.  <-- word order is different between Windows and Linux.
4. Apply the patch.  <-- The patch cannot be applied.


Following process can be worked.

1. Collect *.dic files in "dict/src" in one "dict" file.
2. Change its encoding, EUC-JP to UTF-8.  <-- this process has iconv issue.
3. Apply the patch to "dict.utf8".  <-- fix error characters.
4. Convert dict.utf8 to dict.json.
I've updated the patch.

Yuan:
Could you please review again?
Comment on attachment 8468952 [details] [review]
fixed json.dic

Thanks!

For improvement, could you fix the raw dict file (dict.utf8) with conv.py instead of a diff patch?
Attachment #8468952 - Flags: review?(xyuan) → review+
(In reply to Yuan Xulei [:yxl] from comment #15)
> For improvement, could you fix the raw dict file (dict.utf8) with conv.py
> instead of a diff patch?

I've updated the patch.
Thank you.

https://github.com/marsf/gaia/commit/e1513a3cde75457e82415eba781d7fdf43e203f0
Masahiko, I think it's ready to land.
Please squash your commits into one and set checkin-needed keyword to ask the sheriff to land it.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Can we apply this to v2.0 or v1.4?
Keywords: checkin-needed
Master: https://github.com/mozilla-b2g/gaia/commit/ca63e6712ed29866e6e224f01c1c3386611e2496
Assignee: nobody → chimantaea_mirabilis
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S2 (15aug)
Alias: Japanese-WordPrediction
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: