Closed
Bug 1049279
(Japanese-WordPrediction)
Opened 10 years ago
Closed 10 years ago
[Keyboard][ja] Recover Japanese ime dictionary for jp-kanji
Categories
(Firefox OS Graveyard :: Gaia::Keyboard, defect)
Firefox OS Graveyard
Gaia::Keyboard
Tracking
(b2g-v2.1 fixed)
RESOLVED
FIXED
2.1 S2 (15aug)
Tracking | Status | |
---|---|---|
b2g-v2.1 | --- | fixed |
People
(Reporter: marsf, Assigned: marsf)
References
Details
Attachments
(3 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140716183446
Steps to reproduce:
Japanese dictionary data was removed in bug 808849.
Since bug 884752 was fixed, there is no reason to exclude Japanese dictionary.
BTW, the originally used IPAdic has been obsoleted and we can use the NAIST Japanese Dictionary as source. This dictionary is created based upon IPAdic and licensed with the Modified BSD license.
Its format is completely same with IPAdic, so we can use this in place of IPAdic.
See the license:
http://sourceforge.jp/projects/naist-jdic/docs/License.txt/ja/1/License.txt.txt
Download "naist-jdic-0.4.3.tar.gz":
http://sourceforge.jp/projects/naist-jdic/releases/
This bug blocks bug 935866 and bug 1005407.
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 1•10 years ago
|
||
The naist-jdic data has several lacks of minor kanji letters caused by EUC encoding.
We have to fix them in UTF-8 encoding before generating dict.json file.
Assignee | ||
Comment 2•10 years ago
|
||
The dictionary size of fixed version of json.dict is about 24MB (24,495,161 bytes).
And contains 300k words.
Assignee | ||
Comment 3•10 years ago
|
||
Attachment #8468952 -
Flags: review?(rlu)
Assignee | ||
Comment 4•10 years ago
|
||
x json.dic
o dict.json
sorry.
Comment 5•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
I cannot do the review because I failed to get jskanji input method to work after adding this dict.
If there is anything else I need to do to get it working?
BTW, do we need to update the REAME, dict/ and Makefile for this change?
https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/jskanji
--
Also set feedback to Yuan since he knows much more about jskanji than me.
Attachment #8468952 -
Flags: review?(rlu) → feedback?(xyuan)
Comment 6•10 years ago
|
||
Please be informed that Luke helped to revive the support of jskanji with bug 933252.
I just tried to add the dictionary to Gaia v2.0 and it could work, so it seems we have some updates happened in v2.1/master that broke the support again.
See Also: → 933252
Comment 7•10 years ago
|
||
(In reply to Rudy Lu [:rudyl] from comment #5)
> Comment on attachment 8468952 [details] [review]
> fixed json.dic
>
> I cannot do the review because I failed to get jskanji input method to work
> after adding this dict.
> If there is anything else I need to do to get it working?
jskanji is broken and bug 1050120 was filed to fix it.
>
> BTW, do we need to update the REAME, dict/ and Makefile for this change?
> https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/
> jskanji
Yes, we do.
Comment 8•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
Thanks for your contribution.
It is a valid dictionary file and could be loaded properly by the jskanji engine.
We can test with the dictionary when bug 1050120 is fixed.
Attachment #8468952 -
Flags: feedback?(xyuan) → feedback+
Comment 9•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
Thanks to Yuan and wdeng for their help to get bug bug 1050120 addressed.
Masahiko,
Could you please help take a look at my comment and provide an update, after that please help set review again?
==
BTW, do we need to update the REAME, dict/ and Makefile for this change?
https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/jskanji
==
Thanks.
Assignee | ||
Comment 10•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
Thank you for quick fix!
Rudy:
I've updated some files to use naist-jdic.
Could you please review them?
Attachment #8468952 -
Flags: review?(rlu)
Comment 11•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
This looks good to me except for the .diff patch included, is that an issue of iconv?
My concern would be what if the source dict get updated, then we have to get the .diff updated in this repo?
--
I'll redirect the review to Yuan to see if he could help on this.
Attachment #8468952 -
Flags: review?(xyuan)
Attachment #8468952 -
Flags: review?(rlu)
Attachment #8468952 -
Flags: feedback+
Assignee | ||
Comment 12•10 years ago
|
||
This is the another .diff for Windows users.
FYI:
There is common issue that iconv (or nkf) cannot convert all EUC-JP character to UTF-8 correctly on Windows. This occurs when converting EUC-JP or SHIFT_JIS to UTF-8.
For example:
"~" to "〜",
"-" to "−",
"£" to "£"
> My concern would be what if the source dict get updated, then we have to get the .diff updated in this repo?
Yes. But, the source dict has not been updated over 6 years. I think they will not add more words to dictionary.
If the source dict is updated to UTF-8, we also have to update Makefile and others. So, I specified the version of naist-jdic to "0.4.3" in README.
Assignee | ||
Comment 13•10 years ago
|
||
I tried to make on Linux and compared with Windows version of dict.json.
I found that the word order in dict.json is not same between Linux version and Windows version, and the issue of iconv occurs on Linux, too. For that reason, my proposed patch may not work.
Current process:
1. Collect *.dic files in "dict/src" in one "dict" file.
2. Change its encoding, EUC-JP to UTF-8. <-- this process has iconv issue.
3. Convert dict.utf8 to dict.json. <-- word order is different between Windows and Linux.
4. Apply the patch. <-- The patch cannot be applied.
Following process can be worked.
1. Collect *.dic files in "dict/src" in one "dict" file.
2. Change its encoding, EUC-JP to UTF-8. <-- this process has iconv issue.
3. Apply the patch to "dict.utf8". <-- fix error characters.
4. Convert dict.utf8 to dict.json.
Assignee | ||
Comment 14•10 years ago
|
||
I've updated the patch.
Yuan:
Could you please review again?
Comment 15•10 years ago
|
||
Comment on attachment 8468952 [details] [review]
fixed json.dic
Thanks!
For improvement, could you fix the raw dict file (dict.utf8) with conv.py instead of a diff patch?
Attachment #8468952 -
Flags: review?(xyuan) → review+
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Yuan Xulei [:yxl] from comment #15)
> For improvement, could you fix the raw dict file (dict.utf8) with conv.py
> instead of a diff patch?
I've updated the patch.
Thank you.
https://github.com/marsf/gaia/commit/e1513a3cde75457e82415eba781d7fdf43e203f0
Comment 17•10 years ago
|
||
Masahiko, I think it's ready to land.
Please squash your commits into one and set checkin-needed keyword to ask the sheriff to land it.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment 19•10 years ago
|
||
Assignee: nobody → chimantaea_mirabilis
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-b2g-v2.1:
--- → fixed
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S2 (15aug)
Comment 20•10 years ago
|
||
\o/
Updated•10 years ago
|
Blocks: WordPrediction
Updated•9 years ago
|
Alias: Japanese-WordPrediction
You need to log in
before you can comment on or make changes to this bug.
Description
•