Closed
Bug 983257
Opened 10 years ago
Closed 6 years ago
[GSoC2014] [Week 3] Identify translatable text from extracted text nodes
Categories
(Intellego Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: GPHemsley, Unassigned)
Details
Filter out DOM text nodes containing untranslatable text from the text extracted by bug 983143, in order to identify only the text that can be translated.
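As a rough illustration of the filtering described above, here is a minimal Python sketch. The bug does not define what counts as untranslatable, so the criteria used here (empty or whitespace-only strings, bare URLs, and strings containing no letters) are assumptions, and the names `is_translatable` and `filter_translatable` are hypothetical.

```python
import re
import unicodedata

# Assumption: a node whose whole text is a single URL is not worth translating.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def is_translatable(text):
    """Heuristically decide whether the text of a DOM text node can be translated."""
    stripped = text.strip()
    if not stripped:
        # Whitespace-only nodes carry no content.
        return False
    if URL_RE.match(stripped):
        return False
    # Require at least one letter (Unicode general category L*) in any script;
    # this rejects pure numbers, punctuation, and symbols.
    return any(unicodedata.category(ch).startswith("L") for ch in stripped)


def filter_translatable(texts):
    """Keep only the strings that pass the heuristic, preserving order."""
    return [t for t in texts if is_translatable(t)]


if __name__ == "__main__":
    extracted = ["Hello world", "   ", "42", "http://example.com", "Bonjour"]
    print(filter_translatable(extracted))
```

A real filter would probably also need to look at the parent element (for example, text inside script or style elements), which this sketch does not see because it works on plain strings only.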
Continuing from last week, I researched how the CSV file containing the translated terminology could be converted to the TBX format we wanted. That is when I came across the Translate Toolkit project, which contains many command-line utilities that are useful for our project (http://translate-toolkit.readthedocs.org/en/latest/commands/index.html). One of them, poterminology, extracts terminology, and I experimented with it on the TMX files from Transvision to try to get accurate terminology extraction. Using another command-line utility from the toolkit, I was able to convert the CSV file to a TBX file.

The same project also provides a web application called amaGama (http://docs.translatehouse.org/projects/amagama/en/latest/), which uses bilingual files to create a translation memory data store that we can query over a REST API. I used amaGama to build a terminology database by importing the Gaia CSV file. This tool was very important to the progress of the project, since we were not having much luck with our first objective of extracting terminology from the TMX files. The idea is to build on top of amaGama, using PostgreSQL as the database. The main thing amaGama provides is fast search through the terminology database, and we can build a web interface on top of this platform that makes use of its features.
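To make the REST lookup concrete, here is a minimal Python sketch of a client for a locally running amaGama instance. The URL layout (`/tmserver/<source>/<target>/unit/<text>`), the JSON response, and the `http://localhost:8888` address in the usage note are assumptions based on my reading of the amaGama documentation and should be checked against the deployed version; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_lookup_url(base_url, source_lang, target_lang, text):
    """Build the lookup URL for one source string.

    Assumed layout: <base>/tmserver/<source>/<target>/unit/<text>
    """
    return "{}/tmserver/{}/{}/unit/{}".format(
        base_url.rstrip("/"),
        quote(source_lang, safe=""),
        quote(target_lang, safe=""),
        quote(text, safe=""),  # percent-encode spaces, slashes, etc.
    )


def lookup(base_url, source_lang, target_lang, text, timeout=10):
    """Query the server and return the decoded JSON response.

    Assumption: the server answers with a JSON document describing the matches.
    """
    url = build_lookup_url(base_url, source_lang, target_lang, text)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, a call such as `lookup("http://localhost:8888", "en", "fr", "Save file")` would return whatever matches the imported terminology contains for that string.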
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED