Closed Bug 983257 Opened 10 years ago Closed 6 years ago

[GSoC2014] [Week 3] Identify translatable text from extracted text nodes

Categories

(Intellego Graveyard :: General, defect)

Production
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: GPHemsley, Unassigned)

Details

Filter out DOM text nodes with untranslatable (or non-translatable) text from the text extracted by bug 983143, in order to identify only the text that can be translated.
Blocks: 983266
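As an illustration of the filtering described above, here is a minimal sketch in Python. The bug does not specify which rules decide whether a text node is translatable, so the heuristics below (drop whitespace-only nodes, bare URLs, and nodes without any letters) and the names `is_translatable` and `filter_translatable` are assumptions made for the example, not the project's actual implementation.

```python
import re

# Hypothetical heuristic: a node that is nothing but a URL is not translatable.
_URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def is_translatable(text):
    """Return True if a text node looks like natural-language text."""
    stripped = text.strip()
    if not stripped:
        return False  # whitespace-only node
    if _URL_RE.match(stripped):
        return False  # bare URL
    # Require at least one letter; this drops numbers and pure punctuation.
    return any(ch.isalpha() for ch in stripped)


def filter_translatable(nodes):
    """Keep only the nodes that pass the heuristic, with padding stripped."""
    return [n.strip() for n in nodes if is_translatable(n)]
```

A real filter would likely need further rules (for example for code snippets or identifiers), which this sketch does not attempt.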
Continuing on from last week, I researched how the CSV file containing the translated terminology could be converted to the TBX format we wanted. That is when I came across the Translate Toolkit project, which contains many useful command-line utilities (http://translate-toolkit.readthedocs.org/en/latest/commands/index.html) for our project.

Among them was poterminology, a tool that extracted terminology from the Transvision TMX files. I experimented with it to try to make the terminology extraction accurate. Using another command-line utility, I was able to convert the CSV file to a TBX file.
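To show roughly what a CSV-to-TBX conversion involves, here is a hand-rolled sketch using only the Python standard library. It is not the Translate Toolkit converter, and it makes several assumptions: the CSV has two columns (source term, target term) with no header row, the language codes default to `en` and `fr`, and the output is a bare-bones TBX skeleton (`martif`/`text`/`body`/`termEntry`/`langSet`/`tig`/`term`) without the header metadata a complete TBX file would carry. The function name `csv_to_tbx` is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serialises as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def csv_to_tbx(csv_text, source_lang="en", target_lang="fr"):
    """Convert two-column CSV text into a minimal TBX document string."""
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or incomplete rows.
        if len(row) < 2 or not row[0].strip():
            continue
        entry = ET.SubElement(body, "termEntry")
        for lang, term in ((source_lang, row[0]), (target_lang, row[1])):
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term.strip()
    return ET.tostring(martif, encoding="unicode")
```

Each CSV row becomes one `termEntry` holding a `langSet` per language, which is the part of the structure a terminology lookup needs.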

The toolkit also contained a web application called amaGama (http://docs.translatehouse.org/projects/amagama/en/latest/) that used a bilingual terminology file to create a translation memory data store that we could query over a REST API. I used amaGama to build a database containing the terminology by importing the Gaia CSV file. This tool was very important to the progress of the project, since we were not having much luck with our first objective of extracting terminology from the TMX files.
The idea is to build on top of amaGama (http://docs.translatehouse.org/projects/amagama/en/latest/), using PostgreSQL as the database. The main thing amaGama provides is a fast way to search the terminology database. We can build a web interface on top of this platform, using its features.
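A web interface built on amaGama would mostly need to form lookup requests and pick a result. The sketch below assumes the URL layout `/tmserver/<source>/<target>/unit/<text>` and a JSON list response whose items have `source`, `target`, and `quality` fields; both should be checked against the amaGama documentation for the deployed version. The helper names and the `http://localhost:8888` base URL are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import quote


def build_query_url(base_url, source_lang, target_lang, text):
    """Build a lookup URL, percent-encoding every path component."""
    return "{}/tmserver/{}/{}/unit/{}".format(
        base_url.rstrip("/"),
        quote(source_lang, safe=""),
        quote(target_lang, safe=""),
        quote(text, safe=""),
    )


def best_match(response_body):
    """Return the target text of the highest-quality result, or None."""
    results = json.loads(response_body)
    if not results:
        return None
    # Assumed field name: "quality" is a numeric match score.
    return max(results, key=lambda r: r.get("quality", 0))["target"]
```

For example, `build_query_url("http://localhost:8888", "en", "fr", "Wi-Fi settings")` yields a URL that could be fetched with any HTTP client, and the response body could then be passed to `best_match`.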
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED