Closed Bug 983257 Opened 11 years ago Closed 7 years ago

[GSoC2014] [Week 3] Identify translatable text from extracted text nodes

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: GPHemsley, Unassigned)


Gordon P. Hemsley [:GPHemsley]

Reporter

Description

11 years ago

Filter out DOM text nodes containing untranslatable text from the text extracted by bug 983143, in order to identify only the text that can be translated.
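The filtering step described above could be sketched roughly as follows. This bug does not say what makes a text node untranslatable, so the heuristics here (whitespace-only, bare URLs, strings with no letters) and the names `is_translatable` and `filter_translatable` are assumptions made purely for illustration, not the project's actual rules.

```python
import re

# Assumed heuristic: a string that is nothing but a URL is not translatable.
_URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def is_translatable(text):
    """Return True if the text of a node looks worth translating.

    Hypothetical criteria; the bug report does not specify any.
    """
    stripped = text.strip()
    if not stripped:
        # Whitespace-only nodes carry no translatable content.
        return False
    if _URL_RE.match(stripped):
        return False
    # Require at least one letter, which drops numbers and punctuation.
    return any(ch.isalpha() for ch in stripped)


def filter_translatable(text_nodes):
    """Keep only the strings that pass is_translatable, in order."""
    return [text for text in text_nodes if is_translatable(text)]


if __name__ == "__main__":
    nodes = ["Hello world", "   ", "42", "http://example.com", "Save file"]
    print(filter_translatable(nodes))
```

A real filter would probably also need to look at the element a node came from (for example, skipping script or style content), which this sketch does not attempt.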

Gordon P. Hemsley [:GPHemsley]

Reporter

Updated

11 years ago

Blocks: 983266

Tharshan

Comment 1

11 years ago

Continuing from last week, I researched how the CSV file containing the translated terminology could be converted to the TBX format we wanted. That is when I came across the Translate Toolkit project, which contains many command line utilities that are useful for our project (http://translate-toolkit.readthedocs.org/en/latest/commands/index.html). One of them is poterminology, a tool for extracting terminology, which I ran against the TMX files from Transvision. I experimented with this tool to try to make the terminology extraction accurate. Using another command line utility from the toolkit, I was able to convert the CSV file to a TBX file.

I also came across a web service called amaGama (http://docs.translatehouse.org/projects/amagama/en/latest/), which uses a bilingual terminology file to create a translation memory data store that we can query over a REST API. I used amaGama to build a terminology database by importing the Gaia CSV file. This tool was very important to the progress of the project, since we were not having much luck with our first objective of extracting terminology from the TMX files.

The idea is to build on top of amaGama, using Postgres as the database. The main thing amaGama provides is a fast way to search through the terminology database. We can build a web interface on top of this platform that makes use of its features.
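As a rough illustration of querying a terminology store like this over REST, the sketch below builds a lookup URL and fetches the JSON result using only the Python standard library. The `/tmserver/<source>/<target>/unit/<query>` path layout, the `localhost:8888` address, and the function names are assumptions for illustration; check them against the amaGama documentation and the local deployment before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_lookup_url(base_url, source_lang, target_lang, query):
    """Build a lookup URL for one term.

    The <base>/<source>/<target>/unit/<query> layout is an assumption
    about the server's API, not something stated in this bug.
    """
    return "{}/{}/{}/unit/{}".format(
        base_url.rstrip("/"),
        quote(source_lang, safe=""),
        quote(target_lang, safe=""),
        quote(query, safe=""),  # percent-encode spaces and slashes
    )


def lookup_term(base_url, source_lang, target_lang, query):
    """Fetch matches for a term and decode the JSON response body."""
    url = build_lookup_url(base_url, source_lang, target_lang, query)
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical local server address; no request is made here.
    print(build_lookup_url("http://localhost:8888/tmserver/", "en", "fr", "home screen"))
```

Keeping URL construction separate from the network call means the encoding logic can be checked without a running server.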

Jeff Beatty [:gueroJeff]

Updated

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED


Categories

(Intellego Graveyard :: General, defect)
