Closed Bug 983146 Opened 10 years ago Closed 6 years ago

[GSoC2014] [Week 6] All-At-Once terminology replacement method

Categories

(Intellego Graveyard :: General, defect)

Production
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gueroJeff, Unassigned)

Details

Tracking bug.

All-At-Once Replacement Method: Terminology matching takes place over the entirety of the text extracted from the DOM, rather than node by node (or segment by segment). The DOM is then regenerated with the matched target terminology, and the content is output into a new webpage and rendered.
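
A minimal sketch of the all-at-once idea, assuming the text nodes have already been extracted from the DOM as a list of strings. The function name replace_all_at_once, the sentinel separator and the sample glossary are illustrative assumptions, not part of the project code.

```python
import re

# Sentinel used to join text nodes; assumed never to occur in page text.
SEP = "\x00"

def replace_all_at_once(segments, glossary):
    """Match terms over the joined text of all segments, then split back.

    segments: text nodes in document order.
    glossary: mapping of source term -> target term (hypothetical data).
    """
    if not glossary:
        return list(segments)
    joined = SEP.join(segments)
    # Try longer terms first so a multi-word term wins over its sub-terms.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    )
    translated = pattern.sub(lambda m: glossary[m.group(0)], joined)
    # One output string per original text node, ready to rebuild the DOM.
    return translated.split(SEP)
```

Because the whole text is matched in a single pass, the glossary is compiled and applied once per page rather than once per node. The split at the end maps each result back to its original node so the DOM can be regenerated.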
Summary: [meta] Post-processes website regeneration → [meta] All-At-Once terminology replacement method
Depends on: 983149
Keywords: meta
Summary: [meta] All-At-Once terminology replacement method → [GSoC2014] [Week 6] All-At-Once terminology replacement method
Keywords: meta
No longer blocks: 983138
No longer depends on: 983149
No longer blocks: 983144
Depends on: 983144
No longer blocks: 983143
Last week I progressed with the project by building a working prototype of the web page translator. My mentor pointed out a few issues to fix to improve the system. The TBX file we used had a few errors: some segment pairs had several translated words separated by commas. The string replacement also did not take pluralisation or capitalisation into account when replacing the source string. I also found that the text contents were sent all at once, so I needed to try a different method and check for any difference. A segment-by-segment method seemed like a good approach.
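
One way the capitalisation and pluralisation issue could be handled is sketched below. It assumes English source terms whose plural is formed by adding "s", and it falls back to appending "s" to the target term when no explicit plural is supplied, which is a naive rule that will be wrong for many target languages. The names match_case and replace_term are hypothetical.

```python
import re

def match_case(source, target):
    """Copy the capitalisation pattern of the matched source onto the target."""
    if len(source) > 1 and source.isupper():
        return target.upper()
    if source[:1].isupper():
        return target[:1].upper() + target[1:]
    return target

def replace_term(text, source_term, target_term, target_plural=None):
    """Replace a term case-insensitively, keeping plural and capitalisation."""
    # Optional trailing "s" captures the naive English plural (assumption).
    pattern = re.compile(
        r"\b" + re.escape(source_term) + r"(s?)\b", re.IGNORECASE
    )

    def swap(match):
        if match.group(1):
            replacement = target_plural or target_term + "s"
        else:
            replacement = target_term
        return match_case(match.group(0), replacement)

    return pattern.sub(swap, text)
```

For example, replace_term("Bookmarks are saved. Open a bookmark.", "bookmark", "marcador", "marcadores") gives "Marcadores are saved. Open a marcador."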

The current approach to translating the text uses JavaScript to replace the text found in the DOM. Through research I came across NLTK, a Python library with many utilities that we could reuse for our project, such as tokenising segments of text. Moving the translation to the server side means that the utilities provided by NLTK can be used to translate the DOM, which is then sent to the browser already translated into the target language.
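
As a rough illustration of server-side tokenising, the sketch below uses NLTK's wordpunct_tokenize, which is regex-based and needs no corpus download. If NLTK is not installed it falls back to the same regular expression. The translate_tokens helper and the glossary are assumptions made for this example, and it only handles single-word terms.

```python
import re

try:
    # Regex-based tokenizer from NLTK; requires no extra data files.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    def wordpunct_tokenize(text):
        # Fallback using the same pattern as NLTK's WordPunctTokenizer.
        return re.findall(r"\w+|[^\w\s]+", text)

def translate_tokens(segment, glossary):
    """Replace single-word terms token by token, leaving punctuation alone.

    glossary: mapping of lower-case source term -> target term (hypothetical).
    """
    tokens = wordpunct_tokenize(segment)
    return [glossary.get(token.lower(), token) for token in tokens]
```

Working on tokens rather than raw substrings means a term such as "tab" is not replaced inside a longer word such as "table".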

All hyperlinks on the website have to be changed so that, when clicked, they load within the iframe and go through our proxy. Many sites, including the Mozilla Support sites, send the X-Frame-Options: DENY header, meaning that the website cannot be browsed within an iframe. To get around this issue, we load each link through our translation engine: it fetches the raw HTML, translates the DOM and sends it to the client for the iframe to load.
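
The link rewriting step could look roughly like the sketch below, which uses only the Python standard library. The /translate route, its url query parameter and the names proxied, LinkRewriter and rewrite_links are all hypothetical; the real proxy may expose a different interface.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit

PROXY_ENDPOINT = "/translate"  # hypothetical route on the translation proxy

def proxied(url, base_url):
    """Return a proxy URL for http(s) links; leave other schemes untouched."""
    absolute = urljoin(base_url, url)
    if urlsplit(absolute).scheme not in ("http", "https"):
        return url
    return PROXY_ENDPOINT + "?" + urlencode({"url": absolute})

class LinkRewriter(HTMLParser):
    """Re-emit the HTML unchanged except for href values on <a> tags."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())
            return
        parts = []
        for name, value in attrs:
            if name == "href" and value:
                value = proxied(value, self.base_url)
            # Attributes without a value (value is None) are kept bare.
            parts.append(
                f" {name}" if value is None
                else f' {name}="{escape(value, quote=True)}"'
            )
        self.out.append("<a" + "".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def rewrite_links(html, base_url):
    parser = LinkRewriter(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Since the iframe only ever loads pages served by the proxy, the proxy decides which response headers reach the browser, so the original site's framing restriction does not have to be passed on.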
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED